On the timescale for galaxy formation

I’ve been wanting to expand on the previous post ever since I wrote it, which was over a month ago now. It has been a busy end to the semester. Plus, there’s a lot to say – nothing that hasn’t been said before, somewhere, somehow, yet still a lot to cobble together into a coherent story – if that’s even possible. This will be a long post, and there will be more to follow narrating the story of our big paper in the ApJ. My sole ambition here is to express the predictions of galaxy formation theory in LCDM and MOND in the broadest strokes.

A theory is only as good as its prior. We can always fudge things after the fact, so what matters most is what we predict in advance. What do we expect for the timescale of galaxy formation? To tell you what I’m going to tell you, it takes a long time to build a massive galaxy in LCDM, but it happens much faster in MOND.

Basic Considerations

What does it take to make a galaxy? A typical giant elliptical galaxy has a stellar mass of 9 × 10^10 M☉. That’s a bit more than our own Milky Way, which has a stellar mass of 5 or 6 × 10^10 M☉ (depending on who you ask) with another 10^10 M☉ or so in gas. So, in classic astronomy/cosmology style, let’s round off and say a big galaxy is about 10^11 M☉. That’s a hundred billion stars, give or take.

An elliptical galaxy (NGC 3379, left) and two spiral galaxies (NGC 628 and NGC 891, right).

How much of the universe does it take to make one big galaxy? The critical density of the universe is the over/under point for whether an expanding universe expands forever, or has enough self-gravity to halt the expansion and ultimately recollapse. Numerically, this quantity is ρcrit = 3H0^2/(8πG), which for H0 = 73 km/s/Mpc works out to 10^-29 g/cm^3 or 1.5 × 10^-7 M☉/pc^3. This is a very small number, but provides the benchmark against which we measure densities in cosmology. The density of any substance X is ΩX = ρX/ρcrit. The stars and gas in galaxies are made of baryons, and we know the baryon density pretty well from Big Bang Nucleosynthesis: Ωb = 0.04. That means the average density of normal matter is very low, only about 4 × 10^-31 g/cm^3. That’s less than one hydrogen atom per cubic meter – most of space is an excellent vacuum!
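
For those who like to check the arithmetic, here is a minimal sketch of the numbers just quoted (standard constants, SI units):

```python
# A quick check of the critical density figures above.
import math

G = 6.674e-11            # m^3 kg^-1 s^-2
Mpc = 3.086e22           # m
pc = 3.086e16            # m
Msun = 1.989e30          # kg

H0 = 73e3 / Mpc                              # 73 km/s/Mpc in s^-1
rho_crit = 3 * H0**2 / (8 * math.pi * G)     # kg/m^3

print(rho_crit * 1e3 / 1e6)                  # ~1e-29 g/cm^3
print(rho_crit / Msun * pc**3)               # ~1.5e-7 Msun/pc^3
print(0.04 * rho_crit * 1e3 / 1e6)           # Omega_b * rho_crit ~ 4e-31 g/cm^3
```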

This being the case, we need to scoop up a large volume to make a big galaxy. Going through the math, to gather up enough mass to make a 10^11 M☉ galaxy, we need a sphere with a radius of 1.6 Mpc. That’s in today’s universe; in the past the universe was denser by (1+z)^3, so at z = 10 that’s “only” 140 kpc. Still, modern galaxies are much smaller than that; the effective edge of the disk of the Milky Way is at a radius of about 20 kpc, and most of the baryonic mass is concentrated well inside that: the typical half-light radius of a 10^11 M☉ galaxy is around 6 kpc. That’s a long way to collapse.
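
Spelling out “the math” (same assumed constants as before):

```python
# Radius of a sphere of mean baryon density containing 1e11 Msun.
import math

G, Mpc, Msun = 6.674e-11, 3.086e22, 1.989e30
H0 = 73e3 / Mpc                                  # s^-1
rho_b = 0.04 * 3 * H0**2 / (8 * math.pi * G)     # mean baryon density, kg/m^3

M = 1e11 * Msun
R = (3 * M / (4 * math.pi * rho_b)) ** (1 / 3)

print(R / Mpc)               # ~1.6 Mpc today (z = 0)
print(R / Mpc / (1 + 10))    # ~0.14 Mpc = 140 kpc at z = 10
```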

Monolithic Galaxy Formation

Given this much information, an early concept was monolithic galaxy formation. We have a big ball of gas in the early universe that collapses to form a galaxy. Why and how this got started was fuzzy. But we knew how much mass we needed and the volume it had to come from, so we can consider what happens as the gas collapses to create a galaxy.

Here we hit a big astrophysical reality check. Just how does the gas collapse? It has to dissipate energy to do so, and cool to form stars. Once stars form, they may feed energy back into the surrounding gas, reheating it and potentially preventing the formation of more stars. These processes are nontrivial to compute ab initio, and attempting to do so obsesses much of the community. We don’t agree on how these things work, so they are the knobs theorists can turn to change an answer they don’t like.

Even if we don’t understand star formation in detail, we do observe that stars have formed, and can estimate how many. Moreover, we do understand pretty well how stars evolve once formed. Hence a common approach is to build stellar population models with some prescribed star formation history and see what works. Spiral galaxies like the Milky Way formed a lot of stars in the past, and continue to do so today. To make 5 × 10^10 M☉ of stars in 13 Gyr requires an average star formation rate of 4 M☉/yr. The current measured star formation rate of the Milky Way is estimated to be 2 ± 0.7 M☉/yr, so the star formation rate has been nearly constant (averaging over stochastic variations) over time, perhaps with a gradual decline. Giant elliptical galaxies, in contrast, are “red and dead”: they have no current star formation and appear to have made most of their stars long ago. Rather than a roughly constant rate of star formation, they peaked early and declined rapidly. The cessation of star formation is also called quenching.

A common way to formulate the star formation rate in galaxies as a whole is the exponential star formation rate, SFR(t) = SFR0 e^(-t/τ). A spiral galaxy has a low baseline star formation rate SFR0 and a long burn time τ ~ 10 Gyr while an elliptical galaxy has a high initial star formation rate and a short e-folding time like τ ~ 1 Gyr. Many variations on this theme are possible, and are of great interest astronomically, but this basic distinction suffices for our discussion here. From the perspective of the observed mass and stellar populations of local galaxies, the standard picture for a giant elliptical was a large, monolithic island universe that formed the vast majority of its stars early on then quenched with a short e-folding timescale.
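
As a concrete toy comparison (the parameter values here are illustrative choices, not fits to any particular galaxy):

```python
# Exponential star formation histories: spiral vs. elliptical.
import math

def sfr(t_gyr, sfr0, tau_gyr):
    """Exponential star formation rate, SFR(t) = SFR0 * exp(-t/tau), in Msun/yr."""
    return sfr0 * math.exp(-t_gyr / tau_gyr)

def mass_formed(t_gyr, sfr0, tau_gyr):
    """Stellar mass formed by time t: SFR0 * tau * (1 - exp(-t/tau)), in Msun."""
    return sfr0 * tau_gyr * 1e9 * (1 - math.exp(-t_gyr / tau_gyr))

# Spiral: low baseline, long burn time (tau ~ 10 Gyr).
print(mass_formed(13, sfr0=7, tau_gyr=10))   # ~5e10 Msun formed in 13 Gyr
print(sfr(13, sfr0=7, tau_gyr=10))           # ~2 Msun/yr today, as observed
# Elliptical: high initial rate, short e-folding time (tau ~ 1 Gyr).
print(mass_formed(13, sfr0=90, tau_gyr=1))   # ~9e10 Msun, nearly all formed early
print(sfr(13, sfr0=90, tau_gyr=1))           # ~2e-4 Msun/yr: "red and dead"
```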

Galaxies as Island Universes

The density parameter Ω provides another useful way to think about galaxy formation. As cosmologists, we obsess about the global value of Ω because it determines the expansion history and ultimate fate of the universe. Here it has a more modest application. We can think of the region in the early universe that will ultimately become a galaxy as its own little closed universe. With a density parameter Ω > 1, it is destined to recollapse.

A fun and funny fact of the Friedmann equation is that the matter density parameter Ωm → 1 at early times, so the early universe when galaxies form is matter dominated. It is also very uniform (more on that below). So any subset that is a bit more dense than average will have Ω > 1 just because the average is very close to Ω = 1. We can then treat this region as its own little universe (a “top-hat overdensity”) and use the Friedmann equation to solve for its evolution, as in this sketch:

The expansion of the early universe a(t) (blue line). A locally overdense region may behave as a closed universe, recollapsing in a finite time (red line) to potentially form a galaxy.

That’s great, right? We have a simple, analytic solution derived from first principles that explains how a galaxy forms. We can plug in the numbers to find how long it takes to form our basic, big 10^11 M☉ galaxy and… immediately encounter a problem. We need to know how overdense our protogalaxy starts out. Is its effective initial Ωm = 2? 10? What value, at what time? The higher it is, the faster the evolution from initially expanding along with the rest of the universe to decoupling from the Hubble flow to collapsing. We know the math but we still need to know the initial condition.
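
The math we do know is the standard cycloid solution of the Friedmann equation, which gives the bang-to-crunch time of a closed, matter-only region in terms of its initial density parameter and Hubble rate. A quick sketch shows how strongly that time depends on the initial condition:

```python
# Lifetime of a closed "mini-universe" from the standard cycloid solution.
import math

def crunch_time(omega_i, H_i):
    """Bang-to-crunch time of a closed, matter-only universe:
    t = pi * Omega_i / (H_i * (Omega_i - 1)**1.5)."""
    return math.pi * omega_i / (H_i * (omega_i - 1) ** 1.5)

H_i = 1.0  # arbitrary inverse-time units; only the scaling matters here
for omega in (2.0, 10.0, 1.00001):
    print(omega, crunch_time(omega, H_i))
# Omega_i = 2 recollapses in ~6/H_i and Omega_i = 10 in ~1/H_i, but
# Omega_i = 1.00001 takes ~1e8/H_i: effectively forever.
```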

Annoying Initial Conditions

The initial condition for galaxy formation is observed in the cosmic microwave background (CMB) at z = 1090. Where today’s universe is remarkably lumpy, the early universe is incredibly uniform. It is so smooth that it is homogeneous and isotropic to one part in a hundred thousand. This is annoyingly smooth, in fact. It would help to have some lumps – primordial seeds with Ω > 1 – from which structure can grow. The observed seeds are too tiny; the typical initial amplitude is 10^-5, so Ωm = 1.00001. That takes forever to decouple and recollapse; it hasn’t yet had time to happen.

The cosmic microwave background as observed by ESA’s Planck satellite. This is an all-sky picture of the relic radiation field – essentially a snapshot of the universe when it was just a few hundred thousand years old. The variations in color are variations in temperature which correspond to variations in density. These variations are tiny, only about one part in 100,000. The early universe was very uniform; the real picture is a boring blank grayscale. We have to crank the contrast way up to see these minute variations.

We would like to know how the big galaxies of today – enormous agglomerations of stars and gas and dust separated by inconceivably vast distances – came to be. How can this happen starting from such homogeneous initial conditions, where all the mass is equally distributed? Gravity is an attractive force that makes the rich get richer, so it will grow the slight initial differences in density, but it is also weak and slow to act. A basic result in gravitational perturbation theory is that overdensities grow at the same rate the universe expands, which is inversely related to redshift. So if we see tiny fluctuations in density with amplitude 10^-5 at z = 1000, they should have only grown by a factor of 1000 and still be small today (10^-2 at z = 0). But we see structures of much higher contrast than that. You can’t get here from there.

The rich large scale structure we see today is impossible starting from the smooth observed initial conditions. Yet here we are, so we have to do something to goose the process. This is one of the original motivations for invoking cold dark matter (CDM). If there is a substance that does not interact with photons, it can start to clump up early without leaving too large a mark on the relic radiation field. In effect, the initial fluctuations in mass are larger, just in the invisible substance. (That’s not to say the CDM doesn’t leave a mark on the CMB; it does, but it is subtle and entirely another story.) So the idea is that dark matter forms gravitational structures first, and the baryons fall in later to make galaxies.

An illustration of the linear growth of overdensities. Structure can grow in the dark matter (long dashed lines) with the baryons catching up only after decoupling (short dashed line). In effect, the dark matter gives structure formation a head start, nicely explaining the apparently impossible growth factor. This has been the standard picture for what seems like forever (illustration from Schramm 1992).

With the right amount of CDM – and it has to be just the right amount of a dynamically cold form of non-baryonic dark matter (stuff we still don’t know actually exists) – we can explain how the growth factor is 10^5 since recombination instead of a mere 10^3. The dark matter got a head start over the stuff we can see; it looks like 10^5 because the normal matter lagged behind, being entangled with the radiation field in a way the dark matter was not.

This has been the imperative need in structure formation theory for so long that it has become undisputed lore; an element of the belief system so deeply embedded that it is practically impossible to question. I risk getting ahead of the story, but it is important to point out that, like the interpretation of so much of the relevant astrophysical data, this belief assumes that gravity is normal. This assumption dictates the growth rate of structure, which in turn dictates the need to invoke CDM to allow structure to form in the available time. If we drop this assumption, then we have to work out what happens in each and every alternative that we might consider. That definitely gets ahead of the story, so first let’s understand what we should expect in LCDM.

Hierarchical Galaxy Formation in LCDM

LCDM predicts some things remarkably well but others not so much. The dark matter is well-behaved, responding only to gravity. Baryons, on the other hand, are messy – one has to worry about hydrodynamics in the gas, star formation, feedback, dust, and probably even magnetic fields. In a nutshell, LCDM simulations are very good at predicting the assembly of dark mass, but converting that into observational predictions relies on our incomplete knowledge of messy astrophysics. We know what the mass should be doing, but we don’t know so well how that translates to what we see. Mass good, light bad.

Starting with the assembly of mass, the first thing we learn is that the story of monolithic galaxy formation outlined above has to be wrong. Early density fluctuations start out tiny, even in dark matter. God didn’t plunk down island universes of galaxy mass then say “let there be galaxies!” The annoying initial conditions mean that little dark matter halos form first. These subsequently merge hierarchically to make ever bigger halos. Rather than top-down monolithic galaxy formation, we have the bottom-up hierarchical formation of dark matter halos.

The hierarchical agglomeration of dark matter halos into ever larger objects is often depicted as a merger tree. Here are four examples from the high resolution Illustris TNG50 simulation (Pillepich et al. 2019; Nelson et al. 2019).

Examples of merger trees from the TNG50-1 simulation (Pillepich et al. 2019; Nelson et al. 2019). Objects have been selected to have very nearly the same stellar mass at z=0. Mass is built up through a series of mergers. One large dark matter halo today (at top) has many antecedents (small halos at bottom). These merge hierarchically as illustrated by the connecting lines. The size of the symbol is proportional to the halo mass. I have added redshift and the corresponding age of the universe for vanilla LCDM in a more legible font. The color bar illustrates the specific star formation rate: the top row has objects that are still actively star forming like spirals; those in the bottom row are “red and dead” – things that have stopped forming stars, like giant elliptical galaxies. In all cases, there is a lot of merging and a modest rate of growth, with the typical object taking about half a Hubble time (~7 Gyr) to assemble half of its final stellar mass.

The hierarchical assembly of mass is generic in CDM. Indeed, it is one of its most robust predictions. Dark matter halos start small, and grow larger by a succession of many mergers. This gradual agglomeration is slow: note how tiny the dark matter halos at z = 10 are.

Strictly speaking, it isn’t even meaningful to talk about a single galaxy over the span of a Hubble time. It is hard to avoid this mental trap: surely the Milky Way has always been the Milky Way? So one imagines its evolution over time. This is monolithic thinking. Hierarchically, “the galaxy” refers at best to the largest progenitor, the object that traces the left edge of the merger trees above. But the other protogalactic chunks that eventually merge together are as much part of the final galaxy as the progenitor that happens to be largest.

This complicated picture is complicated further by what we can see being stars, not mass. The luminosity we observe forms through a combination of in situ growth (star formation in the largest progenitor) and ex situ growth through merging. There is no reason for some preferred set of protogalaxies to form stars faster than the others (though of course there is some scatter about the mean), so presumably the light traces the mass of stars formed, which traces the underlying dark mass. Presumably.

That we should see lots of little protogalaxies at high redshift is nicely illustrated by this lookback cone from Yung et al. (2022). Here the color and size of each point correspond to the stellar mass. Massive objects are common at low redshift but become progressively rare at high redshift, petering out at z > 4 and basically absent at z = 10. This realization of the observable stellar mass tracks the assembly of dark mass seen in merger trees.

Fig. 2 from Yung et al. (2022) illustrating what an observer would see looking back through their simulation to high redshift.

This is what we expect to see in LCDM: lots of small protogalaxies at high redshift; the building blocks of later galaxies that had not yet merged. The observation of galaxies much brighter than this at high redshift by JWST poses a fundamental challenge to the paradigm: mass appears not to be subdivided as expected. So it is entirely justifiable that people have been freaking out that what we see are bright galaxies that are apparently already massive. That shouldn’t happen; it wasn’t predicted to happen; how can this be happening?

That’s all background that is assumed knowledge for our ApJ paper, so we’re only now getting to its Figure 1. This combines one of the merger trees above with its stellar mass evolution. The left panel shows the assembly of dark mass; the right panel shows the growth of stellar mass in the largest progenitor. This is what we expect to see in observations.


Fig. 1 from McGaugh et al. (2024): A merger tree for a model galaxy from the TNG50-1 simulation (Pillepich et al. 2019; Nelson et al. 2019, left panel) selected to have M* ≈ 9 × 10^10 M☉ at z = 0; i.e., the stellar mass of a local L* giant elliptical galaxy (Driver et al. 2022). Mass assembles hierarchically, starting from small halos at high redshift (bottom edge) with the largest progenitor traced along the left edge of the merger tree. The growth of stellar mass of the largest progenitor is shown in the right panel. This example (jagged line) is close to the median (dashed line) of comparable mass objects (Rodriguez-Gomez et al. 2016), and within the range of the scatter (the shaded band shows the 16th – 84th percentiles). A monolithic model that forms at zf = 10 and evolves with an exponentially declining star formation rate with τ = 1 Gyr (purple line) is shown for comparison. The latter model forms most of its stars earlier than occurs in the simulation.

For comparison, we also show the stellar mass growth of a monolithic model for a giant elliptical galaxy. This is the classic picture we had for such galaxies before we realized that galaxy formation had to be hierarchical. This particular monolithic model forms at zf = 10 and follows an exponential star formation rate with τ = 1 Gyr. It is one of the models published by Franck & McGaugh (2017). It is, in fact, the first model I asked Jay to construct when he started the project. Not because we expected it to best describe the data, as it turns out to do, but because the simple exponential model is a touchstone of stellar population modeling. It was a starter model: do this basic thing first to make sure you’re doing it right. We chose τ = 1 Gyr because that was the typical number bandied about for elliptical galaxies, and zf = 10 because that seemed ridiculously early for a massive galaxy to form. At the time we built the model, it was ludicrously early to imagine a massive galaxy would form, from an LCDM perspective. A formation redshift zf = 10 was, less than a decade ago, practically indistinguishable from the beginning of time, so we expected it to provide a limit that the data would not possibly approach.
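
One number makes the contrast with the merger trees concrete. For a pure exponential star formation history, the fraction of the final stellar mass in place a time t after formation is

$$ f(t) = \frac{\int_0^t e^{-t'/\tau}\,dt'}{\int_0^\infty e^{-t'/\tau}\,dt'} = 1 - e^{-t/\tau}, \qquad t_{1/2} = \tau \ln 2 \approx 0.7\ \mathrm{Gyr}\ \ \text{for}\ \tau = 1\ \mathrm{Gyr}, $$

so this monolithic model has half of its final stellar mass in place within about 0.7 Gyr of zf = 10, whereas the median simulated object above takes roughly half a Hubble time (~7 Gyr) to reach the same milestone.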

In a remarkably short period, JWST has transformed z = 10 from inconceivable to run of the mill. I’m not going to go into the data yet – this all-theory post is already a lot – but to offer one spoiler: the data are consistent with this monolithic model. If we want to “fix” LCDM, we have to make the red line into the purple line for enough objects to explain the data. That proves to be challenging. But that’s moving the goalposts; the prediction was that we should see little protogalaxies at high redshift, not massive, monolith-style objects. Just look at the merger trees at z = 10!

Accelerated Structure Formation in MOND

In order to address these issues in MOND, we have to go back to the beginning. What is the evolution of a spherical region (a top-hat overdensity) that might collapse to form a galaxy? How does a spherical region under the influence of MOND evolve within an expanding universe?

The solution to this problem was first found by Felten (1984), who was trying to play the Newtonian cosmology trick in MOND. In conventional dynamics, one can solve the equation of motion for a point on the surface of a uniform sphere that is initially expanding and recover the essence of the Friedmann equation. It was reasonable to check if cosmology might be that simple in MOND. It was not. The appearance of a0 as a physical scale makes the solution scale-dependent: there is no general solution that one can imagine applies to the universe as a whole.

Felten reasonably saw this as a failure. There were, however, some appealing aspects of his solution. For one, there was no such thing as a critical density. All MOND universes would eventually recollapse irrespective of their density (in the absence of the repulsion provided by a cosmological constant). It could take a very long time, which depended on the density, but the ultimate fate was always the same. There was no special value of Ω, and hence no flatness problem. The latter obsessed people at the time, so I’m somewhat surprised that no one seems to have made this connection. Too soon*, I guess.

There it sat for many years, an obscure solution for an obscure theory to which no one gave credence. When I became interested in the problem a decade later, I started methodically checking all the classic results. I was surprised to find how many things we needed dark matter to explain were just as well (or better) explained by MOND. My exact quote was “surprised the bejeepers out of us.” So, what about galaxy formation?

I started with the top-hat overdensity, and had the epiphany that Felten had already obtained the solution. He had been trying to solve all of cosmology, which didn’t work. But he had solved the evolution of a spherical region that starts out expanding with the rest of the universe but subsequently collapses under the influence of MOND. The overdensity didn’t need to be large, it just needed to be in the low acceleration regime. Something like the red cycloidal line in the second plot above could happen in a finite time. But how long?

The solution depends on scale and needs to be solved numerically. I am not the greatest programmer, and I had a lot else on my plate at the time. I was in no rush, as I figured I was the only one working on it. This is usually a good assumption with MOND, but not in this case. Bob Sanders had had the same epiphany around the same time, which I discovered when I received his manuscript to referee. So all credit is due to Bob: he said these things first.

First, he noted that galaxy formation in MOND is still hierarchical. Small things form first. Crudely speaking, structure formation is very similar to the conventional case, but now the goose comes from the change in the force law rather than extra dark mass. MOND is nonlinear, so the whole process gets accelerated. To compare with the linear growth of CDM:

A sketch of how structures grow over time under the influence of cold dark matter (left, from Schramm 1992, same as above) and MOND (right, from Sanders & McGaugh 2002; see also this further discussion and previous post). The slow linear growth of CDM (long-dashed line, left panel) is replaced by a rapid, nonlinear growth in MOND (solid lines at right; numbers correspond to different scales). Nonlinear growth moderates after cosmic expansion begins to accelerate (dashed vertical line in right panel).

The net effect is the same. A cosmic web of large scale structure emerges. They look qualitatively similar, but everything happens faster in MOND. This is why observations have persistently revealed structures that are more massive and were in place earlier than expected in contemporaneous LCDM models.

Simulated structure formation in ΛCDM (top) and MOND (bottom) showing the more rapid emergence of similar structures in MOND (note the redshift of each panel). From McGaugh (2015).

In MOND, small objects like globular clusters form first, but galaxies of a range of masses all collapse on a relatively short cosmic timescale. How short? Let’s consider our typical 10^11 M☉ galaxy. Solving Felten’s equation for the evolution of a sphere numerically, peak expansion is reached after 300 Myr and collapse happens in a similar time. The whole galaxy is in place speedy quick, and the initial conditions don’t really matter: a uniform, initially expanding sphere in the low acceleration regime will behave this way. From our distant vantage point thirteen billion years later, the whole process looks almost monolithic (the purple line above) even though it is a chaotic hierarchical mess for the first few hundred million years (z > 14). In particular, it is easy to form half of the stellar mass early on: the mass is already assembled.

The evolution of a 10^11 M☉ sphere that starts out expanding with the universe but decouples and collapses under the influence of MOND (dotted line). It reaches maximum expansion after 300 Myr and recollapses in a similar time, so the entire object is in place after 600 Myr. (A version of this plot with a logarithmic time axis appears as Fig. 2 in our paper.) The inset shows the evolution of smaller shells within such an object (Fig. 2 from Sanders 2008). The inner regions collapse first followed by outer shells. These oscillate and cross, mixing and ultimately forming a reasonable size galaxy – see Sanders’s Table 1 and also his Fig. 4 for the collapse times for objects of other masses. These early results are corroborated by Eappen et al. (2022), who further demonstrate that the details of feedback are not important in MOND, unlike LCDM.
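
To convey the flavor of such a calculation, here is a rough numerical sketch. The initial conditions are my own simplifying assumptions (the sphere enters the low-acceleration regime expanding at the Hubble rate of a baryon-only universe, and I use the deep-MOND limit throughout); Sanders’ published collapse times come from the full equation with his initial conditions. The point is only that the timescale comes out short:

```python
# A rough sketch of the Felten/Sanders top-hat for a 1e11 Msun sphere.
import math

G, a0, Msun, Mpc = 6.674e-11, 1.2e-10, 1.989e30, 3.086e22
M = 1e11 * Msun
K = math.sqrt(G * M * a0)             # deep-MOND: acceleration = K / r

# Start where the Newtonian acceleration G*M/r^2 drops to a0 ...
r = math.sqrt(G * M / a0)             # ~11 kpc for 1e11 Msun
r_start = r
# ... expanding with the Hubble flow of an Omega_b = 0.04 universe then.
zp1 = (1.6 * Mpc) / r                 # the comoving 1.6 Mpc sphere -> 1 + z
H0 = 73e3 / Mpc                       # s^-1
H = H0 * math.sqrt(0.04 * zp1**3 + 0.96 * zp1**2)
v = H * r

t, dt, t_turn = 0.0, 1e12, None       # semi-implicit Euler integration
while r > 0.01 * r_start:
    v -= (K / r) * dt
    r += v * dt
    t += dt
    if t_turn is None and v < 0:
        t_turn = t                    # moment of maximum expansion

Myr = 3.156e13                        # seconds per Myr
print(t_turn / Myr, t / Myr)          # turnaround and collapse: a few hundred Myr each
```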

This is what JWST sees: galaxies that are already massive when the universe is just half a billion years old. I’m sure I should say more but I’m exhausted now and you may be too, so I’m gonna stop here by noting that in 1998, when Bob Sanders predicted that “Objects of galaxy mass are the first virialized objects to form (by z=10),” the contemporaneous prediction of LCDM was that “present-day disc [galaxies] were assembled recently (at z<=1)” and “there is nothing above redshift 7.” One of these predictions has been realized. It is rare in science that such a clear a priori prediction comes true, let alone one that seemed so unreasonable at the time, and which took a quarter century to corroborate.


*I am not quite this old: I was still an undergraduate in 1984. I hadn’t even decided to be an astronomer at that point; I certainly hadn’t started following the literature. The first time I heard of MOND was in a graduate course taught by Doug Richstone in 1988. He only mentioned it in passing while talking about dark matter, writing the equation on the board and saying maybe it could be this. I recall staring at it for a long few seconds, then shaking my head and muttering “no way.” I then completely forgot about it, not thinking about it again until it came up in our data for low surface brightness galaxies. I expect most other professionals have the same initial reaction, which is fair. The test of character comes when it crops up in their data, as it is doing now for the high redshift galaxy community.

The Radial Acceleration Relation to very low accelerations

Flat rotation curves and the Baryonic Tully-Fisher relation (BTFR) both follow from the Radial Acceleration Relation (RAR). In Mistele et al. (2024b) we emphasize the exciting aspects of the former; these follow from the RAR in Mistele et al. (2024a). It is worth understanding the connection.

First, the basic result:


Figure 2 from Mistele et al. (2024a). The RAR from weak lensing data (yellow diamonds) is shown together with the binned kinematic RAR from Lelli et al. (2017, gray circles). The solid line is Newtonian gravity without dark matter (gobs = gbar). The shaded region at gbar < 10^-13 m/s^2 indicates where the isolation criterion may be less reliable according to the estimate by Brouwer et al. (2021). Our results suggest that late type galaxies (LTGs) may be sufficiently isolated down to gbar ≈ 10^-14 m/s^2. We shade this region where LTGs may still be reliable in a lighter color.

The RAR of weak lensing extends the RAR from kinematics to much lower accelerations. How low we can trust we’ll come back to, but certainly to gbar ≈ 10^-13 m/s^2 and probably to gbar ≈ 10^-14 m/s^2. For the mass of the typical galaxy in the KiDS sample, this corresponds to a radius of 300 kpc and 1.1 Mpc, respectively. Hence our claim that the effective gravitational potentials of isolated galaxies are consistent with rotation curves that remain flat indefinitely far out: a million light years at least, and perhaps a million parsecs.
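
The conversion from acceleration to radius is just the point-mass formula; the typical lens mass below is my assumption, chosen to reproduce the quoted radii:

```python
# Radii corresponding to the acceleration limits above, for a point mass.
import math

G, Msun, kpc = 6.674e-11, 1.989e30, 3.086e19
Mbar = 6e10 * Msun                   # assumed typical KiDS lens baryonic mass

def radius(gbar):
    """R = sqrt(G * Mbar / gbar) for a point mass."""
    return math.sqrt(G * Mbar / gbar)

print(radius(1e-13) / kpc)           # ~300 kpc
print(radius(1e-14) / kpc)           # ~1000 kpc, i.e. about a Mpc
```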

Note that the kinematic and lensing data overlap at log(gbar) = -11.5. These independent methods give the same result. Moreover, this regime corresponds to the parts of galaxies where atomic gas rather than stars dominates the baryonic mass budget, which minimizes the systematic uncertainty due to stellar population mass estimates. The lensing results still depend on these, but they agree with the gas-dominated portion of the RAR, and merge smoothly into the star-dominated portion of the kinematic data when the same stellar population models are used for both. To wit: the agreement is really good.

A flat rotation curve projects into the log(gobs)-log(gbar) plane as a line with slope 1/2. The data adhere closely to this slope, so I knew as soon as I saw the lensing RAR that the implied rotation curves remained flat indefinitely. How far, in radius, depends on galaxy mass, since for a point mass (a good approximation at radii beyond 100 kpc), gbar = GMbar/R^2. We can split the lensing data into different mass bins, for which the RAR looks like


Figure 5 from Mistele et al. (2024a). The RAR implied by weak lensing for four baryonic mass bins. The dashed line has the slope a flat rotation curve has when projected into the acceleration plane. That different masses follow the same RAR implies the Baryonic Tully-Fisher relation.

Most dark matter models that I’ve seen or constructed myself predict a mass-dependent shift in the RAR, if they predict a RAR at all (many do not). We see no such shift. But the math is such that galaxies of different mass, each tracing a slope 1/2 line, only fall on the same RAR, as observed, if there is a Baryonic Tully-Fisher relation with slope 4. So I knew from examination of the above figure that the BTFR was sure to follow, but that’s because I’ve been working on these things for a long time. It isn’t necessarily obvious to everyone else, so it was worth explicitly showing.
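
To spell out the algebra (nothing here beyond the definitions already given): a flat rotation curve has

$$ g_{\rm obs} = \frac{V_f^2}{R}, \qquad g_{\rm bar} = \frac{G M_{\rm bar}}{R^2} \quad\Rightarrow\quad g_{\rm obs} = \frac{V_f^2}{\sqrt{G M_{\rm bar}}}\, g_{\rm bar}^{1/2}, $$

so each galaxy traces a slope 1/2 line whose normalization is V_f^2/√(GM_bar). Galaxies of different mass collapse onto a single relation only if V_f^4 ∝ GM_bar, i.e., the BTFR with slope 4.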

Our result differs from the original of Brouwer et al. in two subtle but important ways. The first is that we use stellar population models that are the same as we use for the kinematic data. This self-consistency is important to the continuity of the data. We (especially Jim Schombert) took a deep dive into this, and the models used by Brouwer et al. are consistent with ours for late type (spiral) galaxies (LTGs). However, ours are somewhat heavier^ for early type galaxies (ETGs). That’s part of the reason that they find an offset in the RAR between morphological types and we do not.

Another important difference is the strictness of the isolation criterion. We are trying to ascertain the average gravitational potential of isolated galaxies, those with no big neighbors to compound the lensing signal. Brouwer et al. required that there be no galaxies more than a tenth of the luminosity of the primary within 3 Mpc. That seems reasonable, but we explored lots of variations on both aspects of that limit. It seems to be fine for LTGs, but insufficient for ETGs. That in itself is not surprising, as ETGs are known to be more strongly clustered than LTGs, so it is harder to find isolated examples.

To illustrate this, we show the deviation of the data from the kinematic RAR fit as a function of the isolation criterion:


Figure 4 from Mistele et al. (2024a). Top: the difference between the radial accelerations inferred from weak lensing and the RAR fitting function, measured in sigmas, as a function of how isolated the lenses are, quantified by Risol. We separately show the result for ETGs (red) and LTGs (blue) as well as for small (triangles with dashed lines) and large accelerations (diamonds with solid lines). LTGs are mostly unaffected by making the isolation criterion stricter. In contrast, ETGs do depend on Risol, but tend towards the RAR with increasing Risol. Middle and bottom: the accelerations behind these sigma values for Risol = 3 Mpc/h70 and Risol = 4 Mpc/h70.

The top panel shows that LTGs do not deviate from the RAR as we vary the radius of isolation. In contrast, ETGs deviate a lot for small Risol. This is what Brouwer et al. found, and it would be a problem for MOND if LTGs and ETGs genuinely formed different sequences: it would be as if they were both obeying their own version of a similar but distinct MOND-like force law rather than a single universal force law.

That said, the ETGs converge towards the same RAR as the LTGs as we make the isolation criterion more strict. The distinction between ETGs and LTGs that appears to be clear for the Risol = 3 Mpc/h70 used by Brouwer et al. (middle panel) goes away when Risol = 4 Mpc/h70 (bottom panel). The random errors grow because fewer galaxies+ meet the stricter criterion, but this seems a price well worth paying to be rid of the systematic variation seen in the top panel. This also dictates how far out we can trust the data, which show no clear deviation from the RAR until below the limit gbar = 10^-14 m/s^2.

Regardless of the underlying theory, the data paint a consistent picture. This can be summarized by three empirical laws of galactic rotation:

  • Rotation curves become approximately* flat at large radii and remain so indefinitely.
  • The amplitude of the flat rotation speed scales with the baryonic mass as Mbar ∝ Vf^4 (the BTFR).
  • The observed centripetal acceleration follows from that predicted by the baryons (the RAR).

These are the galactic analogs of Kepler’s Laws for planetary motion. There is no theory in these statements; they’re just a description of what the data do. That’s useful, as they provide an empirical touchstone that has to be satisfactorily explained by any theory for it to be considered viable. No dark matter-based theory currently does that.


^The difference is well within the expected variance for stellar population models. We can reproduce their numbers if we treat ETGs as if they were just red LTGs. I don’t know if that’s what they did, but it ain’t right.

+For the record, the isolated fraction of the entire sample is 16%: most galaxies have neighbors. As a function of mass, the isolation criterion leaves a fraction of 8%, 18%, 30%, and 42% of LTG lenses and 9%, 14%, and 22% of ETG lenses, respectively, in each mass bin. The fraction of isolated LTGs is generally higher than ETGs, as expected. There is also a trend for the isolation fraction to increase as mass decreases. In part this is real; more luminous galaxies are more clustered. It may also be that neighbors exceeding 10% of the primary luminosity (the criterion is really luminosity, not mass) more easily evade detection as the primaries get fainter: 10% of a faint primary is harder to reach.

*Some people take “flat” way too seriously in this context. While it is often true that rotation curves look pretty darn flat over an extended radial range, I say approximately flat because we never measure, and can never measure, exactly a slope of dV/dR = 0.000. As a practical matter, we have adopted a variation of < 5% from point to point as a working definition. The scatter in Tully-Fisher naturally goes up if one adopts a weaker criterion; what one gets for the scatter is all about data quality.
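
As a tiny illustration of that working definition (my own sketch, not code from any published pipeline):

```python
# Call the outer rotation curve "flat" if successive points vary by < 5%.
def is_flat(velocities, tol=0.05):
    """True if each adjacent pair of outer-rotation-curve points differs
    by less than the fractional tolerance tol."""
    return all(abs(v2 - v1) / v1 < tol
               for v1, v2 in zip(velocities, velocities[1:]))

print(is_flat([148, 151, 150, 152]))   # True: flat to within 5%
print(is_flat([120, 135, 152, 170]))   # False: still rising
```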

By the wayside

I noted last time that, in the rush to analyze the first of the JWST data, “some of these candidate high redshift galaxies will fall by the wayside.” As Maurice Aabe notes in the comments there, this has already happened.

I was concerned because of previous work with Jay Franck in which we found that photometric redshifts were simply not adequately precise to identify the clusters and protoclusters we were looking for. Consequently, we made it a selection criterion when constructing the CCPC to require spectroscopic redshifts. The issue then was that it wasn’t good enough to have a rough idea of the redshift, as the photometric method often provides (what exactly it provides depends in a complicated way on the redshift range, the stellar population modeling, and the wavelength range covered by the observational data that is available). To identify a candidate protocluster, you want to know that all the potential member galaxies are really at the same redshift.

This requirement is somewhat relaxed for the field population, in which a common approach is to ask broader questions of the data like “how many galaxies are at z ~ 6? z ~ 7?” etc. Photometric redshifts, when done properly, ought to suffice for this. However, I had noticed in Jay’s work that there were times when apparently reasonable photometric redshift estimates went badly wrong. So it made the ganglia twitch when I noticed that in early JWST work – specifically Table 2 of the first version of a paper by Adams et al. – there were seven objects with candidate photometric redshifts, and three already had a preexisting spectroscopic redshift. The photometric redshifts were mostly around z ~ 9.7, but the three spectroscopic redshifts were all smaller: two at z ~ 7.6 and one at z ~ 8.5.

Three objects are not enough to infer a systematic bias, so I made a mental note and moved on. But given our previous experience, it did not inspire confidence that all the available cases disagreed, and that all the spectroscopic redshifts were lower than the photometric estimates. These things combined to give this observer a serious case of “the heebie-jeebies.”

Adams et al. have now posted a revised analysis in which many (not all) redshifts change, and change by a lot. Here is their new Table 4:

Table 4 from Adams et al. (2022, version 2).

There are some cases here that appear to confirm and improve the initial estimate of a high redshift. For example, SMACS-z11e had a very uncertain initial redshift estimate. In the revised analysis, it is still at z~11, but with much higher confidence.

That said, it is hard to put a positive spin on these numbers. 23 of 31 redshifts change, and many change drastically. Those that change all become smaller. The highest surviving redshift estimate is z ~ 15 for SMACS-z16b. Among the objects with very high candidate redshifts, some are practically local (e.g., SMACS-z12a, F150DB-075, F150DA-058).

So… I had expected that this could go wrong, but I didn’t think it would go this wrong. I was concerned about the photometric redshift method – how well we can model stellar populations, especially at young ages dominated by short lived stars that in the early universe are presumably lower metallicity than well-studied nearby examples, the degeneracies between galaxies at very different redshifts but presenting similar colors over a finite range of observed passbands, dust (the eternal scourge of observational astronomy, expected to be an especially severe affliction in the ultraviolet that gets redshifted into the near-IR for high-z objects, both because dust is very efficient at scattering UV photons and because this efficiency varies a lot with metallicity and the exact grain size distribution of the dust), when is a dropout really a dropout indicating the location of the Lyman break and when is it just a lousy upper limit of a shabby detection, etc. – I could go on, but I think I already have. It will take time to sort these things out, even in the best of worlds.

We do not live in the best of worlds.

It appears that a big part of the current uncertainty is a calibration error. There is a pipeline for handling JWST data that has an in-built calibration for how many counts in a JWST image correspond to what astronomical magnitude. The JWST instrument team warned us that the initial estimate of this calibration would “improve as we go deeper into Cycle 1” – see slide 13 of Jane Rigby’s AAS presentation.

I was not previously aware of this caveat, though I’m certainly not surprised by it. This is how these things work – one makes an initial estimate based on the available data, and one improves it as more data become available. Apparently, JWST is outperforming its specs, so it is seeing as much as 0.3 magnitudes deeper than anticipated. This means that people were inferring objects to be that much too bright, hence the appearance of lots of galaxies that seem to be brighter than expected, and an apparent systematic bias to high z for photometric redshift estimators.
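
The arithmetic behind that calibration shift is worth a line. Magnitudes are logarithmic in flux, m = -2.5 log10(F/F0), so a 0.3 mag zero-point error corresponds to a ~32% error in inferred brightness:

```python
# A 0.3 mag calibration offset, expressed as a flux ratio.
flux_ratio = 10 ** (0.3 / 2.5)
print(flux_ratio)   # ~1.32: sources were inferred to be ~32% brighter than they are
```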

I was not at the AAS meeting, let alone Dr. Rigby’s presentation there. Even if I had been, I’m not sure I would have appreciated the potential impact of that last bullet point on nearly the last slide. So I’m not the least bit surprised that this error has propagated into the literature. This is unfortunate, but at least this time it didn’t lead to something as bad as the Challenger space shuttle disaster in which the relevant warning from the engineers was reputed to have been buried in an obscure bullet point list.

So now we need to take a deep breath and do things right. I understand the urgency to get the first exciting results out, and they are still exciting. There are still some interesting high z candidate galaxies, and lots of empirical evidence predating JWST indicating that galaxies may have become too big too soon. However, we can only begin to argue about the interpretation of this once we agree to what the facts are. At this juncture, it is more important to get the numbers right than to post early, potentially ill-advised takes on arXiv.

That said, I’d like to go back to writing my own ill-advised take to post on arXiv now.

A brief history of the Radial Acceleration Relation

In science, all new and startling facts must encounter in sequence the responses

1. It is not true!

2. It is contrary to orthodoxy.

3. We knew it all along.

Louis Agassiz (circa 1861)

This expression exactly depicts the progression of the radial acceleration relation. Some people were ahead of this curve, others are still behind it, but it quite accurately depicts the mass sociology. This is how we react to startling new facts.

For quotation purists, I’m not sure exactly what the original phrasing was. I have paraphrased it to be succinct and have substituted orthodoxy for religion, because even scientists can have orthodoxies: holy cows that must not be slaughtered.

I might even add a precursor stage zero to the list above:

0. It goes unrecognized.

This is to say, that if a new fact is sufficiently startling, we don’t just disbelieve it (stage 1); at first we fail to see it at all. We lack the cognitive framework to even recognize how important it is. An example is provided by the 1941 detection of the microwave background by Andrew McKellar. In retrospect, this is as persuasive as the 1964 detection of Penzias and Wilson to which we usually ascribe the discovery. At the earlier time, there was simply no framework for recognizing what it was that was being detected. It appears to me that P&W didn’t know what they were looking at either until Peebles explained it to them.

The radial acceleration relation was first posed as the mass discrepancy-acceleration relation. They’re fundamentally the same thing, just plotted in a slightly different way. The mass discrepancy-acceleration relation shows the ratio of total mass to that which is visible. This is basically the ratio of the observed acceleration to that predicted by the observed baryons. This is useful to see how much dark matter is needed, but by construction the axes are not independent, as both measured quantities are used in forming the ratio.

The radial acceleration relation shows independent observations along each axis: observed vs. predicted acceleration. Though measured independently, they are not physically independent, as the baryons contribute some to the total observed acceleration – they do have mass, after all. One can construct a halo acceleration relation by subtracting the baryonic contribution away from the total; in principle the remainders are physically independent. Unfortunately, the axes again become observationally codependent, and the uncertainties blow up, especially in the baryon dominated regime. Which of these depictions is preferable depends a bit on what you’re looking to see; here I just want to note that they are the same information packaged somewhat differently.
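
For concreteness, here is the same toy data point in each packaging (the numbers are made up for illustration; units are arbitrary but shared):

```python
# The three packagings of the same measurement described above.
gbar, gobs = 1.0e-11, 3.0e-11   # predicted (baryonic) and observed acceleration

D = gobs / gbar                 # mass discrepancy: MDAR plots D vs. acceleration
g_halo = gobs - gbar            # halo acceleration: HAR plots this remainder
# The RAR simply plots gobs against gbar directly.
print(D, g_halo)                # same information, packaged differently
```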

To the best of my knowledge, the first mention of the mass discrepancy-acceleration relation in the scientific literature is by Sanders (1990). Its existence is explicit in MOND (Milgrom 1983), but here it is possible to draw a clear line between theory and data. I am only speaking of the empirical relation as it appears in the data, irrespective of anything specific to MOND.

I met Bob Sanders, along with many other talented scientists, in a series of visits to the University of Groningen in the early 1990s. Despite knowing him and having talked to him about rotation curves, I was unaware that he had done this.

Stage 0: It goes unrecognized.

For me, stage one came later in the decade at the culmination of a several years’ campaign to examine the viability of the dark matter paradigm from every available perspective. That’s a long paper, which nevertheless drew considerable praise from many people who actually read it. If you go to the bother of reading it today, you will see the outlines of many issues that are still debated and others that have been forgotten (e.g., the fine-tuning issues).

Around this time (1998), the dynamicists at Rutgers were organizing a meeting on galaxy dynamics, and asked me to be one of the speakers. I couldn’t possibly discuss everything in the paper in the time allotted, so was looking for a way to show the essence of the challenge the data posed. Consequently, I reinvented the wheel, coming up with the mass discrepancy-acceleration relation. Here I show the same data that I had then in the form of the radial acceleration relation:

The Radial Acceleration Relation from the data in McGaugh (1999). Plot credit: Federico Lelli. (There is a time delay in publication: the 1998 meeting’s proceedings appeared in 1999.)

I recognize this version of the plot as having been made by Federico Lelli. I’ve made this plot many times, but this is the version I came across first, and it is better than mine in that the opacity of the points illustrates where the data are concentrated. I had been working on low surface brightness galaxies; these have low accelerations, so that part of the plot is well populated.

The data show a clear correlation. By today’s standards, it looks crude. Going on what we had then, it was fantastic. Correlations practically never look this good in extragalactic astronomy, and they certainly don’t happen by accident. Low quality data can hide a correlation – uncertainties cause scatter – but they can’t create a correlation where one doesn’t exist.

This result was certainly startling if not as new as I then thought. That’s why I used the title How Galaxies Don’t Form. This was contrary to our expectations, as I had explained in exhaustive detail in the long paper and revisit in a recent review for philosophers and historians of science.

I showed the same result later that year (1998) at a meeting on the campus of the University of Maryland where I was a brand new faculty member. It was a much shorter presentation, so I didn’t have time to justify the context or explain much about the data. Contrary to the reception at Rutgers where I had adequate time to speak, the hostility of the audience to the result was palpable, their stony silence eloquent. They didn’t want to believe it, and plenty of people got busy questioning the data.

Stage 1: It is not true.

I spent the next five years expanding and improving the data. More rotation curves became available thanks to the work of many, particularly Erwin de Blok, Marc Verheijen, and Rob Swaters. That was great, but the more serious limitation was how well we could measure the stellar mass distribution needed to predict the baryonic acceleration.

The mass models we could build at the time were based on optical images. A mass model takes the observed light distribution, assigns a mass-to-light ratio, and makes a numerical solution of the Poisson equation to obtain the gravitational force corresponding to the observed stellar mass distribution. This is how we obtain the stellar contribution to the predicted baryonic force; the same procedure is applied to the observed gas distribution. The blue part of the spectrum is the best place in which to observe low contrast, low surface brightness galaxies as the night sky is darkest there, at least during new moon. That’s great for measuring the light distribution, but what we want is the stellar mass distribution. The mass-to-light ratio is expected to have a lot of scatter in the blue band simply from the happenstance of recent star formation, which makes bright blue stars that are short-lived. If there is a stochastic uptick in the star formation rate, then the mass-to-light ratio goes down because there are lots of bright stars. Wait a few hundred million years and these die off, so the mass-to-light ratio gets bigger (in the absence of further new star formation). The time-integrated stellar mass may not change much, but the amount of blue light it produces does. Consequently, we expect to see well-observed galaxies trace distinct lines in the radial acceleration plane, even if there is a single universal relation underlying the phenomenon. This happens simply because we expect to get M*/L wrong from one galaxy to the next: in 1998, I had simply assumed all galaxies had the same M*/L for lack of any better prescription. Clearly, a better prescription was warranted.
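
Here is a bare-bones sketch of the idea, under a spherical approximation for brevity (a real mass model solves the Poisson equation for the flattened disk geometry; this is illustrative only, and the profile below is hypothetical):

```python
# Toy mass model: circular speed from an enclosed-luminosity profile.
import math

G = 4.301e-6  # gravitational constant in kpc * (km/s)^2 / Msun

def v_stars(radii_kpc, enclosed_lum, ml_ratio):
    """Circular speed from the enclosed stellar mass, V^2 = G*M(<R)/R.
    enclosed_lum[i] is the luminosity (Lsun) enclosed within radii_kpc[i]."""
    return [math.sqrt(G * ml_ratio * L / R)
            for R, L in zip(radii_kpc, enclosed_lum)]

# Hypothetical enclosed-luminosity profile with an assumed M*/L = 0.5:
print(v_stars([2, 4, 8], [1e10, 2e10, 2.5e10], 0.5))  # km/s at each radius
```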

In those days, I traveled through Tucson to observe at Kitt Peak with some frequency. On one occasion, I found myself with a few hours to kill between coming down from the mountain and heading to the airport. I wandered over to the Steward Observatory at the University of Arizona to see who I might see. A chance meeting in the wild west: I encountered Eric Bell and Roelof de Jong, who were postdocs there at the time. I knew Eric from his work on the stellar populations of low surface brightness galaxies, an interest closely aligned with my own, and Roelof from my visits to Groningen.

As we got to talking, Eric described to me work they were doing on stellar populations, and how they thought it would be possible to break the age-metallicity degeneracy using near-IR colors in addition to optical colors. They were mostly focused on improving the age constraints on stars in LSB galaxies, but as I listened, I realized they had constructed a more general, more powerful tool. At my encouragement (read their acknowledgements), they took on this more general task, ultimately publishing the classic Bell & de Jong (2001). In it, they built a table that enabled one to look up the expected mass-to-light ratio of a complex stellar population – one actively forming stars – as a function of color. This was a big step forward over my educated guess of a constant mass-to-light ratio: there was now a way to use a readily observed property, color, to improve the estimated M*/L of each galaxy in a well-calibrated way.
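
The spirit of the lookup table is a linear relation between color and the log of the mass-to-light ratio. The coefficients below are illustrative placeholders, not the published values; see their Table 1 for the real numbers per bandpass and model:

```python
# Color-based M*/L estimate in the style of Bell & de Jong (2001):
# log10(M*/L) = a + b * color (toy coefficients, for illustration only).
def mass_to_light(color_BV, a=-0.95, b=1.8):
    """Estimated stellar M*/L from B-V color (placeholder coefficients)."""
    return 10 ** (a + b * color_BV)

for bv in (0.4, 0.6, 0.9):          # blue LSB disk -> red early type
    print(bv, mass_to_light(bv))    # M*/L rises steeply with redder color
```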

Combining the new stellar population models with all the rotation curves then available, I obtained an improved mass discrepancy-acceleration relation:

The Radial Acceleration Relation from the data in McGaugh (2004); version using Bell’s stellar population synthesis models to estimate M*/L (see Fig. 5 for other versions). Plot credit: Federico Lelli.

Again, the relation is clear, but with scatter. Even with the improved models of Bell & de Jong, some individual galaxies have M*/L that are wrong – that’s inevitable in this game. What you cannot know is which ones! Note, however, that there are now 74 galaxies in this plot, and almost all of them fall on top of each other where the point density is large. There are some obvious outliers; those are presumably just that: the trees that fall outside the forest because of the expected scatter in M*/L estimates.

I tried a variety of prescriptions for M*/L in addition to that of Bell & de Jong. Though they differed in texture, they all told a consistent story. A relation was clearly present; only its detailed form varied with the adopted prescription.

The prescription that minimized the scatter in the relation was the M*/L obtained in MOND fits. That’s a tautology: by construction, a MOND fit finds the M*/L that puts a galaxy on this relation. However, we can generalize the result. Maybe MOND is just a weird, unexpected way of picking a number that has this property; it doesn’t have to be the true mass-to-light ratio in nature. But one can then define a ratio Q

Equation 21 of McGaugh (2004).

that relates the “true” mass-to-light ratio to the number that gives a MOND fit. They don’t have to be identical, but MOND does return M*/L that are reasonable in terms of stellar populations, so Q ~ 1. Individual values could vary, and the mean could be a bit more or less than unity, but not radically different. One thing that impressed me at the time about the MOND fits (most of which were made by Bob Sanders) was how well they agreed with the stellar population models, recovering the correct amplitude, the correct dependence on color in different bandpasses, and also giving the expected amount of scatter (more in the blue than in the near-IR).

Fig. 7 of McGaugh (2004). Stellar mass-to-light ratios of galaxies in the blue B-band (top) and near-IR K-band (bottom) as a function of B−V color for the prescription of maximum disk (left) and MOND (right). Each point represents one galaxy for which the requisite data were available at the time. The line represents the mean expectation of stellar population synthesis models from Bell et al. (2003). These lines are completely independent of the data: neither the normalization nor the slope has been fit to the dynamical data. The red points are due to Sanders & Verheijen (1998); note the weak dependence of M*/L on color in the near-IR.

The obvious interpretation is that we should take seriously a theory that obtains good fits with a single free parameter that checks out admirably well with independent astrophysical constraints, in this case the M*/L expected for stellar populations. But I knew many people would not want to do that, so I defined Q to generalize to any M*/L in any (dark matter) context one might want to consider.

Indeed, Q allows us to write a general expression for the rotation curve of the dark matter halo (essentially the HAR alluded to above) in terms of that of the stars and gas:

Equation 22 of McGaugh (2004).

The stars and the gas are observed, and μ is the MOND interpolation function assumed in the fit that leads to Q. Except now the interpolation function isn’t part of some funny new theory; it is just the shape of the radial acceleration relation – a relation that is there empirically. The only fit factor between these data and any given model is Q – a single number of order unity. This does leave some wiggle room, but not much.

I went off to a conference to describe this result. At the 2006 meeting Galaxies in the Cosmic Web in New Mexico, I went out of my way at the beginning of the talk to show that even if we ignore MOND, this relation is present in the data, and it provides a strong constraint on the required distribution of dark matter. We may not know why this relation happens, but we can use it, modulo only the modest uncertainty in Q.

Having bent over backwards to distinguish the data from the theory, I was disappointed when, immediately at the end of my talk, prominent galaxy formation theorist Anatoly Klypin loudly shouted

“We don’t have to explain MOND!”

It stinks of MOND!

But you do have to explain the data. The problem was and is that the data look like MOND. It is easy to conflate one with the other; I have noticed that a lot of people have trouble keeping the two separate. Just because you don’t like the theory doesn’t mean that the data are wrong. What Anatoly was saying was that

2. It is contrary to orthodoxy.

Although I had phrased the result in a way that would be useful to galaxy formation theorists, they did not, by and large, claim to explain it at the time – it was contrary to orthodoxy so didn’t need to be explained. Looking at the list of papers that cite this result, the early adopters were not the target audience of galaxy formation theorists, but rather others citing it to say variations of “no way dark matter explains this.”

At this point, it was clear to me that further progress required a better way to measure the stellar mass distribution. Looking at the stellar population models, the best hope was to build mass models from near-infrared rather than optical data. The near-IR is dominated by old stars, especially red giants. Galaxies that have been forming stars actively for a Hubble time tend towards a quasi-equilibrium in which red giants are replenished by stellar evolution at about the same rate they move on to the next phase. One therefore expects the mass-to-light ratio to be more nearly constant in the near-IR. Not perfectly so, of course, but a 2 or 3 micron image is as close to a map of the stellar mass of a galaxy as we’re likely to get.

Around this time, the University of Maryland had begun a collaboration with Kitt Peak to build a big infrared camera, NEWFIRM, for the 4m telescope. Rob Swaters was hired to help write software to cope with the massive data flow it would produce. The instrument was divided into quadrants, each of which had a field of view sufficient to hold a typical galaxy. When it went on the telescope, we developed an efficient observing method that I called “four-shooter”, shuffling the target galaxy from quadrant to quadrant so that in processing we could remove the numerous instrumental artifacts intrinsic to its InSb detectors. This eventually became one of the standard observing modes in which the instrument was operated.
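The gist of the four-shooter trick can be sketched in a few lines of Python; this is a toy illustration, not the actual NEWFIRM pipeline:

```python
import numpy as np

# A toy version of the "four-shooter" idea (the real NEWFIRM pipeline was far
# more involved): with the target shuffled to a different quadrant in each of
# four exposures, any pixel sees blank sky in the other three frames, so a
# sky-plus-artifact frame can be estimated and subtracted.

def four_shooter_subtract(frames):
    """frames: four 2D arrays with the galaxy in a different quadrant each.

    Returns sky-subtracted frames; the median of the other exposures serves
    as the sky estimate for each frame.
    """
    cleaned = []
    for i, frame in enumerate(frames):
        others = np.stack([f for j, f in enumerate(frames) if j != i])
        cleaned.append(frame - np.median(others, axis=0))
    return cleaned
```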

NEWFIRM in the lab in Tucson. Most of the volume is for cryogenics: the IR detectors are helium-cooled to 30 K. Partial student for scale.

I was optimistic that we could make rapid progress, and at first we did. But despite all the work, despite all the active cooling involved, we were still on the ground. The night sky was painfully bright in the IR. Indeed, the thermal component dominated, so we could observe during full moon. To an observer of low surface brightness galaxies attuned to any hint of scattered light from so much as a crescent moon, I cannot describe how discombobulating it was to walk outside the dome and see the full fricking moon. So bright. So wrong. And that wasn’t even the limiting factor: the thermal background was.

We had hit a surface brightness wall, again. We could do the bright galaxies this way, but the LSBs that sample the low acceleration end of the radial acceleration relation were rather less accessible. Not inaccessible, but there was a better way.

The Spitzer Space Telescope was active at this time. Jim Schombert and I started winning time to observe LSB galaxies with it. We discovered that space is dark. There was no atmosphere to contend with. No scattered light from the clouds or the moon or the OH lines that afflict that part of the sky spectrum. No ground-level warmth. The data were fantastic. In some sense, they were too good: the biggest headache we faced was blotting out all the background galaxies that shone right through the optically thin LSB galaxies.

Still, it took a long time to collect and analyze the data. We were starting to get results by the early-teens, but it seemed like it would take forever to get through everything I hoped to accomplish. Fortunately, when I moved to Case Western, I was able to hire Federico Lelli as a postdoc. Federico’s involvement made all the difference. After many months of hard, diligent, and exacting work, he constructed what is now the SPARC database. Finally all the elements were in place to construct an empirical radial acceleration relation with absolutely minimal assumptions about the stellar mass-to-light ratio.

In parallel with the observational work, Jim Schombert had been working hard to build realistic stellar population models that extended to the 3.6 micron band of Spitzer. Spitzer had been built to look redwards of this, further into the IR; 3.6 microns was its shortest wavelength passband. But most models at the time stopped at the K-band, the 2.2 micron band that is the reddest passband practically accessible from the ground. The two bands contain pretty much the same information, but we still needed to calculate the band-specific value of M*/L.

Being a thorough and careful person, Jim considered not just the star formation history of a model stellar population as a variable, and not just its average metallicity, but also the metallicity distribution of its stars, making sure that these were self-consistent with the star formation history. Realistic metallicity distributions are skewed; it turns out that this subtle effect tends to counterbalance the color dependence of the age effect on M*/L in the near-IR part of the spectrum. The net result is that we expect M*/L to be very nearly constant for all late type galaxies.

This is the best possible result. To a good approximation, we expected all of the galaxies in the SPARC sample to have the same mass-to-light ratio. What you see is what you get. No variable M*/L, no equivocation, just data in, result out.
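As a toy illustration of just how simple this makes things (the nominal 3.6 micron mass-to-light ratio below is an assumed round number for illustration, not a measurement from this work):

```python
# A toy illustration of "what you see is what you get": with a constant
# near-IR mass-to-light ratio, light converts directly to stellar mass.
# The nominal value below is an assumption for illustration; population
# models put the 3.6 micron M*/L near half solar.
UPSILON_36 = 0.5  # Msun/Lsun at 3.6 microns (assumed)

def stellar_mass(L36):
    """Stellar mass from 3.6 micron luminosity, both in solar units."""
    return UPSILON_36 * L36

print(f"{stellar_mass(2e10):.1e} Msun")  # 1.0e+10 Msun for L36 = 2e10 Lsun
```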

We did still expect some scatter, as that is an irreducible fact of life in this business. But even that we expected to be small, between 0.1 and 0.15 dex (roughly 25 – 40%). Still, we expected the occasional outlier, galaxies that sit well off the main relation just because our nominal M*/L didn’t happen to apply in that case.
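For those who want the dex-to-percent conversion spelled out, a scatter of s dex corresponds to a multiplicative spread of 10^s:

```python
# Converting scatter quoted in dex to a fractional scatter: a scatter of
# s dex corresponds to a multiplicative spread of 10**s about the mean.
for s in (0.10, 0.12, 0.15):
    print(f"{s:.2f} dex ~ {10**s - 1:.0%}")
# 0.10 dex ~ 26%, 0.12 dex ~ 32%, 0.15 dex ~ 41%
```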

One day as I walked past Federico’s office, he called for me to come look at something. He had plotted all the data together assuming a single M*/L. There… were no outliers. The assumption of a constant M*/L in the near-IR didn’t just work, it worked far better than we had dared to hope. The relation leapt straight out of the data:

The Radial Acceleration Relation from the data in McGaugh et al. (2016). Plot credit: Federico Lelli.

Over 150 galaxies, with nearly 2700 resolved measurements among them, each galaxy with its own distinctive mass distribution, all pile on top of each other without effort. There was plenty of effort in building the database, but once it was there, the result appeared, no muss, no fuss. No fitting or fiddling. Just the measurements and our best estimate of the mean M*/L, applied uniformly to every individual galaxy in the sample. The scatter was only 0.12 dex, within the range expected from the population models.
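For reference, the smooth function later fit to these data (McGaugh et al. 2016) can be written down and evaluated in a few lines; the sample accelerations below are illustrative:

```python
import numpy as np

# The smooth function fit to the 2016 sample:
# g_obs = g_bar / (1 - exp(-sqrt(g_bar / g_dagger))), with the single
# acceleration scale g_dagger ~ 1.2e-10 m/s^2. Values below are illustrative.

G_DAGGER = 1.2e-10  # m/s^2

def g_obs(g_bar):
    """Observed centripetal acceleration given the baryonic one (SI units)."""
    return g_bar / (1.0 - np.exp(-np.sqrt(g_bar / G_DAGGER)))

print(g_obs(1e-9) / 1e-9)    # ~1.06: high acceleration, nearly Newtonian
print(g_obs(1e-12) / 1e-12)  # ~11: low acceleration, large mass discrepancy
```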

No MOND was involved in the construction of this relation. It may look like MOND, but we neither use MOND nor need it in any way to see the relation. It is in the data. Perhaps this is the sort of result for which we would have to invent MOND if it did not already exist. But the dark matter paradigm is very flexible, and many papers have since appeared that claim to explain the radial acceleration relation. We have reached

3. We knew it all along.

On the one hand, this is good: the community is finally engaging with a startling fact that has been pointedly ignored for decades. On the other hand, many of the claims to explain the radial acceleration relation are transparently incorrect, being nothing more than elaborations of models I considered and discarded as obviously unworkable long ago. They do not provide a satisfactory explanation of the predictive power of MOND, and inevitably fail to address important aspects of the problem, like disk stability. Rather than grapple with the deep issues this new and startling fact poses, it has become fashionable to simply assert that one’s favorite model explains the radial acceleration relation, and does so naturally.

There is nothing natural about the radial acceleration relation in the context of dark matter. Indeed, it is difficult to imagine a less natural result – hence stages one and two. So on the one hand, I welcome the belated engagement, and am willing to consider serious models. On the other hand, if someone asserts that this is natural and that we expected it all along, then the engagement isn’t genuine: they’re just fooling themselves.

Early Days. This was one of Vera Rubin’s favorite expressions. I always had a hard time with it, as many things are very well established. Yet it seems that we have yet to wrap our heads around the problem. Vera’s daughter, Judy Young, once likened the situation to the parable of the blind men and the elephant. Much is known, yes, but the problem is so vast that each of us can perceive only a part of the whole, and the whole may be quite different from the part that is right before us.

So I guess Vera is right as always: these remain Early Days.

A Stellar Population Mystery in a Low Surface Brightness Galaxy


“Galaxies are made of stars.”

Bob Schommer, quoted by Dave Silva in his dissertation on stellar populations

This tongue-in-cheek quote is a statement of the obvious, at least for the 90+ years since Hubble established that galaxies are stellar systems comparable to and distinct from the Milky Way. There’s interstellar gas and dust too, and I suppose for nearly half that time, people have also thought galaxies to be composed of dark matter. But you can’t see that; the defining characteristic of galaxies is the stars by whose amalgamated light they shine.

The spiral galaxy NGC 7757 (left) and a patch of adjacent sky (right). Both images are 1/4 degree on a side. Most of the sky looks like the patch at right, populated only by scattered stars. You know a galaxy when you see one. These images are based on photographic data obtained using the Oschin Schmidt Telescope on Palomar Mountain as part of the Palomar Observatory Sky Survey-II (POSS-II).

Stellar populations is the term astronomers use to describe the generations of stars that compose the galaxies we observe. The concept was introduced by Walter Baade in a 1944 paper in which he resolved individual stars in Andromeda and companion galaxies, aided by wartime blackouts. He noted that some of the stars he resolved had color-magnitude diagrams (CMDs – see below) that resembled that of the solar neighborhood, while others were more like globular clusters. Thus was born Population I and Population II, the epitome of astronomical terminology.

More generally, one can imagine defining lots of populations by tracing groups of stars with a common origin in space and time to the event in which they formed. From this perspective, the Milky Way is the composite of all the star forming events that built it up. Each group has its own age, composition, and orbital properties, and it would be good to have a map that is more detailed than “Pop I” and “Pop II.” Many projects are working to map out these complex details, including ESA’s Gaia satellite, which is producing many spectacular and fundamental results, like the orbit and acceleration of the sun within the Milky Way.

A simple stellar population is a group of stars that all share the same composition and age: they were born of the same material at the same time. Even such a simple stellar population can be rather complicated, as stars form with a distribution of masses (the IMF, for Initial Mass Function) from tiny to massive. The lowest mass stars are those that just barely cross the threshold for igniting hydrogen fusion in their core, which occurs at about 7% of the mass of the sun. Still lower mass objects are called brown dwarfs, and were once considered a candidate for dark matter. Though they don’t shine from fusion like stars, brown dwarfs do glow with the residual heat of their formation through gravitational contraction, and we can now see that there are nowhere near enough of them to be the dark matter.

At the opposite end of the mass spectrum, stars many tens of times the mass of the sun are known, with occasional specimens reaching upwards of 100 solar masses. These massive stars burn bright and exhaust their fuel quickly, exploding as supernovae after a few million years – a mere blink of the cosmic eye. By contrast, the lowest mass stars are so faint that they take practically forever to burn through their fuel, and are expected to continue to shine (albeit feebly) for many tens of Hubble times into the future. There is a strong and continuous relation between stellar mass and lifetime: the sun is expected to persist as-is for about 10 billion years (it is just shy of halfway through its “main sequence” lifetime).

After a mundane life fusing hydrogen and helium as a main sequence star, the sun will swell into a red giant, becoming brighter and larger in radius (but not mass). This phase is much shorter-lived, as is the complex sequence of events that follows it, ultimately leaving behind the naked core as an earth-sized but roughly half solar mass white dwarf remnant.
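To put rough numbers on that mass-lifetime relation, a commonly quoted back-of-the-envelope scaling (assumed here for illustration) is t ~ 10 Gyr (M/Msun)^-2.5:

```python
# A back-of-the-envelope scaling for main-sequence lifetimes, assumed here
# for illustration: t ~ 10 Gyr * (M/Msun)**-2.5. It overshoots at the very
# high-mass end, where real lifetimes bottom out around a few Myr.
def ms_lifetime_gyr(mass_msun):
    return 10.0 * mass_msun ** -2.5

for m in (0.3, 1.0, 10.0):
    print(f"{m:4.1f} Msun -> {ms_lifetime_gyr(m):8.3f} Gyr")
# 0.3 Msun -> ~200 Gyr (many Hubble times); 1 Msun -> 10 Gyr;
# 10 Msun -> ~0.03 Gyr, i.e. tens of millions of years.
```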

Matters become more complicated when we consider galaxies composed of multiple generations and different compositions. Nevertheless, we understand well enough the evolution of individual stars – a triumph of twentieth century astronomy – to consider the complex stellar populations of external galaxies. A particular interest of mine is the stellar populations of low surface brightness galaxies. These are late morphological types (often but not always irregular galaxies) that tend to be gas rich and very blue. This requires many young stars, but also implies a low metallicity. This much can be inferred from unresolved observations of galaxies, but the effects of age and composition are often degenerate. The best way to sort this out is to do as Baade did and resolve galaxies into individual stars. This was basically impossible for all but the nearest galaxies before the launch of the Hubble Space Telescope. The resolution of HST allows us to see farther out and deeper into the color-magnitude diagrams of external galaxies.

The low surface brightness galaxy F575-3, as discovered on a POSS-II sky survey plate (left) and as seen by HST (right). Both images are negatives. Only a tiny fraction of the 6.6 degree square POSS-II plate is shown as the image at left covers a mere 1/13 degree on a side. The pink outline shows the still smaller area of sky observed by HST, which visited the object at different roll angles: the differing orientation of the satellite causes the slight twist in the rectangular shape that is imaged. HST resolves individual stars, allowing construction of a color-magnitude diagram. It also resolves background galaxies, which are the majority of the extended objects in this image. Some even shine right through the foreground LSB galaxy!

Collaborator Jim Schombert has long been a leader in studying low surface brightness galaxies, discovering many examples of the class, and leading their study with HST among many stellar contributions. He is one of the unsung heroes without whom the field would be nowhere near where it is today. This post discusses a big puzzle he has identified in the stellar populations of low surface brightness galaxies: the case of the stars with seemingly inexplicable IR excesses. Perhaps he has also solved this puzzle, but first we have to understand what is normal and what is weird in a galaxy’s stellar population.

When we resolve a galaxy into stars in more than one filter, the first thing we do is plot a color-magnitude diagram (CMD). The CMD quantifies how bright a star is, and what its color is – a proxy for its surface temperature. Hot stars are blue; cooler ones are red. The CMD is the primary tool by which the evolution of stars was unraveled. Normal features of the CMD include the main sequence (where stars spend the majority of their lives) and the red giant branch (prominent since giant stars are bright if rare). This is what Baade recognized in Populations I and II – stars with CMDs like those near the sun (lots of main sequence stars and some red giants) and those like globular clusters (mostly red giants at bright magnitudes and fainter main sequence stars).
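Mechanically, a CMD is just a scatter plot of color against magnitude with the magnitude axis inverted; the photometry in this Python sketch is fabricated purely to show the construction:

```python
import numpy as np
import matplotlib.pyplot as plt

# The mechanics of a CMD: a scatter plot of color against magnitude with the
# magnitude axis inverted so brighter stars sit higher. The photometry here
# is fabricated purely to show the plot construction.
rng = np.random.default_rng(0)
m_red = rng.uniform(22.0, 27.0, 500)        # stand-in red-band magnitudes
m_blue = m_red + rng.normal(0.8, 0.5, 500)  # stand-in blue-band magnitudes

plt.scatter(m_blue - m_red, m_blue, s=2, color="k")  # color vs. magnitude
plt.gca().invert_yaxis()  # magnitudes run backwards: smaller is brighter
plt.xlabel("blue - red (color)")
plt.ylabel("blue magnitude")
plt.show()
```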

In actively star forming galaxies like F415-3 below, there are plenty of young, massive, bright stars. These evolve rapidly, traipsing across the CMD from blue to red and back to blue and then red again. We can use what we know about stellar evolution to deduce the star formation history of a galaxy – how many stars formed as a function of time. This works quite well for short time periods as massive stars evolve fast and are easy to see, but it becomes increasingly hard for older stars. A galaxy boasts about its age when it is young but becomes less forthcoming as it gets older.

Color-magnitude diagram (CMD) of the low surface brightness galaxy F415-3 observed by HST (Schombert & McGaugh 2015). Each point is one star. The x-axis is color, with bluer stars to the left and redder stars to the right. The y-axis is magnitude: brighter stars are higher; fainter stars are lower. There are many, many stars fainter than those detected here; these observations only resolve the brightest stars that populate the top of the CMD. The lines demarcate the CMD into regions dominated by stars in various evolutionary phases. Counting stars in each box lets us trace out the recent star formation history, which is found to vary stochastically over the past few tens of millions of years while remaining roughly constant when averaged over the age of the universe (13+ billion years).

Most late type, irregular galaxies have been perking along, forming stars at a modest but fairly steady rate for most of the history of the universe. That’s a very broad-brush statement; there are many puzzles in the details. F415-3 seems to be deficient in AGB stars. These are asymptotic giants, the evolutionary phase that follows core helium burning, which itself follows the first-ascent red giant branch. This may be challenging the limits of our models of stellar evolution. The basics are well-understood, but stars are giant, complicated, multifaceted beasts: just as understanding that terrestrial planets are basically metallic cores surrounded by mantles of rocky minerals falls short of describing the Earth, so too does a basic understanding of stellar evolution fall short of explaining every detail of every star. That’s what I love about astronomy: there is always something new to learn.

Below is the CMD of F575-3, now in the near infrared filters available on HST rather than the optical filters above. There is not such a rich recent star formation history in this case; indeed, this galaxy has been abnormally quiescent for its class. There are some young stars above the tip of the red giant branch (the horizontal blue line), but no HII regions of ionized gas that point up the hottest, youngest stars (typically < 10 Myr old). Mostly we see a red giant branch (the region dark with points below the line) and some main sequence stars (the cloud of points to the left of the red giant branch). These merge into a large blob at faint magnitudes as the uncertainties smear everything together at the limits of the observation.

Color-magnitude diagram of the stars in F575-3 observed by HST (left) and the surrounding field (right). The typical size of the error bars is shown in the right panel; this causes the data to smear into a blob at fainter magnitudes. One can nevertheless recognize some of the main features, as noted: the main sequence of young stars, the red giant branch below the horizontal line, and a region of rapidly evolving stars above the line (mostly asymptotic giants with some helium burning stars and a few red supergiants). There are also a number of stars to the right of the giant branch, in a region of the CMD that is not explained by models of stellar evolution. There shouldn’t be any stars here, but there are more than can be explained by background contamination. What are they?

One cool thing about F575-3 is that it has the bluest red giants known. All red giants are red, but just how red depends sensitively on their metallicity – the fraction of their composition that isn’t hydrogen or helium. As stars evolve, they synthesize heavy elements that are incorporated into subsequent generations of stars. After a while, you have a comparatively metal-rich composition like that of the sun – which is still not much: the mass of the elements in the sun that are not hydrogen or helium is less than 2% of the total. I know that sounds like a small fraction – it is a small fraction – but it is still rather a lot by the standards of the universe in which we live, which started as three parts hydrogen and one part helium, and nothing heavier than lithium. Stars have had to work hard for generation upon generation to make everything else in the periodic table from carbon on up. Galaxies smaller than the Milky Way haven’t got as far along in this process, so dwarf galaxies are typically low metallicity – often much less than 1% by mass.

F575-3 is especially low metallicity. Or so it appears from the color of its red giant stars. These are the bluest reds currently known. Here are some other dwarfs for comparison, organized in order of increasing metallicity. The right edge of the red giant branch in F575-3 is clearly to the left of everything else.

Color-magnitude diagrams of some of the dwarf galaxies that have been observed by HST. Colored lines illustrate the sequence expected for red giants of different metallicities. These are all well below the solar composition, as measured by [Fe/H], the logarithmic iron-to-hydrogen abundance ratio relative to that of the sun: solar composition has [Fe/H] = 0, while [Fe/H] = -1 is one tenth of the solar metal abundance. The lines illustrate the locations of giant branches with [Fe/H] = -2.3 (blue), -1.5 (green) and -0.7 (red). That’s 0.5%, 3%, and 20% of solar, respectively. Heavy elements make up less than 0.4% of the mass of the stars in these galaxies.
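Since the [Fe/H] notation can be cryptic, here is the conversion, which reproduces the percentages in the caption:

```python
# [Fe/H] is the logarithmic iron-to-hydrogen abundance relative to the sun,
# so the fraction of the solar abundance is just 10**[Fe/H].
for feh in (-2.3, -1.5, -0.7):
    print(f"[Fe/H] = {feh:+.1f} -> {10**feh:.1%} of solar")
# -2.3 -> 0.5%; -1.5 -> 3.2%; -0.7 -> 20.0%
```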

But that’s not what I wrote to tell you about. I already knew LSB galaxies were low metallicity; that’s what I did part of my thesis on. That was based on the gas phase abundances, but it makes sense that the stars would share this property – they form out of the interstellar gas, after all. Somebody has to be the bluest of them all. That’s remarkable, but not surprising.

What is surprising is that F575-3 has an excess of stars with an IR-excess – their colors are too red in the infrared part of the spectrum. These are the stars to the right of the red giant branch. We found it basically impossible to populate this portion of the CMD without completely overdoing it. Plausible stellar evolution tracks don’t go there. Nature has no menu option for a sprinkling of high metallicity giant stars but hold the metals everywhere else: once you make those metals, there are ample numbers of high metallicity stars. So what the heck are these things with a near-IR excess?

The CMD of F575-3 in near-IR (left) and optical colors (right). Main sequence stars are blue, rapidly evolving phases like asymptotic giants are red, and most of the black points are red giant stars. There is a population of mystery stars colored purple. These have a near-IR excess: very red colors in the infrared, but normal colors in the optical.

My first thought was that they were bogus. There are always goofy things in astronomical data; outliers are often defects of some sort – in the detector, or the result of cosmic ray strikes. So initially they were easy to ignore. However, this kept nagging at us; it seemed like too much to just dismiss. There are some things like this in the background, but not enough to explain how many we see in the body of the diagram. This argued against things not associated with the galaxy itself, like background galaxies with redshifted colors. When we plotted the distribution of near-IR excess objects, they were clearly associated with the galaxy.

The distribution of sources with a near-IR excess (red) compared to objects of similar apparent magnitude. They’re in the same place as the galaxy that the eye sees in the raw image. Whatever they are, they’re clearly part of F575-3.

The colors make no sense for stars. They aren’t the occasional high metallicity red giant. So our next thought was extinction by interstellar dust. This has the net effect of making things look redder. But Jim did the hard work of matching up individual stars in both the optical and near-IR filters. The optical colors are normal. The population that stands out in the near-IR CMD mixes in evenly with the rest of the stars in the optical CMD. That’s the opposite of what dust does. Dust affects the optical colors more strongly. Here the optical colors are normal, but the near-IR colors are too red – hence an IR-excess.
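The wavelength dependence of extinction is the crux of that argument. A sketch with round-number extinction ratios for a standard R_V = 3.1 Milky Way dust law (the ratios are assumptions for illustration):

```python
# Why dust fails here: extinction decreases with wavelength, so a dust column
# large enough to redden the near-IR colors would redden the optical colors
# even more. Extinction ratios below are round numbers for a standard
# R_V = 3.1 Milky Way dust law, assumed for illustration.
A_V = 1.0                          # 1 magnitude of visual extinction
A_B = 1.32 * A_V                   # blue is extinguished more than visual
A_J, A_K = 0.28 * A_V, 0.11 * A_V  # near-IR bands, far less extinguished
print(f"E(B-V) = {A_B - A_V:.2f} mag")  # 0.32: optical color reddening
print(f"E(J-K) = {A_J - A_K:.2f} mag")  # 0.17: near-IR color reddening
# Normal optical colors with red near-IR colors is the opposite pattern.
```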

There I was stumped. We had convinced ourselves that we couldn’t just dismiss the IR-excess population as artifacts. They had the right spatial distribution to be part of the galaxy. They had the right magnitudes to be stars in the galaxy. But they had really weird IR colors that were unexplained by any plausible track of stellar evolution.

Important detail: stellar evolution models track what happens in the star, up to its surface, but not in the environment beyond. Jim thought about it, and came back to me with an idea outside my purview. He remembered a conversation he had had long ago with Karl Rakos while observing high redshift clusters with custom-tailored filters. Rakos had previously worked on Ap and Be stars – peculiar stars. I had heard of these things, but they’re rare and don’t contribute significantly to the integrated light of the stellar population in a galaxy like the Milky Way. They seemed like an oddity of little consequence in a big universe.

Be stars – that’s “B” then “e” for B-type stars (the second hottest spectral classification) with emission lines (hence the e). Stars mostly just have absorption lines; emission lines make them peculiar. But Jim learned from his conversations with Rakos that these stars also frequently had IR-excesses. Some digging into the literature, and sure enough, these types of stars have the right magnitudes and colors to explain the strange population we can’t otherwise understand.

It is still weird. There are a lot of them. Not a lot in an absolute sense, but a lot more than we’d expect from their frequency in the Milky Way. But now that we know to look for them, you can see a similar population in some other dwarfs. Maybe they become more frequent in lower metallicity galaxies. The emission lines and the IR excess come from a disk of hot gas around the star; maybe such disks are more likely to form when there are fewer metals. This makes at least a tiny amount of sense, as B stars have a lot of energy to emit and angular momentum to transport. The mechanisms by which that can happen multiply when there are metals to make dust grains that can absorb and reprocess the abundance of UV photons. In their absence, when the metallicity is low, nature has to find another way. So maybe – maybe – Be stars are more common in lower metallicity environments because the dearth of dust encourages the formation of gas disks. That’s entirely speculative (a fun but dangerous aspect of astronomy), so maybe not.

I don’t know if ultimately Be stars are the correct interpretation. It’s the best we’ve come up with. I really don’t know whether metallicity and dust play the role I just speculatively described. But it is a new and unexpected thing – and that’s the cool thing about the never-ending discovery space of astronomy. Even when you know what to expect, the universe can still surprise you – if you pay attention to the data.