LZ: another non-detection

Just as I was leaving for a week’s vacation, the dark matter search experiment LZ reported its first results. Now that I’m back, I see that I didn’t miss anything. Here is their figure of merit:

The latest experimental limits on WIMP dark matter from LZ (arXiv:2207.03764). The parameter space above the line is excluded. Note the scale on the y-axis, bearing in mind that the original expectation was for a cross section around 10⁻³⁹ cm², well above the top edge of this graph.

LZ is a merger of two previous experiments compelled to grow still bigger in the never-ending search for dark matter. It contains “seven active tonnes of liquid xenon,” an absurd amount that is a substantial fraction of the entire terrestrial supply. It all has to be cooled to cryogenic temperatures (xenon liquefies around 165 K) and filtered of contaminants, such as naturally radioactive isotopes, that might mimic the sought-after signal of dark matter scattering off of xenon nuclei. It is a technological tour de force.

The technology is really fantastic. The experimentalists have accomplished amazing things in building these detectors. They have achieved the target sensitivity, and then some. If WIMPs existed, they should have found them by now.

WIMPs have not been discovered. As the experiments have improved, the theorists have been obliged to repeatedly move the goalposts. The original (1980s) expectation for the interaction cross-section was 10⁻³⁹ cm². That was quickly excluded, but more careful (1990s) calculations suggested perhaps more like 10⁻⁴² cm². This was also excluded experimentally. By the late 2000s, the “prediction” had migrated to 10⁻⁴⁶ cm². This too has now been excluded, so the goalposts have been moved to 10⁻⁴⁸ cm². This migration has been driven entirely by the data; there is nothing miraculous about a WIMP with this cross section.

As remarkable a technological accomplishment as experiments like LZ are, they are becoming the definition of insanity: repeating the same action but expecting a different result.

For comparison, consider the LIGO detection of gravitational waves. A large team of scientists worked unspeakably hard to achieve the detection of a tiny effect. It took 40 years of failure before success was obtained. Until that point, it seemed much the same: repeating the same action but expecting a different result.

Except it wasn’t, because there was a clear expectation for the sensitivity that was required to detect gravitational waves. Once that sensitivity was achieved, they were detected. It wasn’t that simple of course, but close enough for our purposes: it took a long time to get where they were going, but they achieved success once they got there. Having a clear prediction is essential.

In the case of WIMP searches, there was also a clear prediction. The required sensitivity was achieved – long ago. Nothing was found, so the goalposts were moved – by a lot. Then the new required sensitivity was achieved, still without detection. Repeatedly.

It always makes sense to look harder for something you expect if at first you don’t succeed. But at some point, you have to give up: you ain’t gonna find it. This is disappointing, but we’ve all experienced this kind of disappointment at some point in our lives. The tricky part is deciding when to give up.

In science, the point to give up is when your hypothesis is falsified. The original WIMP hypothesis was falsified a long time ago. We keep it on life support with modifications, often obscuring (from our students and from ourselves) the fact that the WIMPs we’re talking about today are no longer the WIMPs we originally conceived.

I sometimes like to imagine the thought experiment of sending some of the more zealous WIMP advocates back in time to talk to their younger selves. What would they say? How would they respond to themselves? These are not people who like to be contradicted by anyone, even themselves, so I suspect it would go something like this:

Old scientist: “Hey, kid – I’m future you. This experiment you’re about to spend your life working on won’t detect what you’re looking for.”

Young scientist: “Uh huh. You say you’re me from the future, Mr. Credibility? Tell me: at what point do I go senile, you doddering old fool?”

Old scientist: “You don’t. It just won’t work out the way you think. On top of dark matter, there’s also dark energy…”

Young scientist: “What the heck is dark energy, you drooling crackpot?”

Old scientist: “The cosmological constant.”

Young scientist: “The cosmological constant! You can’t expect people to take you seriously talking about that rubbish. GTFO.”

That’s the polite version that doesn’t end in fisticuffs. It’s easy to imagine this conversation going south much faster. I know that if 1993 me had received a visit from 1998 me telling me that in five years I would have come to doubt WIMPs, and also would have demonstrated that the answer to the missing mass problem might not be dark matter at all, I… would not have taken it well.

That’s why predictions are important in science. They tell us when to change our mind. When to stop what we’re doing because it’s not working. When to admit that we were wrong, and maybe consider something else. Maybe that something else won’t prove correct. Maybe the next ten something elses won’t. But we’ll never find out if we won’t let go of the first wrong thing.

Cosmic whack-a-mole

The fine-tuning problem encountered by dark matter models that I talked about last time is generic. The knee-jerk reaction of most workers seems to be “let’s build a more sophisticated model.” That’s reasonable – if there is any hope of recovery. The attitude is that dark matter has to be right so something has to work out. This fails to even contemplate the existential challenge that the fine-tuning problem imposes.

Perhaps I am wrong to be pessimistic, but my concern is well informed by years upon years spent trying to avoid this conclusion. Most of the claims I have seen to the contrary are just specialized versions of the generic models I had already built: they contain the same failings, but these go unrecognized because the presumption is that something has to work out, so people are often quick to declare “close enough!”

In my experience, fixing one thing in a model often breaks something else. It becomes a game of cosmic whack-a-mole. If you succeed in suppressing the scatter in one relation, it pops out somewhere else. A model that passes the test it was built to pass flunks as soon as it is confronted with another test.

Let’s consider a few examples.


Squeezing the toothpaste tube

Our efforts to evade one fine-tuning problem often lead to another. This has been my general experience in many efforts to construct viable dark matter models. It is like squeezing a tube of toothpaste: every time we smooth out the problems in one part of the tube, we simply squeeze them into a different part. There are many published claims to solve this problem or that, but they frequently fail to acknowledge (or notice) that the purported solution to one problem creates another.

One example is provided by Courteau and Rix (1999). They invoke dark matter domination to explain the lack of residuals in the Tully-Fisher relation. In this limit, Mb/R ≪ MDM/R and the baryons leave no mark on the rotation curve. This can reconcile the model with the Tully-Fisher relation, but it makes a strong prediction. It is not just the flat rotation speed that is the same for galaxies of the same mass, but the entirety of the rotation curve, V(R) at all radii. The stars are just convenient tracers of the dark matter halo in this limit; the dynamics are entirely dominated by the dark matter. The hypothesized solution fixes the problem that is addressed, but creates another problem that is not addressed, in this case the observed variation in rotation curve shape.

The limit of complete dark matter domination is not consistent with the shapes of rotation curves. Galaxies of the same baryonic mass have the same flat outer velocity (Tully-Fisher), but the shapes of their rotation curves vary systematically with surface brightness (de Blok & McGaugh, 1996; Tully and Verheijen, 1997; McGaugh and de Blok, 1998a,b; Swaters et al., 2009, 2012; Lelli et al., 2013, 2016c). High surface brightness galaxies have steeply rising rotation curves while LSB galaxies have slowly rising rotation curves (Fig. 6). This systematic dependence of the inner rotation curve shape on the baryon distribution excludes the SH (‘same halo’) hypothesis, defined below, in the limit of dark matter domination: the distribution of the baryons clearly has an impact on the dynamics.

Fig. 6. Rotation curve shapes and surface density. The left panel shows the rotation curves of two galaxies, one HSB (NGC 2403, open circles) and one LSB (UGC 128, filled circles) (de Blok & McGaugh, 1996; Verheijen and de Blok, 1999; Kuzio de Naray et al., 2008). These galaxies have very nearly the same baryonic mass (~ 10¹⁰ M☉), and asymptote to approximately the same flat rotation speed (~ 130 km s⁻¹). Consequently, they are indistinguishable in the Tully-Fisher plane (Fig. 4). However, the inner shapes of the rotation curves are readily distinguishable: the HSB galaxy has a steeply rising rotation curve while the LSB galaxy has a more gradual rise. This is a general phenomenon, as illustrated by the central density relation (right panel: Lelli et al., 2016c) where each point is one galaxy; NGC 2403 and UGC 128 are highlighted as open points. The central dynamical mass surface density (Σdyn) measured by the rate of rise of the rotation curve (Toomre, 1963) correlates with the central surface density of the stars (Σ0) measured by their surface brightness. The line shows 1:1 correspondence: no dark matter is required near the centers of HSB galaxies. The need for dark matter appears below 1000 M☉ pc⁻² and grows systematically greater to lower surface brightness. This is the origin of the statement that LSB galaxies are dark matter dominated.

A more recent example of this toothpaste tube problem for SH-type models is provided by the EAGLE simulations (Schaye et al., 2015). These are claimed (Ludlow et al., 2017) to explain one aspect of the observations, the radial acceleration relation (McGaugh et al., 2016), but fail to explain another, the central density relation (Lelli et al., 2016c) seen in Fig. 6. This was called the ‘diversity’ problem by Oman et al. (2015), who note that the rotation velocity at a specific, small radius (2 kpc) varies considerably from galaxy to galaxy observationally (Fig. 6), while simulated galaxies show essentially no variation, with only a small amount of scatter. This diversity problem is exactly the same problem that was pointed out before [compare Fig. 5 of Oman et al. (2015) to Fig. 14 of McGaugh and de Blok (1998a)].

There is no single, universally accepted standard galaxy formation model, but a common touchstone is provided by Mo et al. (1998). Their base model has a constant ratio of luminous to dark mass md [their assumption (i)], which provides a reasonable description of the sizes of galaxies as a function of mass or rotation speed (Fig. 7). However, this model predicts the wrong slope (3 rather than 4) for the Tully-Fisher relation. This is easily remedied by making the luminous mass fraction proportional to the rotation speed (md ∝ Vf), which then provides an adequate fit to the Tully-Fisher relation. This has the undesirable effect of destroying the consistency of the size-mass relation. We can have one or the other, but not both.

Fig. 7. Galaxy size (as measured by the exponential disk scale length, left) and mass (right) as a function of rotation velocity. The latter is the Baryonic Tully-Fisher relation; the data are the same as in Fig. 4. The solid lines are Mo et al. (1998) models with constant md (their equations 12 and 16). This is in reasonable agreement with the size-speed relation but not the BTFR. The latter may be fit by adopting a variable md ∝ Vf (dashed lines), but this ruins agreement with the size-speed relation. This is typical of dark matter models in which fixing one thing breaks another.

This failure of the Mo et al. (1998) model provides another example of the toothpaste tube problem. By fixing one problem, we create another. The only way forward is to consider more complex models with additional degrees of freedom.

Feedback

It has become conventional to invoke ‘feedback’ to address the various problems that afflict galaxy formation theory (Bullock & Boylan-Kolchin, 2017; De Baerdemaker and Boyd, 2020). It goes by other monikers as well, variously being called ‘gastrophysics’ for gas phase astrophysics, or simply ‘baryonic physics’ for any process that might intervene between the relatively simple (and calculable) physics of collisionless cold dark matter and messy observational reality (which is entirely illuminated by the baryons). This proliferation of terminology obfuscates the boundaries of the subject and precludes a comprehensive discussion.

Feedback is not a single process, but rather a family of distinct processes. The common feature of different forms of feedback is the deposition of energy from compact sources into the surrounding gas of the interstellar medium. This can, at least in principle, heat gas and drive large-scale winds, either preventing gas from cooling and forming too many stars, or ejecting it from a galaxy outright. This in turn might affect the distribution of dark matter, though the effect is weak: one must move a lot of baryons for their gravity to impact the dark matter distribution.

There are many kinds of feedback, and many devils in the details. Massive, short-lived stars produce copious amounts of ultraviolet radiation that heats and ionizes the surrounding gas and erodes interstellar dust. These stars also produce strong winds through much of their short (~ 10 Myr) lives, and ultimately explode as Type II supernovae. These three mechanisms each act in a distinct way on different time scales. That’s just the feedback associated with massive stars; there are many other mechanisms (e.g., Type Ia supernovae are distinct from Type II supernovae, and Active Galactic Nuclei are a different beast entirely). The situation is extremely complicated. While the various forms of stellar feedback are readily apparent on the small scales of stars, it is far from obvious that they have the desired impact on the much larger scales of entire galaxies.

For any one kind of feedback, there can be many substantially different implementations in galaxy formation simulations. Independent numerical codes do not generally return compatible results for identical initial conditions (Scannapieco et al., 2012): there is no consensus on how feedback works. Among the many different computational implementations of feedback, at most one can be correct.

Most galaxy formation codes do not resolve the scale of single stars where stellar feedback occurs. They rely on some empirically calibrated, analytic approximation to model this ‘sub-grid physics’ — which is to say, they don’t simulate feedback at all. Rather, they simulate the accumulation of gas in one resolution element, then follow some prescription for what happens inside that unresolved box. This provides ample opportunity for disputes over the implementation and effects of feedback. For example, feedback is often cited as a way to address the cusp-core problem — or not, depending on the implementation (e.g., Benítez-Llambay et al., 2019; Bose et al., 2019; Di Cintio et al., 2014; Governato et al., 2012; Madau et al., 2014; Read et al., 2019). High resolution simulations (Bland-Hawthorn et al., 2015) indicate that the gas of the interstellar medium is less affected by feedback effects than assumed by typical sub-grid prescriptions: most of the energy is funneled through the lowest density gas — the path of least resistance — and is lost to the intergalactic medium without much impacting the galaxy in which it originates.
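To give a sense of what a sub-grid prescription actually does, here is a deliberately schematic toy version of the energy bookkeeping – my illustration, not any production code’s actual recipe. It counts the supernovae expected from a star formation event (roughly one per 100 M☉ of stars formed for a standard initial mass function), assigns each the canonical 10⁵¹ erg, and deposits some coupling fraction ε in the host gas cell:

```python
E_SN = 1e51         # canonical energy per core-collapse supernova [erg]
SN_PER_MSUN = 0.01  # ~1 supernova per 100 M_sun of stars formed (IMF-dependent)

def feedback_energy(m_stars_formed, epsilon=0.1):
    """Toy sub-grid recipe: feedback energy deposited in the gas [erg].
    epsilon, the coupling efficiency, is the knob everything hinges on."""
    return epsilon * m_stars_formed * SN_PER_MSUN * E_SN

# a 1e5 M_sun star formation event inside a single resolution element:
print(f"{feedback_energy(1e5):.1e} erg deposited")  # ~1e53 erg for epsilon = 0.1
```

Everything contentious lives in ε: how much of the energy couples to the gas, and in what form, is exactly what the high resolution simulations cited above call into question.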

From the perspective of the philosophy of science, feedback is an auxiliary hypothesis invoked to patch up theories of galaxy formation. Indeed, since there are many distinct flavors of feedback that are invoked to carry out a variety of different tasks, feedback is really a suite of auxiliary hypotheses. This violates parsimony to an extreme and brutal degree.

This concern for parsimony is not specific to any particular feedback scheme; it is not just a matter of which feedback prescription is best. The entire approach is to invoke as many free parameters as necessary to solve any and all problems that might be encountered. There is little doubt that such models can be constructed to match the data, even data that bear little resemblance to the obvious predictions of the paradigm (McGaugh and de Blok, 1998a; Mo et al., 1998). So the concern is not whether ΛCDM galaxy formation models can explain the data; it is that they can’t not.


One could go on at much greater length about feedback and its impact on galaxy formation. This is pointless. It is a form of magical thinking to expect that the combined effects of numerous complicated feedback effects are going to always add up to looking like MOND in each and every galaxy. It is also the working presumption of an entire field of modern science.

Two Hypotheses

OK, basic review is over. Shit’s gonna get real. Here I give a short recounting of the primary reason I came to doubt the dark matter paradigm. This is entirely conventional – my concern about the viability of dark matter stems from a contradiction within its own context. It had nothing to do with MOND, which I was blissfully ignorant of when I ran headlong into this problem in 1994. Most of the community chooses to remain blissfully ignorant, which I understand: it’s way more comfortable. It is also why the field has remained mired in the ’90s, with all the apparent progress since then being nothing more than the perpetual reinvention of the same square wheel.


To make a completely generic point that does not depend on the specifics of dark matter halo profiles or the details of baryonic assembly, I discuss two basic hypotheses for the distribution of disk galaxy size at a given mass. These broad categories I label SH (Same Halo) and DD (Density begets Density) following McGaugh and de Blok (1998a). In both cases, galaxies of a given baryonic mass are assumed to reside in dark matter halos of a corresponding total mass. Hence, at a given halo mass, the baryonic mass is the same, and variations in galaxy size follow from one of two basic effects:

  • SH: variations in size follow from variations in the spin of the parent dark matter halo.
  • DD: variations in surface brightness follow from variations in the density of the dark matter halo.

Recall that at a given luminosity, size and surface brightness are not independent, so variation in one corresponds to variation in the other. Consequently, we have two distinct ideas for why galaxies of the same mass vary in size. In SH, the halo may have the same density profile ρ(r), and it is only variations in angular momentum that dictate variations in the disk size. In DD, variations in the surface brightness of the luminous disk are reflections of variations in the density profile ρ(r) of the dark matter halo. In principle, one could have a combination of both effects, but we will keep them separate for this discussion, and note that mixing them defeats the virtues of each without curing their ills.

The SH hypothesis traces back to at least Fall and Efstathiou (1980). The notion is simple: variations in the size of disks correspond to variations in the angular momentum of their host dark matter halos. The mass destined to become a dark matter halo initially expands with the rest of the universe, reaching some maximum radius before collapsing to form a gravitationally bound object. At the point of maximum expansion, the nascent dark matter halos torque one another, inducing a small but non-zero net spin in each, quantified by the dimensionless spin parameter λ (Peebles, 1969). One then imagines that as a disk forms within a dark matter halo, it collapses until it is centrifugally supported: λ → 1 from some initially small value (typically λ ≈ 0.05, Barnes & Efstathiou, 1987, with some modest distribution about this median value). The spin parameter thus determines the collapse factor and the extent of the disk: low spin halos harbor compact, high surface brightness disks while high spin halos produce extended, low surface brightness disks.
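To see quantitatively how λ sets the disk size, here is a minimal sketch of the Mo, Mao & White (1998)-style estimate Rd ≈ (λ/√2) r200 – a simplification of my own that drops their order-unity fc and fR correction factors and assumes the disk retains its share of the specific angular momentum:

```python
import numpy as np

def r200_kpc(v200_kms, H0=70.0):
    """Virial radius from V200 = 10 H0 r200 (halo at 200x critical density);
    H0 in km/s/Mpc, result in kpc."""
    return 1000.0 * v200_kms / (10.0 * H0)

def disk_scale_length_kpc(lam, v200_kms):
    """Mo, Mao & White (1998)-style estimate: R_d ~ (lambda / sqrt(2)) r200,
    neglecting their order-unity f_c and f_R correction factors."""
    return (lam / np.sqrt(2.0)) * r200_kpc(v200_kms)

for lam in (0.025, 0.05, 0.10):  # the median spin and a factor of two either way
    print(f"lambda = {lam:5.3f}: R_d ~ {disk_scale_length_kpc(lam, 160.0):5.1f} kpc")
```

Since the central surface density scales as Σ0 ∝ M/Rd², the factor of four in λ sampled here already corresponds to a factor of ~16 in surface brightness, which is the crux of the next paragraph.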

The distribution of primordial spins is fairly narrow, and does not correlate with environment (Barnes & Efstathiou, 1987). The narrow distribution was invoked as an explanation for Freeman’s Law: the small variation in spins from halo to halo resulted in a narrow distribution of disk central surface brightness (van der Kruit, 1987). This association, while apparently natural, proved to be incorrect: when one goes through the mathematics to transform spin into scale length, even a narrow distribution of initial spins predicts a broad distribution in surface brightness (Dalcanton, Spergel, & Summers, 1997; McGaugh and de Blok, 1998a). Indeed, it predicts too broad a distribution: to prevent the formation of galaxies much higher in surface brightness than observed, one must invoke a stability criterion (Dalcanton, Spergel, & Summers, 1997; McGaugh and de Blok, 1998a) that precludes the existence of very high surface brightness disks. While it is physically quite reasonable that such a criterion should exist (Ostriker and Peebles, 1973), the observed surface density threshold does not emerge naturally, and must be inserted by hand. It is an auxiliary hypothesis invoked to preserve SH. Once done, size variations and the trend of average size with mass work out in reasonable quantitative detail (e.g., Mo et al., 1998).

Angular momentum conservation must hold for an isolated galaxy, but the assumption made in SH is stronger: baryons conserve their share of the angular momentum independently of the dark matter. It is considered a virtue that this simple assumption leads to disk sizes that are about right. However, this assumption is not well justified. Baryons and dark matter are free to exchange angular momentum with each other, and are seen to do so in simulations that track both components (e.g., Book et al., 2011; Combes, 2013; Klypin et al., 2002). There is no guarantee that this exchange is equitable, and in general it is not: as baryons collapse to form a small galaxy within a large dark matter halo, they tend to lose angular momentum to the dark matter. This is a one-way street that runs in the wrong direction, and the final destination is uncomfortably invisible: most of the angular momentum ends up sequestered in the unobservable dark matter. Worse still, if we impose rigorous angular momentum conservation among the baryons, the result is a disk with a completely unrealistic surface density profile (van den Bosch, 2001a). It then becomes necessary to pick and choose which baryons manage to assemble into the disk and which are expelled or otherwise excluded, thereby solving one problem by creating another.

Early work on LSB disk galaxies led to a rather different picture. Compared to the previously known population of HSB galaxies around which our theories had been built, the LSB galaxy population has a younger mean stellar age (de Blok & van der Hulst, 1998; McGaugh and Bothun, 1994), a lower content of heavy elements (McGaugh, 1994), and a systematically higher gas fraction (McGaugh and de Blok, 1997; Schombert et al., 1997). These properties suggested that LSB galaxies evolve more gradually than their higher surface brightness brethren: they convert their gas into stars over a much longer timescale (McGaugh et al., 2017). The obvious culprit for this difference is surface density: lower surface brightness galaxies have less gravity, hence less ability to gather their diffuse interstellar medium into dense clumps that could form stars (Gerritsen and de Blok, 1999; Mihos et al., 1999). It seemed reasonable to ascribe the low surface density of the baryons to a correspondingly low density of their parent dark matter halos.

One way to think about a region in the early universe that will eventually collapse to form a galaxy is as a so-called top-hat over-density. The mass density Ωm → 1 at early times, irrespective of its current value, so a spherical region (the top-hat) that is somewhat over-dense early on may locally exceed the critical density. We may then consider this finite region as its own little closed universe, and follow its evolution with the Friedmann equations with Ω > 1. The top-hat will initially expand along with the rest of the universe, but will eventually reach a maximum radius and recollapse. When that happens depends on the density. The greater the over-density, the sooner the top-hat will recollapse. Conversely, a lesser over-density will take longer to reach maximum expansion before recollapsing.
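For reference, the textbook solution makes the density dependence explicit. The closed Friedmann equations for the top-hat have the parametric solution

r(θ) = A(1 − cos θ),  t(θ) = B(θ − sin θ),  with A³ = GMB²,

so turnaround occurs at θ = π and collapse at θ = 2π. In an Einstein–de Sitter background the linear overdensity grows as δ ∝ t^(2/3), and the top-hat collapses when its linearly extrapolated overdensity reaches δc = (3/20)(12π)^(2/3) ≈ 1.686, giving tcoll ∝ δi^(−3/2): the denser the region, the sooner it collapses.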

Everything about LSB galaxies suggested that they were lower density, late-forming systems. It therefore seemed quite natural to imagine a distribution of over-densities and corresponding collapse times for top-hats of similar mass, and to associate LSB galaxies with the lesser over-densities (Dekel and Silk, 1986; McGaugh, 1992). More recently, some essential aspects of this idea have been revived under the moniker of “assembly bias” (e.g. Zehavi et al., 2018).

The work that informed the DD hypothesis was based largely on photometric and spectroscopic observations of LSB galaxies: their size and surface brightness, color, chemical abundance, and gas content. DD made two obvious predictions that had not yet been tested at that juncture. First, late-forming halos should reside preferentially in low density environments. This is a generic consequence of Gaussian initial conditions: big peaks defined on small (e.g., galaxy) scales are more likely to be found in big peaks defined on large (e.g., cluster) scales, and vice-versa. Second, the density of the dark matter halo of an LSB galaxy should be lower than that of an equal mass halo containing an HSB galaxy. This predicts a clear signature in their rotation speeds, which should be lower for lower density.

The prediction for the spatial distribution of LSB galaxies was tested by Bothun et al. (1993) and Mo et al. (1994). The test showed the expected effect: LSB galaxies were less strongly clustered than HSB galaxies. They are clustered: both galaxy populations follow the same large scale structure, but HSB galaxies adhere more strongly to it. In terms of the correlation function, the LSB sample available at the time had about half the amplitude r0 as comparison HSB samples (Mo et al., 1994). The effect was even more pronounced on the smallest scales (<2 Mpc: Bothun et al., 1993), leading Mo et al. (1994) to construct a model that successfully explained both small and large scale aspects of the spatial distribution of LSB galaxies simply by associating them with dark matter halos that lacked close interactions with other halos. This was strong corroboration of the DD hypothesis.

One way to test the prediction of DD that LSB galaxies should rotate more slowly than HSB galaxies was to use the Tully-Fisher relation (Tully and Fisher, 1977) as a point of reference. Originally identified as an empirical relation between optical luminosity and the observed line-width of single-dish 21 cm observations, it turns out to be, more fundamentally, a relation between the baryonic mass of a galaxy (stars plus gas) and its flat rotation speed: the Baryonic Tully-Fisher relation (BTFR; McGaugh et al., 2000). This relation is a simple power law of the form

Mb = A Vf⁴ (equation 1)

with A ≈ 50 M☉ km⁻⁴ s⁴ (McGaugh, 2005).
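As a quick sanity check of equation (1) – a back-of-the-envelope I am adding here – the galaxies in Fig. 6 have Vf ≈ 130 km s⁻¹, for which equation (1) gives Mb ≈ 50 × 130⁴ ≈ 1.4 × 10¹⁰ M☉, consistent with the ~10¹⁰ M☉ quoted in that caption:

```python
A = 50.0                         # BTFR normalization [M_sun km^-4 s^4]
for vf in (80.0, 130.0, 200.0):  # flat rotation speeds [km/s]
    mb = A * vf**4               # equation (1)
    print(f"V_f = {vf:3.0f} km/s  ->  M_b ~ {mb:.1e} M_sun")
```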

Aaronson et al. (1979) provided a straightforward interpretation for a relation of this form. A test particle orbiting a mass M at a distance R will have a circular speed V

V² = GM/R (equation 2)

where G is Newton’s constant. If we square this, a relation like the Tully-Fisher relation follows:

V⁴ = (GM/R)² ∝ MΣ (equation 3)

where we have introduced the surface mass density Σ = M/R². The Tully-Fisher relation M ∝ V⁴ is recovered if Σ is constant, exactly as expected from Freeman’s Law (Freeman, 1970).

LSB galaxies, by definition, have central surface brightnesses (and corresponding stellar surface densities Σ0) that are less than the Freeman value. Consequently, DD predicts, through equation (3), that LSB galaxies should shift systematically off the Tully-Fisher relation: lower Σ means lower velocity. The predicted effect is not subtle (Fig. 4). For the range of surface brightness that had become available, the predicted shift should have stood out like the proverbial sore thumb. It did not (Hoffman et al., 1996; McGaugh and de Blok, 1998a; Sprayberry et al., 1995; Zwaan et al., 1995). This had an immediate impact on galaxy formation theory: compare Dalcanton et al. (1995, who predict a shift in Tully-Fisher with surface brightness) with Dalcanton et al. (1997b, who do not).
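To put a number on “not subtle”: at fixed Mb, equation (3) gives V ∝ Σ^(1/4), so each factor of ten in surface density predicts velocities lower by 10^(1/4) ≈ 1.8, a shift of 0.25 dex. That is what the dotted lines in Fig. 4 illustrate:

```python
# equation (3) at fixed baryonic mass: V ~ Sigma^(1/4)
for dex in (1, 2, 3):  # factors of 10, 100, 1000 in surface density
    print(f"Sigma down by 10^{dex}: V lower by {10 ** (dex / 4.0):.2f}x "
          f"({dex / 4.0:.2f} dex)")
```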

Fig. 4. The Baryonic Tully-Fisher relation and residuals. The top panel shows the flat rotation velocity of galaxies in the SPARC database (Lelli et al., 2016a) as a function of the baryonic mass (stars plus gas). The sample is restricted to those objects for which both quantities are measured to better than 20% accuracy. The bottom panel shows velocity residuals around the solid line in the top panel as a function of the central surface density of the stellar disks. Variations in the stellar surface density predict variations in velocity along the dashed line. These would translate to shifts illustrated by the dotted lines in the top panel, with each dotted line representing a shift of a factor of ten in surface density. The predicted dependence on surface density is not observed (Courteau & Rix, 1999; McGaugh and de Blok, 1998a; Sprayberry et al., 1995; Zwaan et al., 1995).

Instead of the systematic variation of velocity with surface brightness expected at fixed mass, there was none. Indeed, there is no hint of a second parameter dependence. The relation is incredibly tight by the standards of extragalactic astronomy (Lelli et al., 2016b): baryonic mass and the flat rotation speed are practically interchangeable.

The above derivation is overly simplistic. The radius at which we should make a measurement is ill-defined, and the surface density is dynamical: it includes both stars and dark matter. Moreover, galaxies are not spherical cows: one needs to solve the Poisson equation for the observed disk geometry of LTGs, and account for the varying radial contributions of luminous and dark matter. While this can be made to sound intimidating, the numerical computations are straightforward and rigorous (e.g., Begeman et al., 1991; Casertano & Shostak, 1980; Lelli et al., 2016a). It still boils down to the same sort of relation (modulo geometrical factors of order unity), but with two mass distributions: one for the baryons Mb(R), and one for the dark matter MDM(R). Though the dark matter is more massive, it is also more extended. Consequently, both components can contribute non-negligibly to the rotation over the observed range of radii:

V²(R) = GM/R = G(Mb/R + MDM/R) (equation 4)

where for clarity we have omitted* geometrical factors. The only absolute requirement is that the baryonic contribution should begin to decline once the majority of baryonic mass is encompassed. It is when rotation curves persist in remaining flat past this point that we infer the need for dark matter.
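For concreteness, here is a minimal numerical sketch of equation (4) – an exponential disk embedded in an NFW halo, with the disk treated in the spherical approximation (ignoring, for brevity, the very geometrical corrections just discussed) and parameter values that are purely illustrative:

```python
import numpy as np

G = 4.301e-6  # Newton's constant in kpc (km/s)^2 / M_sun

def m_disk(r, m_d=3e10, r_d=3.0):
    """Baryonic mass enclosed for an exponential disk (spherical approx):
    M(<r) = M_d [1 - (1 + r/R_d) exp(-r/R_d)]."""
    x = r / r_d
    return m_d * (1.0 - (1.0 + x) * np.exp(-x))

def m_nfw(r, m200=1e12, c=10.0, r200=210.0):
    """NFW enclosed mass: M(<r) = M200 [ln(1+x) - x/(1+x)] / [ln(1+c) - c/(1+c)]."""
    x = r * c / r200
    return m200 * (np.log(1 + x) - x / (1 + x)) / (np.log(1 + c) - c / (1 + c))

for r in (2.0, 5.0, 10.0, 20.0, 40.0):           # kpc
    v = np.sqrt(G * (m_disk(r) + m_nfw(r)) / r)  # equation (4)
    vb = np.sqrt(G * m_disk(r) / r)              # baryons alone
    print(f"R = {r:4.0f} kpc: V = {v:5.1f} km/s (baryons alone: {vb:5.1f} km/s)")
```

The baryonic term peaks and then declines in Keplerian fashion while the halo term keeps growing; holding their sum flat is the balancing act at the heart of what follows.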

A recurrent problem in testing galaxy formation theories is that they seldom make ironclad predictions; I attempt a brief summary in Table 1. SH represents a broad class of theories with many variants. By construction, the dark matter halos of galaxies of similar stellar mass are similar. If we associate the flat rotation velocity with halo mass, then galaxies of the same mass have the same circular velocity, and the problem posed by Tully-Fisher is automatically satisfied.

Table 1. Predictions of DD and SH for LSB galaxies.

Observation                 DD   SH
Evolutionary rate           +    +
Size distribution           +    +
Clustering                  +    X
Tully-Fisher relation       X    ?
Central density relation    +    X

While it is common to associate the flat rotation speed with the dark matter halo, this is a half-truth: the observed velocity is a combination of baryonic and dark components (eq. (4)). It is thus a rather curious coincidence that rotation curves are as flat as they are: the Keplerian decline of the baryonic contribution must be precisely balanced by an increasing contribution from the dark matter halo. This fine-tuning problem was dubbed the “disk-halo conspiracy” (Bahcall & Casertano, 1985; van Albada & Sancisi, 1986). The solution offered for the disk-halo conspiracy was that the formation of the baryonic disk has an effect on the distribution of the dark matter. As the disk settles, the dark matter halo responds through a process commonly referred to as adiabatic compression that brings the peak velocities of disk and dark components into alignment (Blumenthal et al., 1986). Some rearrangement of the dark matter halo in response to the change of the gravitational potential caused by the settling of the disk is inevitable, so this seemed a plausible explanation.

The observation that LSB galaxies obey the Tully-Fisher relation greatly compounds the fine-tuning (McGaugh and de Blok, 1998a; Zwaan et al., 1995). The amount of adiabatic compression depends on the surface density of stars (Sellwood and McGaugh, 2005b): HSB galaxies experience greater compression than LSB galaxies. This should enhance the predicted shift between the two in Tully-Fisher. Instead, the amplitude of the flat rotation speed remains unperturbed.

The generic failings of dark matter models were discussed at length by McGaugh and de Blok (1998a). The same problems have been encountered by others. For example, Fig. 5 shows model galaxies formed in a dark matter halo with identical total mass and density profile but with different spin parameters (van den Bosch, 2001b). Variations in the assembly and cooling history were also considered, but these make little difference and are not relevant here. The point is that smaller (larger) spin parameters lead to more (less) compact disks that contribute more (less) to the total rotation, exactly as anticipated from variations in the term Mb/R in equation (4). The nominal variation is readily detectable, and stands out prominently in the Tully-Fisher diagram (Fig. 5). This is exactly the same fine-tuning problem that was pointed out by Zwaan et al. (1995) and McGaugh and de Blok (1998a).

What I describe as a fine-tuning problem is not portrayed as such by van den Bosch (2000) and van den Bosch and Dalcanton (2000), who argued that the data could be readily accommodated in the dark matter picture. The difference is between accommodating the data once known, and predicting it a priori. The dark matter picture is extraordinarily flexible: one is free to distribute the dark matter as needed to fit any data that evinces a non-negative mass discrepancy, even data that are wrong (de Blok & McGaugh, 1998). It is another matter entirely to construct a realistic model a priori; in my experience it is quite easy to construct models with plausible-seeming parameters that bear little resemblance to real galaxies (e.g., the low-spin case in Fig. 5). A similar conundrum is encountered when constructing models that can explain the long tidal tails observed in merging and interacting galaxies: models with realistic rotation curves do not produce realistic tidal tails, and vice-versa (Dubinski et al., 1999). The data occupy a very narrow sliver of the enormous volume of parameter space available to dark matter models, a situation that seems rather contrived.

Fig. 5. Model galaxy rotation curves and the Tully-Fisher relation. Rotation curves (left panel) for model galaxies of the same mass but different spin parameters λ from van den Bosch (2001b, see his Fig. 3). Models with lower spin have more compact stellar disks that contribute more to the rotation curve (V² = GM/R; R being smaller for the same M). These models are shown as square points on the Baryonic Tully-Fisher relation (right) along with data for real galaxies (grey circles: Lelli et al., 2016b) and a fit thereto (dashed line). Differences in the cooling history result in modest variation in the baryonic mass at fixed halo mass as reflected in the vertical scatter of the models. This is within the scatter of the data, but variation due to the spin parameter is not.

Both DD and SH predict residuals from Tully-Fisher that are not observed. I consider this to be an unrecoverable failure for DD, which was my hypothesis (McGaugh, 1992), so I worked hard to salvage it. I could not. For SH, Tully-Fisher might be recovered in the limit of dark matter domination, which requires further consideration.


I will save the further consideration for a future post, as that can take infinite words (there are literally thousands of ApJ papers on the subject). The real problem that rotation curve data pose generically for the dark matter interpretation is the fine-tuning required between baryonic and dark matter components – the balancing act explicit in the equations above. This, by itself, constitutes a practical falsification of the dark matter paradigm.

Without going into interesting but ultimately meaningless details (maybe next time), the only way to avoid this conclusion is to choose to be unconcerned with fine-tuning. If you choose to say fine-tuning isn’t a problem, then it isn’t a problem. Worse, many scientists don’t seem to understand that they’ve even made this choice: it is baked into their assumptions. There is no risk of questioning those assumptions if one never stops to think about them, much less worry that there might be something wrong with them.

Much of the field seems to have sunk into a form of scientific nihilism. The attitude I frequently encounter when I raise this issue boils down to “Don’t care! Everything will magically work out! LA LA LA!”


*Strictly speaking, eq. (4) only holds for spherical mass distributions. I make this simplification here to emphasize the fact that both mass and radius matter. This essential scaling persists for any geometry: the argument holds in complete generality.

Common ground

In order to agree on an interpretation, we first have to agree on the facts. Even when we agree on the facts, the available set of facts may admit multiple interpretations. This was an obvious and widely accepted truth early in my career*. Since then, the field has decayed into a haphazardly conceived set of unquestionable absolutes that are based on a large but well-curated subset of facts that gratuitously ignores any subset of facts that are inconvenient.

Sadly, we seem to have entered a post-truth period in which facts are drowned out by propaganda. I went into science to get away from people who place faith before facts, and comfortable fictions ahead of uncomfortable truths. Unfortunately, a lot of those people seem to have followed me here. This manifests as people who quote what are essentially pro-dark matter talking points at me like I don’t understand LCDM, when all it really does is reveal that they are posers** who picked up on some common myths about the field without actually reading the relevant journal articles.

Indeed, a recent experience taught me a new psychology term: identity protective cognition. Identity protective cognition is the tendency for people in a group to selectively credit or dismiss evidence in patterns that reflect the beliefs that predominate in their group. When it comes to dark matter, the group happens to be a scientific one, but the psychology is the same: I’ve seen people twist themselves into logical knots to protect their belief in dark matter from being subject to critical examination. They do it without even recognizing that this is what they’re doing. I guess this is a human foible we cannot escape.

I’ve addressed these issues before, but here I’m going to start a series of posts on what I think some of the essential but underappreciated facts are. This is based on a talk that I gave at a conference on the philosophy of science in 2019, back when we had conferences, and published in Studies in History and Philosophy of Science. I paid the exorbitant open access fee (the journal changed its name – and publication policy – during the publication process), so you can read the whole thing all at once if you are eager. I’ve already written it to be accessible, so mostly I’m going to post it here in what I hope are digestible chunks, and may add further commentary if it seems appropriate.

Cosmic context

Cosmology is the science of the origin and evolution of the universe: the biggest of big pictures. The modern picture of the hot big bang is underpinned by three empirical pillars: an expanding universe (Hubble expansion), Big Bang Nucleosynthesis (BBN: the formation of the light elements through nuclear reactions in the early universe), and the relic radiation field (the Cosmic Microwave Background: CMB) (Harrison, 2000; Peebles, 1993). The discussion here will take this framework for granted.

The three empirical pillars fit beautifully with General Relativity (GR). Making the simplifying assumptions of homogeneity and isotropy, Einstein’s equations can be applied to treat the entire universe as a dynamical entity. As such, it is compelled either to expand or contract. Running the observed expansion backwards in time, one necessarily comes to a hot, dense, early phase. This naturally explains the CMB, which marks the transition from an opaque plasma to a transparent gas (Sunyaev and Zeldovich, 1980; Weiss, 1980). The abundances of the light elements can be explained in detail with BBN provided the universe expands in the first few minutes as predicted by GR when radiation dominates the mass-energy budget of the universe (Boesgaard & Steigman, 1985).

The marvelous consistency of these early universe results with the expectations of GR builds confidence that the hot big bang is the correct general picture for cosmology. It also builds overconfidence that GR is completely sufficient to describe the universe. Maintaining consistency with modern cosmological data is only possible with the addition of two auxiliary hypotheses: dark matter and dark energy. These invisible entities are an absolute requirement of the current version of the most-favored cosmological model, ΛCDM. The very name of this model is born of these dark materials: Λ is Einstein’s cosmological constant, of which ‘dark energy’ is a generalization, and CDM is cold dark matter.

Dark energy does not enter much into the subject of galaxy formation. It mainly helps to set the background cosmology in which galaxies form, and plays some role in the timing of structure formation. This discussion will not delve into such details, and I note only that it was surprising and profoundly disturbing that we had to reintroduce (e.g., Efstathiou et al., 1990; Ostriker and Steinhardt, 1995; Perlmutter et al., 1999; Riess et al., 1998; Yoshii and Peterson, 1995) Einstein’s so-called ‘greatest blunder.’

Dark matter, on the other hand, plays an intimate and essential role in galaxy formation. The term ‘dark matter’ is dangerously crude, as it can reasonably be used to mean anything that is not seen. In the cosmic context, there are at least two forms of unseen mass: normal matter that happens not to glow in a way that is easily seen — not all ordinary material need be associated with visible stars — and non-baryonic cold dark matter. It is the latter form of unseen mass that is thought to dominate the mass budget of the universe and play a critical role in galaxy formation.

Cold Dark Matter

Cold dark matter is some form of slow moving, non-relativistic (‘cold’) particulate mass that is not composed of normal matter (baryons). Baryons are the family of particles that include protons and neutrons. As such, they compose the bulk of the mass of normal matter, and it has become conventional to use this term to distinguish between normal, baryonic matter and the non-baryonic dark matter.

The distinction between baryonic and non-baryonic dark matter is no small thing. Non-baryonic dark matter must be a new particle that resides in a new ‘dark sector’ that is completely distinct from the usual stable of elementary particles. We do not just need some new particle; we need one (or many) residing in some sector beyond the framework of the stubbornly successful Standard Model of particle physics. Whatever the solution to the mass discrepancy problem turns out to be, it requires new physics.

The cosmic dark matter must be non-baryonic for two basic reasons. First, the mass density of the universe measured gravitationally (Ωm ≈ 0.3, e.g., Faber and Gallagher, 1979; Davis et al., 1980, 1992) clearly exceeds the mass density in baryons as constrained by BBN (Ωb ≈ 0.05, e.g., Walker et al., 1991). There is something gravitating that is not ordinary matter: Ωm > Ωb.

The second reason follows from the absence of large fluctuations in the CMB (Peebles and Yu, 1970; Silk, 1968; Sunyaev and Zeldovich, 1980). The CMB is extraordinarily uniform in temperature across the sky, varying by only ~ 1 part in 10⁵ (Smoot et al., 1992). These small temperature variations correspond to variations in density. Gravity is an attractive force; it will make the rich grow richer. Small density excesses will tend to attract more mass, making them larger, attracting more mass, and leading to the formation of large scale structures, including galaxies. But gravity is also a weak force: this process takes a long time. In the long but finite age of the universe, gravity plus known baryonic matter does not suffice to go from the initially smooth, highly uniform state of the early universe to the highly clumpy, structured state of the local universe (Peebles, 1993). The solution is to boost the process with an additional component of mass — the cold dark matter — that gravitates without interacting with the photons, thus getting a head start on the growth of structure while not aggravating the amplitude of temperature fluctuations in the CMB.
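The arithmetic behind “does not suffice” fits on one line (my back-of-the-envelope, assuming the simplest matter-dominated linear growth, δ ∝ a): baryonic fluctuations cannot begin to grow until the baryons decouple from the photons at recombination, and the growth available since then falls short by orders of magnitude:

```python
delta_rec = 1e-5      # fluctuation amplitude imprinted in the CMB
z_rec = 1100.0        # redshift of recombination
growth = 1.0 + z_rec  # linear growth since then: delta ~ a in matter domination
print(f"delta today ~ {delta_rec * growth:.3f}; collapsed structure needs ~1")
```

Dark matter that never coupled to the photons can start growing long before recombination, which is exactly the head start described above.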

Taken separately, one might argue away the need for dark matter. Taken together, these two distinct arguments convinced nearly everyone, including myself, of the absolute need for non-baryonic dark matter. Consequently, CDM became established as the leading paradigm during the 1980s (Peebles, 1984; Steigman and Turner, 1985). The paradigm has snowballed since that time, the common attitude among cosmologists being that CDM has to exist.

From an astronomical perspective, the CDM could be any slow-moving, massive object that does not interact with photons nor participate in BBN. The range of possibilities is at once limitless yet highly constrained. Neutrons would suffice if they were stable in vacuum, but they are not. Primordial black holes are a logical possibility, but if made of normal matter, they must somehow form in the first second after the Big Bang to not impair BBN. At this juncture, microlensing experiments have excluded most plausible mass ranges that primordial black holes could occupy (Mediavilla et al., 2017). It is easy to invent hypothetical dark matter candidates, but difficult for them to remain viable.

From a particle physics perspective, the favored candidate is a Weakly Interacting Massive Particle (WIMP: Peebles, 1984; Steigman and Turner, 1985). WIMPs are expected to be the lightest stable supersymmetric partner particle that resides in the hypothetical supersymmetric sector (Martin, 1998). The WIMP has been the odds-on favorite for so long that it is often used synonymously with the more generic term ‘dark matter.’ It is the hypothesized particle that launched a thousand experiments. Experimental searches for WIMPs have matured over the past several decades, making extraordinary progress in not detecting dark matter (Aprile et al., 2018). Virtually all of the parameter space in which WIMPs had been predicted to reside (Trotta et al., 2008) is now excluded. Worse, the existence of the supersymmetric sector itself, once seemingly a sure thing, remains entirely hypothetical, and appears at this juncture to be a beautiful idea that nature declined to implement.

In sum, we must have cold dark matter for both galaxies and cosmology, but we have as yet no clue to what it is.


* There is a trope that late in their careers, great scientists come to the opinion that everything worth discovering has been discovered, because they themselves already did everything worth doing. That is not a concern I have – I know we haven’t discovered all there is to discover. Yet I see no prospect for advancing our fundamental understanding simply because there aren’t enough of us pulling in the right direction. Most of the community is busy barking up the wrong tree, and refuses to be distracted from their focus on the invisible squirrel that isn’t there.

** Many of these people are the product of the toxic culture that Simon White warned us about. They wave the sausage of galaxy formation and feedback like a magic wand that excuses all faults while being proudly ignorant of how the sausage was made. Bitch, please. I was there when that sausage was made. I helped make the damn sausage. I know what went into it, and I recognize when it tastes wrong.

Galaxy models in compressed halos

The last post was basically an introduction to this one, which is about the recent work of Pengfei Li. In order to test a theory, we need to establish its prior. What do we expect?

The prior for fully formed galaxies after 13 billion years of accretion and evolution is not an easy problem. The dark matter halos need to form first, with the baryonic component assembling afterwards. We know from dark matter-only structure formation simulations that the initial condition (A) of the dark matter halo should resemble an NFW halo, and from observations that the end product of baryonic assembly needs to look like a real galaxy (Z). How the universe gets from A to Z is a whole alphabet of complications.

The simplest thing we can do is ignore B-Y and combine a model galaxy with a model dark matter halo. The simplest model for a spiral galaxy is an exponential disk. True to its name, the azimuthally averaged stellar surface density falls off exponentially from a central value over some scale length. This is a tolerable approximation of the stellar disks of spiral galaxies, ignoring their central bulges and their gas content. It is an inadequate yet surprisingly decent starting point for describing gravitationally bound collections of hundreds of billions of stars with just two parameters.
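Explicitly (standard definitions, added here for reference): the surface density profile is Σ(R) = Σ0 exp(−R/Rd), so the two parameters are the central surface density Σ0 and the scale length Rd; the total disk mass follows as Md = 2πΣ0Rd².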

So a basic galaxy model is an exponential disk in an NFW dark matter halo. This is the type of model I discussed in the last post, the kind I was considering two decades ago, and the kind of model still frequently considered. It is an obvious starting point. However, we know that this starting point is not adequate. On the baryonic side, we should model all the major mass components: bulge, disk, and gas. On the halo side, we need to understand how the initial halo depends on its assembly history and how it is modified by the formation of the luminous galaxy within it. The common approach to do all that is to run a giant cosmological simulation and watch what happens. That’s great, provided we know how to model all the essential physics. The action of gravity in an expanding universe we can compute well enough, but we do not enjoy the same ability to calculate the various non-gravitational effects of baryons.

Rather than blindly accept the outcome of simulations that have become so complicated that no one really seems to understand them, it helps to break the problem down into its basic steps. There is a lot going on, but what we’re concerned about here boils down to a tug of war between two competing effects: adiabatic compression tends to concentrate the dark matter, while feedback tends to redistribute it outwards.

Adiabatic compression refers to the response of the dark matter halo to infalling baryons. Though this name stuck, the process isn’t necessarily adiabatic, and the A-word tends to blind people to a generic and inevitable physical process. As baryons condense into the centers of dark matter halos, the gravitational potential is non-stationary. The distribution of dark matter has to respond to this redistribution of mass: the infall of dissipating baryons drags some dark matter in with them, so we expect dark matter halos to become more centrally concentrated. The most common approach to computing this effect is to assume the process is adiabatic (hence the name). This means a gentle settling that is gradual enough to be time-reversible: you can imagine running the movie backwards, unlike a sudden, violent event like a car crash. It needn’t be rigorously adiabatic, but the compressive response of the halo is inevitable. Indeed, forming a thin, dynamically cold, well-organized rotating disk in a preferred plane – i.e., a spiral galaxy – pretty much requires a period during which the adiabatic assumption is a decent approximation. There is a history of screwing up even this much, but Jerry Sellwood showed that it could be done correctly and that when one does so, it reproduces the results of more expensive numerical simulations. This provides a method to go beyond a simple exponential disk in an NFW halo: we can compute what happens to an NFW halo in response to an observed mass distribution.
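To make “compute what happens” concrete, here is a minimal sketch of the simplest version of that calculation – the Blumenthal et al. (1986) prescription rather than Sellwood’s more careful treatment – which assumes circular orbits and conserves the adiabatic invariant rM(r): each initial shell at radius ri contracts to the final radius rf that satisfies rf [Mb(rf) + (1 − fb)Mi(ri)] = ri Mi(ri). All parameter values are illustrative:

```python
import numpy as np
from scipy.optimize import brentq

def m_nfw(r, m200=1e12, c=10.0, r200=210.0):
    """Initial (uncompressed) NFW enclosed mass [M_sun], r in kpc."""
    x = r * c / r200
    return m200 * (np.log(1 + x) - x / (1 + x)) / (np.log(1 + c) - c / (1 + c))

def m_disk(r, m_d=5e10, r_d=3.0):
    """Final baryonic mass enclosed (exponential disk, spherical approx)."""
    x = r / r_d
    return m_d * (1.0 - (1.0 + x) * np.exp(-x))

f_b = 0.05  # fraction of the initial halo mass that ends up as the disk

def compressed_radius(r_i):
    """Blumenthal et al. (1986): solve r_f M_f(r_f) = r_i M_i(r_i) for r_f,
    where M_f(r_f) = M_b(r_f) + (1 - f_b) M_i(r_i) (dark shells do not cross)."""
    rhs = r_i * m_nfw(r_i)
    f = lambda r_f: r_f * (m_disk(r_f) + (1.0 - f_b) * m_nfw(r_i)) - rhs
    return brentq(f, 1e-4, r_i)

for r_i in (2.0, 5.0, 10.0, 50.0):
    print(f"shell initially at {r_i:4.1f} kpc contracts to {compressed_radius(r_i):5.2f} kpc")
```

Pulling shells inward like this is what makes cuspy halos cuspier, which is the effect computed rigorously below.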

After infall and compression, baryons form stars that produce energy in the form of radiation, stellar winds, and the blast waves of supernova explosions. These are sources of energy that complicate what until now has been a straightforward calculation of gravitational dynamics. With sufficient coupling to the surrounding gas, these energy sources might be converted into enough kinetic energy to alter the equilibrium mass distribution and the corresponding gravitational potential. I say might because we don’t really know how this works, and it is a lot more complicated than I’ve made it sound. So let’s not go there, and instead just calculate the part we do know how to calculate. What happens from the inevitable adiabatic compression in the limit of zero feedback?

We have calculated this for a grid of model galaxies that matches the observed distribution of real galaxies. This is important; it often happens that people do not explore a realistic parameter space. Here is a plot of size against stellar mass:

The size of galaxy disks as measured by the exponential scale length as a function of stellar mass. Grey points are real galaxies; red circles are model galaxies with parameters chosen to cover the same parameter space. This, and all plots, from Li et al. (2022).

Note that at a given stellar mass, there is a wide range of sizes. This is an essential aspect of galaxy properties; one has to explain size variations as well as the trend with mass. This obvious point has been frequently forgotten and rediscovered in the literature.

The two-parameter exponential disk model above only suffices to approximate the stellar disks of spiral and irregular galaxies. Real galaxies have bulges and interstellar gas. We include these in our models so that they cover the same distribution as real galaxies in terms of bulge mass, size, and gas fraction. We then assign a dark matter halo to each model galaxy using an abundance matching relation (the stellar mass tells us the halo mass) and adopt the cosmologically appropriate halo mass-concentration relation. These specify the initial condition of the NFW halo in which each model galaxy is presumed to reside.

At this point, it is worth remarking that there are a variety of abundance matching relations in the literature. Some of these give tragically bad predictions for the kinematics. I won’t delve into this here, but do want to note that in what follows, we have adopted the most favorable abundance matching relation, which turns out to be that of Kravtsov et al. (2018). Note that this means that we are already engaged in a kind of fine-tuning by cherry-picking the most favorable relation.
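Schematically, the bookkeeping for that assignment looks like the following – with a deliberately crude, hypothetical stand-in for the stellar mass-halo mass relation (the real models use Kravtsov et al.) and the Dutton & Macciò (2014) z = 0 relation for the concentration:

```python
import numpy as np

def halo_mass_from_stars(m_star):
    """Hypothetical toy abundance matching relation (a stand-in, NOT Kravtsov):
    invert a star formation efficiency that peaks near M200 ~ 1e12 M_sun."""
    m200 = np.logspace(10, 14, 2000)              # trial halo masses [M_sun]
    x = m200 / 1e12
    m_star_of_m200 = 6e10 * x / (x**-1 + x**0.6)  # toy, monotonic in m200
    return np.interp(m_star, m_star_of_m200, m200)

def c200(m200, h=0.7):
    """Dutton & Maccio (2014) z = 0 NFW mass-concentration relation."""
    return 10 ** (0.905 - 0.101 * np.log10(m200 * h / 1e12))

m_star = 5e10
m200 = halo_mass_from_stars(m_star)
print(f"M* = {m_star:.1e} M_sun -> M200 ~ {m200:.2e} M_sun, c200 ~ {c200(m200):.1f}")
```

The point of the exercise is that once the stellar mass is specified, the initial NFW halo is fully determined: there are no free knobs left at this stage.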

Before considering adiabatic compression, let’s see what happens if we simply add our model galaxies to NFW halos. This is the same exercise we did last time with exponential disks; now we’re including bulges and gas:

Galaxy models in the RAR plane. Models are color coded by their stellar surface density. The dotted line is 1:1 (Newton with no dark matter or other funny business). The black line is the fit to the observed RAR.

This looks pretty good, at least at a first glance. Most of the models fall nearly on top of each other. This isn’t entirely true, as the most massive models overpredict the RAR. This is a generic consequence of the bend in abundance matching relations. This bend is mildest in the Kravtsov relation, which is what makes it “best” here – other relations, like the commonly cited one of Behroozi, predict a lot more high-acceleration models. One sees only a hint of that here.

The scatter is respectably small, mostly solving the problem I initially encountered in the nineties. Despite predicting a narrow relation, the models do have a finite scatter that is a bit more than we observe. This isn’t too tragic, so maybe we can work with it. These models also miss the low acceleration end of the relation by a modest but appreciable amount. This seems more significant, as we found the same thing for pure exponential models: it is hard to make this part of the problem go away.

Including bulges in the models extends them to high accelerations. This would seem to explain a region of the RAR that pure exponential models do not address. Bulges are high surface density, star dominated regions, so they fall on the 1:1 part of the RAR at high accelerations.

And then there are the hooks. These are obvious in the plot above. They occur in low and intermediate mass galaxies that lack a significant bulge component. A pure exponential disk has a peak acceleration at finite radius, but an NFW halo has its peak at zero radius. So if you imagine following a given model line inwards in radius, it goes up in acceleration until it reaches the maximum for the disk along the x-axis. The baryonic component of the acceleration then starts to decline while that due to the NFW halo continues to rise. The model doubles back to lower baryonic acceleration while continuing to higher total acceleration, making the little hook shape. This deviation from the RAR is not commonly observed; indeed, these hooks are the signature of the cusp-core problem in the RAR plane.
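The hook geometry is easy to reproduce. Here is a minimal sketch in Python – my own illustration with arbitrary parameter values, not the Li et al. code – tracing g_bar and g_obs for a razor-thin exponential disk embedded in an NFW halo:

```python
import numpy as np
from scipy.special import i0, i1, k0, k1  # modified Bessel functions

G = 4.30091e-6  # Newton's G in kpc (km/s)^2 / Msun

def g_disk(R, Mdisk=2e9, Rd=2.0):
    """In-plane acceleration of a razor-thin exponential disk (Freeman 1970)."""
    Sigma0 = Mdisk / (2 * np.pi * Rd**2)        # central surface density
    y = R / (2 * Rd)
    V2 = 4 * np.pi * G * Sigma0 * Rd * y**2 * (i0(y) * k0(y) - i1(y) * k1(y))
    return V2 / R

def g_nfw(R, M200=1e11, c=10.0):
    """Acceleration of an NFW halo of mass M200 and concentration c."""
    rho_crit = 277.5 * 0.7**2                   # Msun / kpc^3 for h = 0.7
    r200 = (3 * M200 / (800 * np.pi * rho_crit))**(1 / 3)
    m = lambda x: np.log(1 + x) - x / (1 + x)   # dimensionless mass profile
    return G * M200 * m(R * c / r200) / m(c) / R**2

R = np.geomspace(0.1, 30.0, 200)                # radii in kpc
g_bar = g_disk(R)                               # baryonic (x-axis of the RAR)
g_obs = g_bar + g_nfw(R)                        # total (y-axis of the RAR)
# Moving inward, g_bar rises to a maximum near the disk scale length and then
# falls back toward zero, while the cuspy NFW contribution keeps rising: the
# track doubles back to lower g_bar at ever higher g_obs -- the hook.
```

Plotting g_obs against g_bar for this track reproduces the doubling-back shape described above.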

Results so far are mixed. With the “right” choice of abundance matching relation, we are well ahead of where we were at the turn of the century, but some real problems remain. We have yet to compute the necessary adiabatic contraction, so hopefully doing that right will result in further improvement. So let’s make a rigorous calculation of the compression that would result from forming a galaxy of the stipulated parameters.

Galaxy models in the RAR plane after compression.

Adiabatic compression makes things worse. There is a tiny improvement at low accelerations, but the most pronounced effects are at small radii where accelerations are large. Compression makes cuspy halos cuspier, making the hooks more pronounced. Worse, the strong concentration of starlight that is a bulge inevitably leads to strong compression. These models don’t approach the 1:1 line at high acceleration, and never can: higher acceleration means higher stellar surface density means greater compression. One cannot start from an NFW halo and ever reach a state of baryon domination; too much dark matter is always in the mix.

It helps to look at the residual diagram. The RAR is a log-log plot over a large dynamic range; this can hide small but significant deviations. For some reason, people who claim to explain the RAR with dark matter models never seem to show these residuals.

As above, with the observed RAR divided out. Model galaxies are mostly above the RAR. The cusp-core problem is exacerbated in disks, and bulges never reach the 1:1 line at high accelerations.

The models built to date don’t have the right shape to explain the RAR, at least when examined closely. Still, I’m pleased: what we’ve done here comes closer than all my many previous efforts, and most of the other efforts that are out there. But I wouldn’t claim it as a success. Indeed, the inevitable compressive effects that occur at high surface densities mean that we can’t invoke simple offsets to accommodate the data: if a model gets the shape of the RAR right but the normalization wrong, it doesn’t work to simply shift it over.

So, where does that leave us? Up the proverbial creek? Perhaps. We have yet to consider feedback, which is too complicated to delve into here. Instead, note that while we haven’t engaged in any specific fine-tuning, we have already engaged in some cherry-picking. First, we’ve abandoned the natural proportionality between halo and disk mass, replacing it with abundance matching. This is no small step, as it converts a single-valued parameter of our theory into a rolling function of mass. Abundance matching has become familiar enough that people seem to have been lulled into thinking it is natural. There is nothing natural about it. Regardless of how much fancy jargon we use to justify it, it’s still basically a rolling fudge factor – the scientific equivalent of a lipstick-smothered pig.

Abundance matching does, at least, use data that are independent of the kinematics to set the relation between stellar and halo mass, and it does go in the right direction for the RAR. This only gets us into the right ballpark, and only if we cherry-pick the particular abundance matching relation that we use. So we’re well down the path of tuning whether we realize it or not. Invoking feedback is simply another step along this path.

Feedback is usually invoked in the kinematic context to convert cusps into cores. That could help with the hooks. This kind of feedback is widely thought to affect low and intermediate mass galaxies, or galaxies of a particular stellar to halo mass ratio. Opinions vary a bit, but it is generally not thought to have such a strong effect on massive galaxies. And yet, we find that we need some (second?) kind of feedback for them, as we need to move bulges back onto the 1:1 line in the RAR plane. That’s perhaps related to the cusp-core problem, but it’s also different. Getting bulges right requires a fine-tuned amount of feedback to exactly cancel out the effects of compression. A third distinct place where the models need some help is at low accelerations. This is far from the region where feedback is thought to have much effect at all.

I could go on, and perhaps will in a future post. Point is, we’ve been tuning our feedback prescriptions to match observed facts about galaxies, not computing how we think it really works. We don’t know how to do the latter, and there is no guarantee that our approximations do justice to reality. So on the one hand, I don’t doubt that with enough tinkering this process can be made to work in a model. On the other hand, I do question whether this is how the universe really works.

A brief history of the Radial Acceleration Relation

In science, all new and startling facts must encounter in sequence the responses

1. It is not true!

2. It is contrary to orthodoxy.

3. We knew it all along.

Louis Agassiz (circa 1861)

This expression exactly depicts the progression of the radial acceleration relation. Some people were ahead of this curve, others are still behind it, but it quite accurately captures the mass sociology. This is how we react to startling new facts.

For quotation purists, I’m not sure exactly what the original phrasing was. I have paraphrased it to be succinct and have substituted orthodoxy for religion, because even scientists can have orthodoxies: holy cows that must not be slaughtered.

I might even add a precursor stage zero to the list above:

0. It goes unrecognized.

This is to say that if a new fact is sufficiently startling, we don’t just disbelieve it (stage 1); at first we fail to see it at all. We lack the cognitive framework to even recognize how important it is. An example is provided by the 1941 detection of the microwave background by Andrew McKellar. In retrospect, this is as persuasive as the 1964 detection by Penzias and Wilson to which we usually ascribe the discovery. At the earlier time, there was simply no framework for recognizing what was being detected. It appears to me that Penzias and Wilson didn’t know what they were looking at either until Peebles explained it to them.

The radial acceleration relation was first posed as the mass discrepancy-acceleration relation. They’re fundamentally the same thing, just plotted in a slightly different way. The mass discrepancy-acceleration relation shows the ratio of total mass to that which is visible. This is basically the ratio of the observed acceleration to that predicted by the observed baryons. This is useful to see how much dark matter is needed, but by construction the axes are not independent, as both measured quantities are used in forming the ratio.

The radial acceleration relation shows independent observations along each axis: observed vs. predicted acceleration. Though measured independently, they are not physically independent, as the baryons contribute some to the total observed acceleration – they do have mass, after all. One can construct a halo acceleration relation by subtracting the baryonic contribution away from the total; in principle the remainders are physically independent. Unfortunately, the axes again become observationally codependent, and the uncertainties blow up, especially in the baryon dominated regime. Which of these depictions is preferable depends a bit on what you’re looking to see; here I just want to note that they are the same information packaged somewhat differently.
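In symbols, with $g_{\rm obs}$ the observed centripetal acceleration and $g_{\rm bar}$ that predicted by the observed baryons (my shorthand, not tied to any one paper): the mass discrepancy-acceleration relation plots the ratio $D \equiv g_{\rm obs}/g_{\rm bar}$ against acceleration, the radial acceleration relation plots $g_{\rm obs}$ against $g_{\rm bar}$ directly, and the halo acceleration relation plots the difference $g_{\rm DM} = g_{\rm obs} - g_{\rm bar}$.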

To the best of my knowledge, the first mention of the mass discrepancy-acceleration relation in the scientific literature is by Sanders (1990). Its existence is explicit in MOND (Milgrom 1983), but here it is possible to draw a clear line between theory and data. I am only speaking of the empirical relation as it appears in the data, irrespective of anything specific to MOND.

I met Bob Sanders, along with many other talented scientists, in a series of visits to the University of Groningen in the early 1990s. Despite knowing him and having talked to him about rotation curves, I was unaware that he had done this.

Stage 0: It goes unrecognized.

For me, stage one came later in the decade at the culmination of a several years’ campaign to examine the viability of the dark matter paradigm from every available perspective. That’s a long paper, which nevertheless drew considerable praise from many people who actually read it. If you go to the bother of reading it today, you will see the outlines of many issues that are still debated and others that have been forgotten (e.g., the fine-tuning issues).

Around this time (1998), the dynamicists at Rutgers were organizing a meeting on galaxy dynamics, and asked me to be one of the speakers. I couldn’t possibly discuss everything in the paper in the time allotted, so was looking for a way to show the essence of the challenge the data posed. Consequently, I reinvented the wheel, coming up with the mass discrepancy-acceleration relation. Here I show the same data that I had then in the form of the radial acceleration relation:

The Radial Acceleration Relation from the data in McGaugh (1999). Plot credit: Federico Lelli. (There is a time delay in publication: the 1998 meeting’s proceedings appeared in 1999.)

I recognize this version of the plot as having been made by Federico Lelli. I’ve made this plot many times, but this is the version I came across first, and it is better than mine in that the opacity of the points illustrates where the data are concentrated. I had been working on low surface brightness galaxies; these have low accelerations, so that part of the plot is well populated.

The data show a clear correlation. By today’s standards, it looks crude. Going on what we had then, it was fantastic. Correlations practically never look this good in extragalactic astronomy, and they certainly don’t happen by accident. Low quality data can hide a correlation – uncertainties cause scatter – but they can’t create a correlation where one doesn’t exist.

This result was certainly startling if not as new as I then thought. That’s why I used the title How Galaxies Don’t Form. This was contrary to our expectations, as I had explained in exhaustive detail in the long paper and revisit in a recent review for philosophers and historians of science.

I showed the same result later that year (1998) at a meeting on the campus of the University of Maryland where I was a brand new faculty member. It was a much shorter presentation, so I didn’t have time to justify the context or explain much about the data. Contrary to the reception at Rutgers where I had adequate time to speak, the hostility of the audience to the result was palpable, their stony silence eloquent. They didn’t want to believe it, and plenty of people got busy questioning the data.

Stage 1: It is not true.

I spent the next five years expanding and improving the data. More rotation curves became available thanks to the work of many, particularly Erwin de Blok, Marc Verheijen, and Rob Swaters. That was great, but the more serious limitation was how well we could measure the stellar mass distribution needed to predict the baryonic acceleration.

The mass models we could build at the time were based on optical images. A mass model takes the observed light distribution, assigns a mass-to-light ratio, and makes a numerical solution of the Poisson equation to obtain the gravitational force corresponding to the observed stellar mass distribution. This is how we obtain the stellar contribution to the predicted baryonic force; the same procedure is applied to the observed gas distribution.

The blue part of the spectrum is the best place in which to observe low contrast, low surface brightness galaxies, as the night sky is darkest there, at least during new moon. That’s great for measuring the light distribution, but what we want is the stellar mass distribution. The mass-to-light ratio is expected to have a lot of scatter in the blue band simply from the happenstance of recent star formation, which makes bright blue stars that are short-lived. If there is a stochastic uptick in the star formation rate, the mass-to-light ratio goes down because there are lots of bright stars. Wait a few hundred million years for these to die off, and the mass-to-light ratio gets bigger (in the absence of further star formation). The time-integrated stellar mass may not change much, but the amount of blue light it produces does. Consequently, we expect well-observed galaxies to trace distinct lines in the radial acceleration plane, even if there is a single universal relation underlying the phenomenon. This happens simply because we expect to get M*/L wrong from one galaxy to the next: in 1998, I had simply assumed all galaxies had the same M*/L for lack of any better prescription. Clearly, a better prescription was warranted.
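In the usual bookkeeping – standard practice, not specific to any one paper – the baryonic prediction is assembled as $V_{\rm bar}^2 = \Upsilon_\star V_\star^2 + V_{\rm gas}^2$ (plus a bulge term where present), where $V_\star$ is the rotation the observed starlight would induce for a mass-to-light ratio of unity and $\Upsilon_\star$ scales it to the adopted M*/L. Everything said above about the blue band is about how badly $\Upsilon_\star$ can be misjudged from one galaxy to the next.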

In those days, I traveled through Tucson to observe at Kitt Peak with some frequency. On one occasion, I found myself with a few hours to kill between coming down from the mountain and heading to the airport. I wandered over to the Steward Observatory at the University of Arizona to see who I might see. A chance meeting in the wild west: I encountered Eric Bell and Roelof de Jong, who were postdocs there at the time. I knew Eric from his work on the stellar populations of low surface brightness galaxies, an interest closely aligned with my own, and Roelof from my visits to Groningen.

As we got to talking, Eric described to me work they were doing on stellar populations, and how they thought it would be possible to break the age-metallicity degeneracy using near-IR colors in addition to optical colors. They were mostly focused on improving the age constraints on stars in LSB galaxies, but as I listened, I realized they had constructed a more general, more powerful tool. At my encouragement (read their acknowledgements), they took on this more general task, ultimately publishing the classic Bell & de Jong (2001). In it, they built a table that enabled one to look up the expected mass-to-light ratio of a complex stellar population – one actively forming stars – as a function of color. This was a big step forward over my educated guess of a constant mass-to-light ratio: there was now a way to use a readily observed property, color, to improve the estimated M*/L of each galaxy in a well-calibrated way.

Combining the new stellar population models with all the rotation curves then available, I obtained an improved mass discrepancy-acceleration relation:

The Radial Acceleration Relation from the data in McGaugh (2004); version using Bell’s stellar population synthesis models to estimate M*/L (see Fig. 5 for other versions). Plot credit: Federico Lelli.

Again, the relation is clear, but with scatter. Even with the improved models of Bell & de Jong, some individual galaxies have M*/L that are wrong – that’s inevitable in this game. What you cannot know is which ones! Note, however, that there are now 74 galaxies in this plot, and almost all of them fall on top of each other where the point density is large. There are some obvious outliers; those are presumably just that: the trees that fall outside the forest because of the expected scatter in M*/L estimates.

I tried a variety of prescriptions for M*/L in addition to that of Bell & de Jong. Though they differed in texture, they all told a consistent story. A relation was clearly present; only its detailed form varied with the adopted prescription.

The prescription that minimized the scatter in the relation was the M*/L obtained in MOND fits. That’s a tautology: by construction, a MOND fit finds the M*/L that puts a galaxy on this relation. However, we can generalize the result. Maybe MOND is just a weird, unexpected way of picking a number that has this property; it doesn’t have to be the true mass-to-light ratio in nature. But one can then define a ratio Q

Equation 21 of McGaugh (2004).

that relates the “true” mass-to-light ratio to the number that gives a MOND fit. They don’t have to be identical, but MOND does return M*/L that are reasonable in terms of stellar populations, so Q ~ 1. Individual values could vary, and the mean could be a bit more or less than unity, but not radically different. One thing that impressed me at the time about the MOND fits (most of which were made by Bob Sanders) was how well they agreed with the stellar population models, recovering the correct amplitude, the correct dependence on color in different bandpasses, and also giving the expected amount of scatter (more in the blue than in the near-IR).

Fig. 7 of McGaugh (2004). Stellar mass-to-light ratios of galaxies in the blue B-band (top) and near-IR K-band (bottom) as a function of B−V color for the prescription of maximum disk (left) and MOND (right). Each point represents one galaxy for which the requisite data were available at the time. The line represents the mean expectation of stellar population synthesis models from Bell et al. (2003). These lines are completely independent of the data: neither the normalization nor the slope has been fit to the dynamical data. The red points are due to Sanders & Verheijen (1998); note the weak dependence of M*/L on color in the near-IR.

The obvious interpretation is that we should take seriously a theory that obtains good fits with a single free parameter that checks out admirably well with independent astrophysical constraints, in this case the M*/L expected for stellar populations. But I knew many people would not want to do that, so I defined Q to generalize to any M*/L in any (dark matter) context one might want to consider.
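Schematically – my paraphrase, not the exact published form, which may invert the ratio – $Q$ is just the ratio of the mass-to-light ratio that holds in nature to the one returned by the MOND fit, $Q \equiv \Upsilon_\star^{\rm true}/\Upsilon_\star^{\rm MOND}$, so $Q \approx 1$ says the MOND fit lands on an astrophysically sensible value.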

Indeed, Q allows us to write a general expression for the rotation curve of the dark matter halo (essentially the HAR alluded to above) in terms of that of the stars and gas:

Equation 22 of McGaugh (2004).

The stars and the gas are observed, and μ is the MOND interpolation function assumed in the fit that leads to Q. Except now the interpolation function isn’t part of some funny new theory; it is just the shape of the radial acceleration relation – a relation that is there empirically. The only fit factor between these data and any given model is Q – a single number of order unity. This does leave some wiggle room, but not much.
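Again schematically (my paraphrase of the published equation, using the definitions above): since the fit asserts $g_{\rm obs} = g_{\rm bar}/\mu$, the halo term left over after subtracting the observed baryons is $V_{\rm DM}^2 = V_{\rm obs}^2 - V_{\rm bar}^2 = (1/\mu - 1)(Q\,V_\star^2 + V_{\rm gas}^2)$. Set $Q = 1$ and this is pure empirical bookkeeping; let $Q$ drift from unity and you have the wiggle room just mentioned.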

I went off to a conference to describe this result. At the 2006 meeting Galaxies in the Cosmic Web in New Mexico, I went out of my way at the beginning of the talk to show that even if we ignore MOND, this relation is present in the data, and it provides a strong constraint on the required distribution of dark matter. We may not know why this relation happens, but we can use it, modulo only the modest uncertainty in Q.

Having bent over backwards to distinguish the data from the theory, I was disappointed when, immediately at the end of my talk, prominent galaxy formation theorist Anatoly Klypin loudly shouted

“We don’t have to explain MOND!”

It stinks of MOND!

But you do have to explain the data. The problem was and is that the data look like MOND. It is easy to conflate one with the other; I have noticed that a lot of people have trouble keeping the two separate. Just because you don’t like the theory doesn’t mean that the data are wrong. What Anatoly was saying was that

2. It is contrary to orthodoxy.

Despite my phrasing the result in a way that would be useful to galaxy formation theorists, they did not, by and large, claim to explain it at the time – it was contrary to orthodoxy, so it didn’t need to be explained. Looking at the list of papers that cite this result, the early adopters were not the target audience of galaxy formation theorists, but rather others citing it to say variations of “no way dark matter explains this.”

At this point, it was clear to me that further progress required a better way to measure the stellar mass distribution. Looking at the stellar population models, the best hope was to build mass models from near-infrared rather than optical data. The near-IR is dominated by old stars, especially red giants. Galaxies that have been forming stars actively for a Hubble time tend towards a quasi-equilibrium in which red giants are replenished by stellar evolution at about the same rate they move on to the next phase. One therefore expects the mass-to-light ratio to be more nearly constant in the near-IR. Not perfectly so, of course, but a 2 or 3 micron image is as close to a map of the stellar mass of a galaxy as we’re likely to get.

Around this time, the University of Maryland had begun a collaboration with Kitt Peak to build a big infrared camera, NEWFIRM, for the 4m telescope. Rob Swaters was hired to help write software to cope with the massive data flow it would produce. The instrument was divided into quadrants, each of which had a field of view sufficient to hold a typical galaxy. When it went on the telescope, we developed an efficient observing method that I called “four-shooter”, shuffling the target galaxy from quadrant to quadrant so that in processing we could remove the numerous instrumental artifacts intrinsic to its InSb detectors. This eventually became one of the standard observing modes in which the instrument was operated.

NEWFIRM in the lab in Tucson. Most of the volume is for cryogenics: the IR detectors are helium-cooled to 30 K. Partial student for scale.

I was optimistic that we could make rapid progress, and at first we did. But despite all the work, despite all the active cooling involved, we were still on the ground. The night sky was painfully bright in the IR. Indeed, the thermal component dominated, so we could observe during full moon. To an observer of low surface brightness galaxies attuned to any hint of scattered light from so much as a crescent moon, I cannot describe how discombobulating it was to walk outside the dome and see the full fricking moon. So bright. So wrong. And that wasn’t even the limiting factor: the thermal background was.

We had hit a surface brightness wall, again. We could do the bright galaxies this way, but the LSBs that sample the low acceleration end of the radial acceleration relation were rather less accessible. Not inaccessible, but there was a better way.

The Spitzer Space Telescope was active at this time. Jim Schombert and I started winning time to observe LSB galaxies with it. We discovered that space is dark. There was no atmosphere to contend with. No scattered light from the clouds or the moon or the OH lines that afflict that part of the sky spectrum. No ground-level warmth. The data were fantastic. In some sense, they were too good: the biggest headache we faced was blotting out all the background galaxies that shone right through the optically thin LSB galaxies.

Still, it took a long time to collect and analyze the data. We were starting to get results by the early-teens, but it seemed like it would take forever to get through everything I hoped to accomplish. Fortunately, when I moved to Case Western, I was able to hire Federico Lelli as a postdoc. Federico’s involvement made all the difference. After many months of hard, diligent, and exacting work, he constructed what is now the SPARC database. Finally all the elements were in place to construct an empirical radial acceleration relation with absolutely minimal assumptions about the stellar mass-to-light ratio.

In parallel with the observational work, Jim Schombert had been working hard to build realistic stellar population models that extended to the 3.6 micron band of Spitzer. Spitzer had been built to look redwards of this, further into the IR; 3.6 microns was its shortest wavelength passband. But most models at the time stopped at the K-band, the 2.2 micron band that is the reddest passband practically accessible from the ground. The two bands contain pretty much the same information, but we still needed to calculate the band-specific value of M*/L.

Being a thorough and careful person, Jim considered not just the star formation history of a model stellar population as a variable, and not just its average metallicity, but also the metallicity distribution of its stars, making sure that these were self-consistent with the star formation history. Realistic metallicity distributions are skewed; it turns out that this subtle effect tends to counterbalance the color dependence of the age effect on M*/L in the near-IR part of the spectrum. The net result is that we expect M*/L to be very nearly constant for all late type galaxies.

This is the best possible result. To a good approximation, we expected all of the galaxies in the SPARC sample to have the same mass-to-light ratio. What you see is what you get. No variable M*/L, no equivocation, just data in, result out.

We did still expect some scatter, as that is an irreducible fact of life in this business. But even that we expected to be small, between 0.1 and 0.15 dex (roughly 25 – 40%). Still, we expected the occasional outlier, galaxies that sit well off the main relation just because our nominal M*/L didn’t happen to apply in that case.

One day as I walked past Federico’s office, he called for me to come look at something. He had plotted all the data together assuming a single M*/L. There… were no outliers. The assumption of a constant M*/L in the near-IR didn’t just work, it worked far better than we had dared to hope. The relation leapt straight out of the data:

The Radial Acceleration Relation from the data in McGaugh et al. (2016). Plot credit: Federico Lelli.

Over 150 galaxies, with nearly 2700 resolved measurements among them, each galaxy with its own distinctive mass distribution, all pile on top of each other without effort. There was plenty of effort in building the database, but once it was there, the result appeared, no muss, no fuss. No fitting or fiddling. Just the measurements and our best estimate of the mean M*/L, applied uniformly to every individual galaxy in the sample. The scatter was only 0.12 dex, within the range expected from the population models.

No MOND was involved in the construction of this relation. It may look like MOND, but we neither use MOND nor need it in any way to see the relation. It is in the data. Perhaps this is the sort of result for which we would have to invent MOND if it did not already exist. But the dark matter paradigm is very flexible, and many papers have since appeared that claim to explain the radial acceleration relation. We have reached

3. We knew it all along.

On the one hand, this is good: the community is finally engaging with a startling fact that has been pointedly ignored for decades. On the other hand, many of the claims to explain the radial acceleration relation are transparently incorrect on their face, being nothing more than elaborations of models I considered and discarded as obviously unworkable long ago. They do not provide a satisfactory explanation of the predictive power of MOND, and inevitably fail to address important aspects of the problem, like disk stability. Rather than grapple with the deep issues the new and startling fact poses, it has become fashionable to simply assert that one’s favorite model explains the radial acceleration relation, and does so naturally.

There is nothing natural about the radial acceleration relation in the context of dark matter. Indeed, it is difficult to imagine a less natural result – hence stages one and two. So on the one hand, I welcome the belated engagement, and am willing to consider serious models. On the other hand, if someone asserts that this is natural and that we expected it all along, then the engagement isn’t genuine: they’re just fooling themselves.

Early Days. This was one of Vera Rubin’s favorite expressions. I always had a hard time with it, as many things are very well established. Yet it seems that we have yet to wrap our heads around the problem. Vera’s daughter, Judy Young, once likened the situation to the parable of the blind men and the elephant. Much is known, yes, but the problem is so vast that each of us can perceive only a part of the whole, and the whole may be quite different from the part that is right before us.

So I guess Vera is right as always: these remain Early Days.

The curious case of AGC 114905: an isolated galaxy devoid of dark matter?

It’s early in the new year, so what better time to violate my own resolutions? I prefer to be forward-looking and not argue over petty details, or chase wayward butterflies. But sometimes the devil is in the details, and the occasional butterfly can be entertaining if distracting. Today’s butterfly is the galaxy AGC 114905, which has recently been in the news.

There are a couple of bandwagons here: one to rebrand very low surface brightness galaxies as ultradiffuse, and another to get overly excited when these types of galaxies appear to lack dark matter. The nomenclature is terrible, but that’s normal for astronomy, so I would overlook it, except that in this case it gives the impression that there is some new population of galaxies behaving in an unexpected fashion, when instead it looks to me like the opposite is the case. The extent to which there are galaxies lacking dark matter is fundamental to our interpretation of the acceleration discrepancy (aka the missing mass problem), so it bears closer scrutiny. The evidence for galaxies devoid of dark matter is considerably weaker than the current bandwagon portrays.

If it were just one butterfly (e.g., NGC 1052-DF2), I wouldn’t bother. Indeed, it was that specific case that made me resolve to ignore such distractions as a waste of time. I’ve seen this movie literally hundreds of times, I know how it goes:

  • Observations of this one galaxy falsify MOND!
  • Hmm, doing the calculation right, that’s what MOND predicts.
  • OK, but better data shrink the error bars and now MOND is falsified.
  • Are you sure about…?
  • Yes. We like this answer, let’s stop thinking about it now.
  • As the data continue to improve, it approaches what MOND predicts.
  • <crickets>

Over and over again. DF44 is another example that has followed this trajectory, and there are many others. This common story is not widely known – people lose interest once they get the answer they want. Irrespective of whether we can explain this weird case or that, there is a deeper story here about data analysis and interpretation that seems not to be widely appreciated.

My own experience inevitably colors my attitude about this, as it does for us all, so let’s start thirty years ago when I was writing a dissertation on low surface brightness (LSB) galaxies. I did many things in my thesis, most of them well. One of the things I tried to do then was derive rotation curves for some LSB galaxies. This was not the main point of the thesis, and arose almost as an afterthought. It was also not successful, and I did not publish the results because I didn’t believe them. It wasn’t until a few years later, with improved data, analysis software, and the concerted efforts of Erwin de Blok, that we started to get a handle on things.

The thing that really bugged me at the time was not the Doppler measurements, but the inclinations. One has to correct the observed velocities by the inclination of the disk, 1/sin(i). The inclination can be constrained by the shape of the image and by the variation of velocities across the face of the disk. LSB galaxies presented raggedy images and messy velocity fields. I found it nigh on impossible to constrain their inclinations at the time, and it remains a frequent struggle to this day.

Here is an example of the LSB galaxy F577-V1 that I find lurking around on disk from all those years ago:

The LSB galaxy F577-V1 (B-band image, left) and the run of the eccentricity of ellipses fit to the atomic gas data (right).

A uniform disk projected on the sky at some inclination will have a fixed corresponding eccentricity, with zero being the limit of a circular disk seen perfectly face-on (i = 0). Do you see a constant value of the eccentricity in the graph above? If you say yes, go get your eyes checked.
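To spell out the geometry: a razor-thin circular disk inclined by $i$ projects to an ellipse with axis ratio $b/a = \cos i$, so the eccentricity of the fitted ellipses should be $e = \sqrt{1 - (b/a)^2} = \sin i$ – the same at every radius. A flat run of $e$ is what a well-behaved uniform disk looks like; the plot above is anything but.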

What we see in this case is a big transition from a fairly eccentric disk to one that is more nearly face on. The disk doesn’t have a sudden warp; the problem is that the assumption of a uniform disk is invalid. This galaxy has a bar – a quasi-linear feature, common in spiral galaxies, that is supported by non-circular orbits. Even face-on, the bar will look elongated simply because it is. Indeed, the sudden change in eccentricity is one way to define the end of the bar, which the human eye-brain can do easily by looking at the image. So in a case like this, one might adopt the inclination from the outer points, and that might even be correct. But note that there are spiral arms along the outer edge visible to the eye, so it isn’t clear that even these isophotes are representative of the shape of the underlying disk. Worse, we don’t know what happens beyond the edge of the data; the shape might settle down at some other level that we can’t see.

This was so frustrating, I swore never to have anything to do with galaxy kinematics ever again. Over 50 papers on the subject later, all I can say is D’oh! Repeatedly.

Bars are rare in LSB galaxies, but it struck me as odd that we saw any at all. We discovered unexpectedly that they were dark matter dominated – the inferred dark halo outweighs the disk, even within the edge defined by the stars – but that meant that the disks should be stable against the formation of bars. My colleague Chris Mihos agreed, and decided to look into it. The answer was yes, LSB galaxies should be stable against bar formation, at least internally generated bars. Sometimes bars are driven by external perturbations, so we decided to simulate the close passage of a galaxy of similar mass – basically, whack it real hard and see what happens:

Simulation of an LSB galaxy during a strong tidal encounter with another galaxy. Closest approach is at t=24 in simulation units (between the first and second box). A linear bar does not form, but the model galaxy does suffer a strong and persistent oval distortion: all these images are shown face-on (i=0). From Mihos et al. (1997).

This was a conventional simulation, with a dark matter halo constructed to be consistent with the observed properties of the LSB galaxy UGC 128. The results are not specific to this case; it merely provides numerical corroboration of the more general case that we showed analytically.

Consider the image above in the context of determining galaxy inclinations from isophotal shapes. We know this object is face-on because we can control our viewing angle in the simulation. However, we would not infer i=0 from this image. If we didn’t know it had been perturbed, we would happily infer a substantial inclination – in this case, easily as much as 60 degrees! This is an intentionally extreme case, but it illustrates how a small departure from a purely circular shape can be misinterpreted as an inclination. This is a systematic error, and one that usually makes the inclination larger than it is: it is possible to appear oval when face-on, but it is not possible to appear more face-on than perfectly circular.

Around the same time, Erwin and I were making fits to the LSB galaxy data – with both dark matter halos and MOND. By this point in my career, I had deeply internalized that the data for LSB galaxies were never perfect. So we sweated every detail, and worked through every “what if?” This was a particularly onerous task for the dark matter fits, which could do many different things if this or that were assumed – we discussed all the plausible possibilities at the time. (Subsequently, a rich literature sprang up discussing many unreasonable possibilities.) By comparison, the MOND fits were easy. They had fewer knobs, and in 2/3 of the cases they simply worked, no muss, no fuss.

For the other 1/3 of the cases, we noticed that the shape of the MOND-predicted rotation curves was usually right, but the amplitude was off. How could it work so often, and yet miss in this weird way? That sounded like a systematic error, and the inclination was the most obvious culprit, with 1/sin(i) making a big difference for small inclinations. So we decided to allow this as a fit parameter, to see whether a fit could be obtained, and judge how [un]reasonable this was. Here is an example for two galaxies:

UGC 1230 (left) and UGC 5005 (right). Ovals show the nominally measured inclination (i = 22° for UGC 1230 and 41° for UGC 5005, respectively) and the MOND best-fit value (i = 17° and 30°). From de Blok & McGaugh (1998).

The case of UGC 1230 is memorable to me because it had a good rotation curve, despite being more face-on than widely considered acceptable for analysis. And for good reason: the difference between 22 and 17 degrees makes a huge difference to the fit, changing it from way off to picture perfect.
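The arithmetic behind “huge”: the deprojected rotation speed is $V_{\rm rot} = V_{\rm obs}/\sin i$, so dropping from $i = 22^\circ$ to $i = 17^\circ$ scales every velocity by $\sin 22^\circ/\sin 17^\circ \approx 0.375/0.292 \approx 1.28$, and anything that goes as $V^2$ – accelerations, implied dynamical masses – by a factor of about 1.6. A five-degree nudge that is nearly invisible in the image moves the whole rotation curve by nearly 30%.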

Rotation curve fits for UGC 1230 (top) and UGC 5005 (bottom) with the inclination fixed (left) and fit (right). From de Blok & McGaugh (1998).

What I took away from this exercise is how hard it is to tell the difference between inclination values for relatively face-on galaxies. UGC 1230 is obvious: the ovals for the two inclinations are practically on top of each other. The difference in the case of UGC 5005 is more pronounced, but look at the galaxy. The shape of the outer isophote where we’re trying to measure this is raggedy as all get out; this is par for the course for LSB galaxies. Worse, look further in – this galaxy has a bar! The central bar is almost orthogonal to the kinematic major axis. If we hadn’t observed as deeply as we had, we’d think the minor axis was the major axis, and the inclination was something even higher.

I remember Erwin quipping that he should write a paper on how to use MOND to determine inclinations. This was a joke between us, but only half so: using the procedure in this way would be analogous to using Tully-Fisher to measure distances. We would simply be applying an empirically established procedure to constrain a property of a galaxy – luminosity from line-width in the case of Tully-Fisher; inclination from rotation curve shape here. That we don’t understand why this works has never stopped astronomers before.

Systematic errors in inclination happen all the time. Big surveys don’t have time to image deeply – they have too much sky area to cover – and if there is follow-up about the gas content, it inevitably comes in the form of a single dish HI measurement. This is fine; it is what we can do en masse. But an unresolved single dish measurement provides no information about the inclination, only a pre-inclination line-width (which itself is a crude proxy for the flat rotation speed). The inclination we have to take from the optical image, which would key on the easily detected, high surface brightness central region of the image. That’s the part that is most likely to show a bar-like distortion, so one can expect lots of systematic errors in the inclinations determined in this way. I provided a long yet still incomplete discussion of these issues in McGaugh (2012). This is both technical and intensely boring, so not even the pros read it.

This brings us to the case of AGC 114905, which is part of a sample of ultradiffuse galaxies discussed previously by some of the same authors. On that occasion, I kept to the code, and refrained from discussion. But for context, here are those data on a recent Baryonic Tully-Fisher plot. Spoiler alert: that post was about a different sample of galaxies that seemed to be off the relation but weren’t.

Baryonic Tully-Fisher relation showing the ultradiffuse galaxies discussed by Mancera Piña et al. (2019) as gray circles. These are all outliers from the relation; AGC 114905 is highlighted in orange. Placing much meaning in the outliers is a classic case of missing the forest for the trees. The outliers are trees. The Tully-Fisher relation is the forest.

On the face of it, these ultradiffuse galaxies (UDGs) are all very serious outliers. This is weird – they’re not some scatter off to one side, they’re just way off on their own island, with no apparent connection to the rest of established reality. Calling them by a new name, UDG, makes it sound plausible that these are some entirely novel population of galaxies that behave in a new way. But they’re not. They are exactly the same kinds of galaxies I’ve been talking about. They’re all blue, gas rich, low surface brightness, fairly isolated galaxies – all words that I’ve frequently used to describe my thesis sample. These UDGs are all a few billion solar masses in baryonic mass, very similar to F577-V1 above. You could give F577-V1 a different name, slip it into the sample, and nobody would notice that it wasn’t like one of the others.

The one slight difference is implied by the name: UDGs are a little lower in surface brightness. Indeed, once filter transformations are taken into account, the definition of ultradiffuse is equal to what I arbitrarily called very low surface brightness in 1996. Most of my old LSB sample galaxies have central stellar surface densities at or a bit above 10 solar masses per square parsec, while the UDGs here are a bit under this threshold. For comparison, in typical high surface brightness galaxies this quantity is many hundreds, often around a thousand. Nothing magic happens at the threshold of 10 solar masses per square parsec, so this line of definition between LSB and UDG is an observational distinction without a physical difference. So what are the odds of a different result for the same kind of galaxies?

Indeed, what really matters is the baryonic surface density, not just the stellar surface brightness. A galaxy made purely of gas but no stars would have zero optical surface brightness. I don’t know of any examples of that extreme, but we came close to it with the gas rich sample of Trachternach et al. (2009) when we tried this exact same exercise a decade ago. Despite selecting that sample to maximize the chance of deviations from the Baryonic Tully-Fisher relation, we found none – at least none that were credible: there were deviant cases, but their data were terrible. There were no deviants among the better data. This sample is comparable to, or even more extreme than, the UDGs in terms of baryonic surface density, so the UDGs can’t be an exception simply because they’re a genuinely new population, whatever name we call them by.

The key thing is the credibility of the data, so let’s consider the data for AGC 114905. The kinematics are pretty well ordered; the velocity field is well observed for this kind of beast. It ought to be; they invested over 40 hours of JVLA time into this one galaxy. That’s more than went into my entire LSB thesis sample. The authors are all capable, competent people. I don’t think they’ve done anything wrong, per se. But they do seem to have climbed aboard the bandwagon of dark matter-free UDGs, and have talked themselves into believing smaller error bars on the inclination than I am persuaded is warranted.

Here is the picture of AGC 114905 from Mancera Piña et al. (2021):

AGC 114905 in stars (left) and gas (right). The contours of the gas distribution are shown on top of the stars in white. Figure 1 from Mancera Piña et al. (2021).

This messy morphology is typical of very low surface brightness galaxies – hence their frequent classification as Irregular galaxies. Though messier, it shares some morphological traits with the LSB galaxies shown above. The central light distribution is elongated with a major axis that is not aligned with that of the gas. The gas is raggedy as all get out. The contours are somewhat boxy; this is a hint that something hinky is going on beyond circular motion in a tilted axisymmetric disk.

The authors do the right thing and worry about the inclination, checking to see what it would take to be consistent with either LCDM or MOND, which is about i = 11° instead of the 30° indicated by the shape of the outer isophote. They even build a model to check the plausibility of the smaller inclination:

Contours of models of disks with different inclinations (lines, as labeled) compared to the outer contour of the gas distribution of AGC 114905. Figure 7 from Mancera Piña et al. (2021).

Clearly the black line (i = 30°) is a better fit to the shape of the gas distribution than the blue dashed line (i = 11°). Consequently, they “find it unlikely that we are severely overestimating the inclination of our UDG, although this remains the largest source of uncertainty in our analysis.” I certainly agree with the latter phrase, but not the former. I think it is quite likely that they are overestimating the inclination. I wouldn’t even call it a severe overestimation; more like par for the course with this kind of object.

As I have emphasized above and elsewhere, there are many things that can go wrong in this sort of analysis. But if I were to try to put my finger on the most important thing, here it would be the inclination. The modeling exercise is good, but it assumes “razor-thin axisymmetric discs.” That’s a reasonable thing to do when building such a model, but we have to bear in mind that real disks are neither. The thickness of the disk probably doesn’t matter too much for a nearly face-on case like this, but the assumption of axisymmetry is extraordinarily dubious for an Irregular galaxy. That’s how they got the name.

It is hard to build models that are not axisymmetric. Once you drop this simplifying assumption, where do you even start? So I don’t fault them for stopping at this juncture, but I can also imagine doing as de Blok suggested, using MOND to set the inclination. Then one could build models with asymmetric features by trial and error until a match is obtained. Would we know that such a model would be a better representation of reality? No. Could we exclude such a model? Also no. So the bottom line is that I am not convinced that the uncertainty in the inclination is anywhere near as small as the adopted ±3°.

That’s very deep in the devilish details. If one is worried about a particular result, one can back off and ask if it makes sense in the context of what we already know. I’ve illustrated this process previously. First, check the empirical facts. Every other galaxy in the universe with credible data falls on the Baryonic Tully-Fisher relation, including very similar galaxies that go by a slightly different name. Hmm, strike one. Second, check what we expect from theory. I’m not a fan of theory-informed data interpretation, but we know that LCDM, unlike SCDM before it, at least gets the amplitude of the rotation speed in the right ballpark (Vflat ~ V200). Except here. Strike two. As much as we might favor LCDM as the standard cosmology, it has now been extraordinarily well established that MOND has considerable success in not just explaining but predicting these kind of data, with literally hundreds of examples. One hundred was the threshold Vera Rubin obtained to refute excuses made to explain away the first few flat rotation curves. We’ve crossed that threshold: MOND phenomenology is as well established now as flat rotation curves were at the inception of the dark matter paradigm. So while I’m open to alternative explanations for the MOND phenomenology, seeing that a few trees stand out from the forest is never going to be as important as the forest itself.

The Baryonic Tully-Fisher relation exists empirically; we have to explain it in any theory. Either we explain it, or we don’t. We can’t have it both ways, just conveniently throwing away our explanation to accommodate any discrepant observation that comes along. That’s what we’d have to do here: if we can explain the relation, we can’t very well explain the outliers. If we explain the outliers, it trashes our explanation for the relation. If some galaxies are genuine exceptions, then there are probably exceptional reasons for them to be exceptions, like a departure from equilibrium. That can happen in any theory, rendering such a test moot: a basic tenet of objectivity is that we don’t get to blame a missed prediction of LCDM on departures from equilibrium without considering the same possibility for MOND.

This brings us to a physical effect that people should be aware of. We touched on the bar stability above, and how a galaxy might look oval even when seen face on. This happens fairly naturally in MOND simulations of isolated disk galaxies. They form bars and spirals and their outer parts wobble about. See, for example, this simulation by Nils Wittenburg. This particular example is a relatively massive galaxy; the lopsidedness reminds me of M101 (Watkins et al. 2017). Lower mass galaxies deeper in the MOND regime are likely even more wobbly. This happens because disks are only marginally stable in MOND, not the over-stabilized entities that have to be hammered to show a response as in our early simulation of UGC 128 above. The point is that there is good reason to expect even isolated face-on dwarf Irregulars to look, well, irregular, leading to exactly the issues with inclination determinations discussed above. Rather than being a contradiction to MOND, AGC 114905 may illustrate one of its inevitable consequences.

I don’t like to bicker at this level of detail, but it makes a profound difference to the interpretation. I do think we should be skeptical of results that contradict well established observational reality – especially when over-hyped. God knows I was skeptical of our own results, which initially surprised the bejeepers out of me, but have been repeatedly corroborated by subsequent observations.

I guess I’m old now, so I wonder how I come across to younger practitioners; perhaps as some scary undead monster. But mates, these claims about UDGs deviating from established scaling relations are off the edge of the map.

A few of Zwicky’s rants

An important issue in science is what’s right and what’s wrong. Another is who gets credit for what. The former issue is scientific while the second is social. It matters little to the progress of science who discovers what. It matters a lot to the people who do it. We like to get credit where due.

Nowadays, Fritz Zwicky is often credited with the discovery of dark matter for his work on clusters of galaxies in the 1930s. Indeed, in his somewhat retrospective 1971 Catalogue of Selected Compact Galaxies and of Post-Eruptive Galaxies (CSCGPEG), he claims credit for discovering clusters themselves, which were

discovered by me but contested by masses of unbelievers, [who asserted] that there exist no bona fide clusters of stable or stationary clusters of galaxies.

Zwicky, CSCGPEG

Were Zwicky alive today, a case could be made that he deserves the Nobel Prize in physics for the discovery of dark matter. However, Zwicky was not the first or only person to infer the existence of dark matter early on. Jan Oort was concerned that dark mass was necessary to explain the accelerations of stars perpendicular to the plane of the Milky Way as early as 1932. Where Zwicky’s discrepancy was huge, over a factor of 100, Oort’s was a more modest factor of 2. Oort was taken seriously at the time while Zwicky was largely ignored.

The reasons for this difference in response are many and varied. I wasn’t around at the time, so I will refrain from speculating too much. But in many ways, this divide reflects the difference in cultures between physics and astronomy. Oort was thoroughly empirical and immaculately detailed in his observational work and conservative in its interpretation, deeply impressing his fellow astronomers. Zwicky was an outsider and self-described lone wolf, and may have come across as a wild-eyed crackpot. That he had good reason for that didn’t alter the perception. That he is now posthumously recognized as having been basically correct does nothing to aid him personally, only our memory of him.

Nowadays, nearly every physicist I hear talk about the subject credits Zwicky with the discovery of dark matter. When I mention Oort, most have never heard of him, and they rarely seem prepared to share the credit. This is how history gets rewritten, by oversimplification and omission: Oort goes unmentioned in the education of physicists, the omission gets promulgated by those who never heard of him, and then it becomes fact, since an omission so glaring cannot possibly be correct. I’m doing that myself here by omitting mention of Öpik and perhaps others I haven’t heard of myself.

Zwicky got that treatment in real time, leading to some of the best published rants in all of science. I’ll let him speak for himself, quoting from the CSCGPEG. One of his great resentments was his exclusion from premiere observational facilities:

I myself was allowed the use of the 100-inch telescope only in 1948, after I was fifty years of age, and of the 200-inch telescope on Palomar Mountain only after I was 54 years old, although I had built and successfully operated the 18-inch Schmidt telescope in 1936, and had been professor of physics and of astrophysics at the California Institute of Technology since 1927 and 1942 respectively. 

Zwicky, in the introduction to the CSCGPEG

For reference, I have observed many hundreds of nights at various observatories. Only a handful of those nights have been in my fifties. Observing is mostly a young person’s occupation.

I do not know why Zwicky was excluded. Perhaps there is a book on the subject; there should be. Maybe it was personal, as he clearly suspects. Applying for telescope time can be highly competitive, even within one’s own institution, which hardly matters if it crossed departmental lines. Perhaps his proposals lacked grounding in the expectations of the field, or some intangible quality that made them less persuasive than those of his colleagues. Maybe he simply didn’t share their scientific language, a perpetual problem I see at the interface between physics and astronomy. Perhaps all these things contributed.

More amusing if inappropriate are his ad hominem attacks on individuals:

a shining example of a most deluded individual we need only quote the high pope of American Astronomy, one Henry Norris Russell…

Zwicky, CSCGPEG

or his more generalized condemnation of the entire field:

Today’s sycophants and plain thieves seem to be free, in American Astronomy in particular, to appropriate discoveries and inventions made by lone wolves and non-conformists, for whom there is never any appeal to the hierarchies and for whom even the public Press is closed, because of censoring committees within the scientific institutions.

Zwicky, CSCGPEG

or indeed, of human nature:

we note that again and again scientists and technical specialists arrive at stagnation points where they think they know it all.

Zwicky, CSCGPEG, emphasis his.

He’s not wrong.

I have heard Zwicky described as a “spherical bastard”: a bastard viewed from any angle. You can see why from these quotes. But you can also see why he might have felt this way. The CSCGPEG was published about 35 years after his pioneering work on clusters of galaxies. That’s a career-lifetime lacking recognition for what would now be considered Nobel prize-worthy work. Dark matter would come to prominence in the following decade, by which time he was dead.

I have also heard that “spherical bastard” was a phrase invented by Zwicky to apply to others. I don’t know who was the bigger bastard, and I am reluctant to attribute his lack of popularity in his own day to his personality. The testimony I am aware of is mostly from those who disagreed with him, and may themselves have been spherical bastards. Indeed, I strongly suspect those who sing his praises most loudly now would have been among his greatest detractors had they been contemporaries.

I know from my own experience that people are lousy at distinguishing between a scientific hypothesis that they don’t like and the person who advocates it. Often they are willing and eager to attribute a difference in scientific judgement to a failure of character: “He disagrees with me, therefore he is a bastard.” Trash talk by mediocre wannabes is common, and slander works wonders to run down a reputation. I imagine Zwicky was a victim of this human failing.

Of course, the correctness of a scientific hypothesis has nothing to do with how likeable its proponents might be. Indeed, a true scientist has an obligation to speak the facts, even if they are unpopular, as Zwicky reminded us with a quote of his own in the preface to the CSCGPEG:

The more things change, the more they stay the same.

Super spirals on the Tully-Fisher relation

A surprising and ultimately career-altering result that I encountered during my first postdoc was that low surface brightness galaxies fall precisely on the Tully-Fisher relation. That result led me to test the limits of the relation in every conceivable way. Are there galaxies that fall off it? How far does it extend? Often, answering that has meant pushing the boundaries of known galaxies to ever lower surface brightness, higher gas fraction, and lower mass, where galaxies are hard to find because of unavoidable selection biases in galaxy surveys: dim galaxies are hard to see.

I made a summary plot in 2017 to illustrate what we had learned to that point. There is a clear break in the stellar mass Tully-Fisher relation (left panel) that results from neglecting the mass of interstellar gas, which becomes increasingly important in lower mass galaxies. The break goes away when you add in the gas mass (right panel). The relation between baryonic mass and rotation speed is continuous down to Leo P, a tiny galaxy just outside the Local Group, comparable in mass to a globular cluster and the current record holder for the slowest rotating galaxy known, at a mere 15 km/s.

The stellar mass (left) and baryonic (right) Tully-Fisher relations constructed in 2017 from SPARC data and gas rich galaxies. Dark blue points are star dominated galaxies; light blue points are galaxies with more mass in gas than in stars. The data are restricted to galaxies with distance measurements accurate to 20% or better; see McGaugh et al. (2019) for a discussion of the effects of different quality criteria. The line has a slope of 4 and is identical in both panels for comparison.
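For anyone who wants to play with the numbers, here is a minimal sketch of the relation in the right panel. The normalization A and the example galaxy are illustrative round numbers of my choosing, not SPARC values; the 1.33 factor is the standard helium correction applied to HI masses:

```python
# Baryonic Tully-Fisher relation: M_b = A * V_f**4.
# A ~ 50 Msun km^-4 s^4 is a round number of the right order, not a fit.
A = 50.0

def btfr_mass(v_flat):
    """Baryonic mass (Msun) predicted from the flat rotation speed (km/s)."""
    return A * v_flat**4

def baryonic_mass(m_star, m_HI):
    """Stars plus gas; the 1.33 corrects the HI mass for helium."""
    return m_star + 1.33 * m_HI

# A hypothetical gas-rich dwarf: stars alone fall below the line,
# but the total baryonic mass lands on it.
v_flat = 60.0   # km/s
m_star = 2e8    # Msun
m_HI = 3e8      # Msun: more gas than stars
print(f"predicted M_b     = {btfr_mass(v_flat):.2e} Msun")             # ~6.5e8
print(f"stellar mass only = {m_star:.2e} Msun")                        # 2.0e8
print(f"stars + gas       = {baryonic_mass(m_star, m_HI):.2e} Msun")   # ~6.0e8
```

The toy dwarf lands near the line only when its gas is counted, which is the sense in which the break in the left panel goes away in the right.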

At the high mass end, galaxies aren’t hard to see, but they do become progressively rare: there is an exponential cutoff in their intrinsic numbers. So it is interesting to see how far up in mass we can go. Ogle et al. set out to do that, searching over a huge volume to identify a number of very massive galaxies, including what they dubbed “super spirals.” These extend the Tully-Fisher relation to higher masses.

The Tully-Fisher relation extended to very massive “super” spirals (blue points) by Ogle et al. (2019).

Most of the super spirals lie on the top end of the Tully-Fisher relation. However, a half dozen of the most massive cases fall off to the right. Could this be a break in the relation? So it was claimed at the time, but looking at the data, I wasn’t convinced. It looked to me like they were not always getting out to the flat part of the rotation curve, instead measuring the maximum rotation speed.

Bright galaxies tend to have rapidly rising rotation curves that peak early then fall before flattening out. For very bright galaxies – and super spirals are by definition the brightest spirals – the amplitude of the decline can be substantial, several tens of km/s. So if one measures the maximum speed instead of the flat portion of the curve, points will fall to the right of the relation. I decided not to lose any sleep over it, and wait for better data.
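To make the distinction concrete, here is a toy example. The rotation curve below is a made-up functional form chosen to mimic a bright spiral, with a fast rise, an inner peak, and a decline to a flat outer velocity; it is not a fit to any real galaxy:

```python
import numpy as np

r = np.linspace(0.5, 40.0, 400)   # radius in kpc
v_flat_true = 275.0               # km/s, the outer flat velocity
# An inner bump above the flat part, of the several-tens-of-km/s scale
# described above; purely illustrative.
bump = 40.0 * np.exp(-(r - 5.0)**2 / 18.0)
v = v_flat_true * np.sqrt(1.0 - np.exp(-r / 2.0)) + bump

v_max = v.max()               # peak speed at small radius (over 300 km/s here)
v_flat = v[r > 30.0].mean()   # average over the outer, flat region (~275 km/s)
print(f"V_max  = {v_max:.0f} km/s")
print(f"V_flat = {v_flat:.0f} km/s")
```

Mistaking the first number for the second moves a massive galaxy to the right of the relation, exactly as the early super spiral data appeared to do.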

Better data have now been provided by Di Teodoro et al. Here is an example from their paper. The morphology of the rotation curve is typical of what we see in massive spiral galaxies. The maximum rotation speed exceeds 300 km/s, but falls to 275 km/s where it flattens out.

A super spiral (left) and its rotation curve (right) from Di Teodoro et al.
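The slope of 4 amplifies the difference. A back-of-the-envelope check, my arithmetic rather than anything from the paper, using the 300 and 275 km/s figures from this example:

```python
import math

# At fixed baryonic mass, plotting V_max instead of V_flat shifts a point
# rightward by log10(V_max / V_flat) dex in velocity; equivalently, a
# slope-4 relation would ascribe it (V_max / V_flat)**4 times more mass.
v_max, v_flat = 300.0, 275.0
print(f"shift in log V     : {math.log10(v_max / v_flat):.3f} dex")   # ~0.038
print(f"implied mass factor: {(v_max / v_flat)**4:.2f}")              # ~1.42
```

A 9% velocity error becomes a 40% offset in implied mass, plenty to make the most massive points look like a break.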

Adding the updated data to the plot, we see that the super spirals now fall on the Tully-Fisher relation, with no hint of a break. There are a couple of outliers, but those are trees. The relation is the forest.

The super spiral (red points) stellar mass (left) and baryonic (right) Tully-Fisher relations as updated by Di Teodoro et al. (2021).

That’s a good plot, but it stops at 10^8 solar masses, so I couldn’t resist adding the super spirals to my plot from 2017. I’ve also included the dwarfs I discussed in the last post. Together, we see that the baryonic Tully-Fisher relation is continuous over six decades in mass – a factor of a million from the smallest to the largest galaxies.

The plot from above updated to include the super spirals (red points) at high mass and Local Group dwarfs (gray squares) at low mass. The SPARC data (blue points) have also been updated with new stellar population mass-to-light ratio estimates that make their bulge components a bit more massive, and with scaling relations for metallicity and molecular gas. The super spirals have been treated in the same way, and adjusted to a matching distance scale (H0 = 73 km/s/Mpc). There is some overlap between the super spirals and the most massive galaxies in SPARC; here the data are in excellent agreement. The super spirals extend to higher mass by a factor of two.
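The distance-scale adjustment mentioned in the caption is a simple rescaling: inferred distances vary as 1/H0, and masses based on luminosities or fluxes vary as distance squared. A minimal sketch, using H0 = 70 km/s/Mpc as a hypothetical source value purely for illustration:

```python
def rescale_mass(mass, H0_old, H0_new):
    """Move a flux-based mass between Hubble constants.
    D ~ 1/H0 and M ~ D**2, so M_new = M_old * (H0_old / H0_new)**2."""
    return mass * (H0_old / H0_new)**2

# e.g. a 1e12 Msun super spiral quoted for H0 = 70, moved to H0 = 73:
print(f"{rescale_mass(1e12, 70.0, 73.0):.2e} Msun")  # ~9.2e11 Msun
```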

The strength of this correlation continues to amaze me. Correlations this tight never happen in extragalactic astronomy, where they are typically weak and have lots of intrinsic scatter. The opposite is true here. This must be telling us something.

The obvious thing that this is telling us is MOND. The initial report that super spirals fell off of the Tully-Fisher relation was widely hailed as a disproof of MOND. I’ve seen this movie many times, so I am not surprised that the answer changed in this fashion. It happens over and over again. Even less surprising is that there is no retraction, no self-examination of whether maybe we jumped to the wrong conclusion.

I get it. I couldn’t believe it myself, to start. I struggled for many years to explain the data conventionally in terms of dark matter. Worked my ass off trying to save the paradigm. Try as I might, nothing worked. Since then, many people have claimed to explain what I could not, but so far all I have seen are variations on models that I had already rejected as obviously unworkable. They either make unsubstantiated assumptions, building a tautology, or simply claim more than they demonstrate. As long as you say what people want to hear, you will be held to a very low standard. If you say what they don’t want to hear, what they are conditioned not to believe, then no standard of proof is high enough.

MOND was the only theory to predict the observed behavior a priori. There are no free parameters in the plots above. We measure the mass and the rotation speed. The data fall on the predicted line. Dark matter models did not predict this, and can at best hope to provide a convoluted, retroactive explanation. Why should I be impressed by that?

Divergence

I read somewhere – I don’t think it was Kuhn himself, but someone analyzing Kuhn – that there came a point in the history of science when scientists diverged, disagreeing about what counts as a theory, what counts as a test of a theory, what even counts as evidence. We have reached that point with the mass discrepancy problem.

For many years, I worried that if the field ever caught up with me, it would zoom past. That hasn’t happened. Instead, it has diverged towards a place that I barely recognize as science. It looks more like the Matrix – a simulation, increasingly sophisticated yet self-contained, making only parsimonious contact with observational reality and unable to make predictions that apply to real objects. Scaling relations and statistical properties, sure. Actual galaxies with NGC numbers, not so much. That, to me, is not science.

I have found it increasingly difficult to communicate across a gap built on presumptions buried so deep that they cannot be questioned. An obvious one is the existence of dark matter. This presumption has been fueled by cosmologists who take it for granted and by particle physicists eager to discover it, who repeat “we know dark matter exists*; we just need to find it” like a religious mantra. It is now ingrained so deeply that it has become difficult to convey even the simple concept that what we call “dark matter” is really just evidence of a discrepancy: we do not know whether it is literally some kind of invisible mass, or a breakdown of the equations that lead us to infer invisible mass.

I try to look at all sides of a problem. I can say nice things about dark matter (and cosmology); I can point out problems with it. I can say nice things about MOND; I can point out problems with it. The more common approach is to presume that any failing of MOND is an automatic win for dark matter. This is a simple-minded logical fallacy: just because MOND gets something wrong doesn’t mean dark matter gets it right. Indeed, my experience has been that cases that don’t make any sense in MOND don’t make any sense in terms of dark matter either. Nevertheless, this attitude persists.

I made this flowchart as a joke in 2012, but it persists in being an uncomfortably fair depiction of how many people who work on dark matter approach the problem.

I don’t know what is right, but I’m pretty sure this attitude is wrong. Indeed, it empowers a form of magical thinking: dark matter has to be correct, so any data that appear to contradict it are either wrong, or can be explained with feedback. The usual trajectory has been denial first (that can’t be true!) and explanation later (we knew it all along!). This attitude is an existential threat to the scientific method, and I am despondent in part because I worry we are slipping into a post-scientific reality, where even scientists are little more than priests of a cold, dark religion.


*If we’re sure dark matter exists, it is not obvious that we need to be doing expensive experiments to find it.

Why bother?