By the wayside

I noted last time that, in the rush to analyze the first of the JWST data, “some of these candidate high redshift galaxies will fall by the wayside.” As Maurice Aabe notes in the comments there, this has already happened.

I was concerned because of previous work with Jay Franck in which we found that photometric redshifts were simply not adequately precise to identify the clusters and protoclusters we were looking for. Consequently, we made it a selection criterion when constructing the CCPC to require spectroscopic redshifts. The issue then was that it wasn’t good enough to have a rough idea of the redshift, as the photometric method often provides (what exactly it provides depends in a complicated way on the redshift range, the stellar population modeling, and the wavelength range covered by the observational data that is available). To identify a candidate protocluster, you want to know that all the potential member galaxies are really at the same redshift.

This requirement is somewhat relaxed for the field population, in which a common approach is to ask broader questions of the data like “how many galaxies are at z ~ 6? z ~ 7?” etc. Photometric redshifts, when done properly, ought to suffice for this. However, I had noticed in Jay’s work that there were times when apparently reasonable photometric redshift estimates went badly wrong. So it made the ganglia twitch when I noticed that in early JWST work – specifically Table 2 of the first version of a paper by Adams et al. – there were seven objects with candidate photometric redshifts, of which three already had a preexisting spectroscopic redshift. The photometric redshifts were mostly around z ~ 9.7, but the three spectroscopic redshifts were all smaller: two at z ~ 7.6 and one at z ~ 8.5.
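The conventional figure of merit here is |Δz|/(1 + z_spec), with values above ~0.15 usually flagged as catastrophic outliers. A minimal sketch using the representative values quoted above (illustrative, not the exact catalog entries):

```python
# Photo-z vs. spec-z comparison using the representative values quoted
# above; these are illustrative, not the exact catalog entries.
z_phot = [9.7, 9.7, 9.7]   # candidate photometric redshifts (all ~9.7)
z_spec = [7.6, 7.6, 8.5]   # preexisting spectroscopic redshifts

for zp, zs in zip(z_phot, z_spec):
    dz = abs(zp - zs) / (1 + zs)   # standard photo-z accuracy metric
    flag = "catastrophic outlier" if dz > 0.15 else "acceptable"
    print(f"z_phot={zp}, z_spec={zs}: |dz|/(1+z) = {dz:.2f} ({flag})")
```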

Three objects are not enough to infer a systematic bias, so I made a mental note and moved on. But given our previous experience, it did not inspire confidence that all the available cases disagreed, and that all the spectroscopic redshifts were lower than the photometric estimates. These things combined to give this observer a serious case of “the heebie-jeebies.”

Adams et al. have now posted a revised analysis in which many (not all) redshifts change, and change by a lot. Here is their new Table 4:

Table 4 from Adams et al. (2022, version 2).

There are some cases here that appear to confirm and improve the initial estimate of a high redshift. For example, SMACS-z11e had a very uncertain initial redshift estimate. In the revised analysis, it is still at z~11, but with much higher confidence.

That said, it is hard to put a positive spin on these numbers. 23 of 31 redshifts change, and many change drastically. Those that change all become smaller. The highest surviving redshift estimate is z ~ 15 for SMACS-z16b. Among the objects with very high candidate redshifts, some are practically local (e.g., SMACS-z12a, F150DB-075, F150DA-058).

So… I had expected that this could go wrong, but I didn’t think it would go this wrong. I was concerned about the photometric redshift method – how well we can model stellar populations, especially at young ages dominated by short-lived stars that in the early universe are presumably lower metallicity than well-studied nearby examples, the degeneracies between galaxies at very different redshifts but presenting similar colors over a finite range of observed passbands, dust (the eternal scourge of observational astronomy, expected to be an especially severe affliction in the ultraviolet that gets redshifted into the near-IR for high-z objects, both because dust is very efficient at scattering UV photons and because this efficiency varies a lot with metallicity and the exact grain size distribution of the dust), when is a dropout really a dropout indicating the location of the Lyman break and when is it just a lousy upper limit of a shabby detection, etc. – I could go on, but I think I already have. It will take time to sort these things out, even in the best of worlds.

We do not live in the best of worlds.

It appears that a big part of the current uncertainty is a calibration error. There is a pipeline for handling JWST data that has an in-built calibration for how many counts in a JWST image correspond to what astronomical magnitude. The JWST instrument team warned us that the initial estimate of this calibration would “improve as we go deeper into Cycle 1” – see slide 13 of Jane Rigby’s AAS presentation.

I was not previously aware of this caveat, though I’m certainly not surprised by it. This is how these things work – one makes an initial estimate based on the available data, and one improves it as more data become available. Apparently, JWST is outperforming its specs, so it is seeing as much as 0.3 magnitudes deeper than anticipated. This means that people were inferring objects to be that much too bright, hence the appearance of lots of galaxies that seem to be brighter than expected, and an apparent systematic bias to high z for photometric redshift estimators.
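To put a number on that: magnitudes are logarithmic, so a 0.3 mag zero-point shift is roughly a 24% error in flux. A quick sketch of the arithmetic, assuming the 0.3 mag figure quoted above:

```python
# If JWST is really ~0.3 mag more sensitive than the pipeline assumed,
# a given count rate corresponds to a fainter source than was inferred.
dm = 0.3                         # zero-point offset [mag], quoted above
flux_ratio = 10 ** (-0.4 * dm)   # standard magnitude-flux relation
print(f"true flux / inferred flux = {flux_ratio:.2f}")  # ~0.76: ~24% fainter
```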

I was not at the AAS meeting, let alone Dr. Rigby’s presentation there. Even if I had been, I’m not sure I would have appreciated the potential impact of that last bullet point on nearly the last slide. So I’m not the least bit surprised that this error has propagated into the literature. This is unfortunate, but at least this time it didn’t lead to something as bad as the Challenger space shuttle disaster in which the relevant warning from the engineers was reputed to have been buried in an obscure bullet point list.

So now we need to take a deep breath and do things right. I understand the urgency to get the first exciting results out, and they are still exciting. There are still some interesting high z candidate galaxies, and lots of empirical evidence predating JWST indicating that galaxies may have become too big too soon. However, we can only begin to argue about the interpretation of this once we agree to what the facts are. At this juncture, it is more important to get the numbers right than to post early, potentially ill-advised takes on arXiv.

That said, I’d like to go back to writing my own ill-advised take to post on arXiv now.

An early result from JWST

There has been a veritable feeding frenzy going on with the first JWST data. This is to be expected. Also to be expected is that some of these early results will ultimately prove to have been premature. So – caveat emptor! That said, I want to highlight one important aspect of these early results, there being too many to do them all justice.

The basic theme is that people are finding very faint yet surprisingly bright galaxies that are consistent with being at redshift 9 and above. The universe has expanded by a factor of ten since then, when it was barely half a billion years old. That’s a long time to you and me, and even to a geologist, but it is a relatively short time for a universe that is now over 13 billion years old, and it isn’t a lot of time for objects as large as galaxies to form.
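The factor of ten follows directly from the definition of cosmological redshift in terms of the scale factor a(t):

```latex
\[
1 + z = \frac{a(t_{\rm now})}{a(t_{\rm emit})}
\qquad\Longrightarrow\qquad
z = 9 \;\Rightarrow\; \frac{a_{\rm now}}{a_{\rm emit}} = 10 ,
\]
```

so both the size of the universe and the wavelength of the light have stretched tenfold since emission.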

In the standard LCDM cosmogony, we expect large galaxies to build up from the merger of many smaller galaxies. These smaller galaxies form first, and many of the stars that end up in big galaxies may have formed in these smaller galaxies prior to merging. So when we look to high redshift, we expect to catch this formation-by-merging process in action. We should see lots of small, actively star forming protogalactic fragments (Searle-Zinn fragments in Old School speak) before they’ve had time to assemble into the large galaxies we see relatively nearby to us at low redshift.

So what are we seeing? Here is one example from Labbe et al.:

JWST images of a candidate galaxy at z~10 in different filters, ordered by increasing wavelength from optical light (left) to the mid-infrared (right). Image credit: Labbe et al.

Not much to look at, is it? But really it is pretty awesome for light that has been traveling 13 billion years to get to us and had its wavelength stretched by a factor of ten. Measuring the brightness in these various passbands enables us to estimate both its redshift and stellar mass:

The JWST data plotted as a spectrum (points) with template stellar population models (lines) that indicate a mass of nearly 85 billion suns at z=9.92. Image credit: Labbe et al.
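Schematically, this estimate comes from sliding template spectra in redshift and asking which combination of redshift and mass normalization best reproduces the measured fluxes. A toy sketch of the fitting loop, with a hypothetical `template` function standing in for real stellar population models (this is not the actual Labbe et al. pipeline, which also marginalizes over age, dust, and metallicity):

```python
import numpy as np

def fit_photoz(obs_wave, obs_flux, obs_err, template):
    """Brute-force photometric redshift fit over a redshift grid.

    `template(rest_wave)` is a hypothetical rest-frame SED model;
    the best-fit flux normalization tracks the stellar mass.
    """
    best_z, best_chi2, best_scale = None, np.inf, None
    for z in np.arange(0.1, 16.0, 0.05):
        rest_wave = obs_wave / (1 + z)   # de-redshift the passbands
        model = template(rest_wave)
        # Optimal normalization at this redshift (linear least squares):
        scale = (np.sum(model * obs_flux / obs_err**2)
                 / np.sum((model / obs_err)**2))
        chi2 = np.sum(((obs_flux - scale * model) / obs_err) ** 2)
        if chi2 < best_chi2:
            best_z, best_chi2, best_scale = z, chi2, scale
    return best_z, best_chi2, best_scale
```

The degeneracies discussed below arise because very different combinations of redshift and template can produce nearly identical chi-squared values over a finite set of passbands.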

Eighty-five billion solar masses is a lot of stars. It’s a bit bigger than the Milky Way, which has had the full 13+ billion years to make its complement of roughly 60 billion solar masses of stars. Object 19424 is a big galaxy, and it grew up fast.

In LCDM, it is not particularly hard to build a model that forms a lot of stars early on. What is challenging is assembling this many into a single object. We should see lots of much smaller fragments (and may yet still) but we shouldn’t see many really big objects like this already in place. How many there are is a critical question.

Labbe et al. make an estimate of the stellar mass density in massive high redshift galaxies, and find it to be rather a lot. This is a fraught exercise in the best of circumstances, when one has excellent data for thousands of galaxies. Here we have only a handful. We must also assume that the small region surveyed is typical, which it may not be. Moreover, the photometric redshift method illustrated above is itself fraught. It looks convincing. It is convincing. It also gives me the heebie-jeebies. Many times I have seen photometric redshifts turn out to be wrong when good spectroscopic data are obtained. But usually the method works, and it’s what we’ve got so far, so let’s see where this ride takes us.

A short paper that nicely illustrates the prime issue is provided by Prof. Boylan-Kolchin. His key figure:

The integrated mass density of stars as a function of the stellar mass of individual galaxies, or equivalently, the baryons available to form stars in their dark matter halos. The data of Labbe et al. reside in the forbidden region (shaded) where there are more stars than there is normal matter from which to make them. Image credit: Boylan-Kolchin.

The basic issue is that there are too many stars in these big galaxies. There are many astrophysical uncertainties about how stars form: how fast, how efficiently, with what mass distribution, etc., etc. – much of the literature is obsessed with these issues. In contrast, once the parameters of cosmology are known, as we think them to be, it is relatively straightforward to calculate the number density of dark matter halos as a function of mass at a given redshift. This is the dark skeleton on which large scale structure depends; getting this right is absolutely fundamental to the cold dark matter picture.

Every dark matter halo should host a universal fraction of normal matter. The baryon fraction (fb) is known to be very close to 16% in LCDM. Prof. Boylan-Kolchin points out that this sets an important upper limit on how many stars could possibly form. The shaded region in the figure above is excluded: there simply isn’t enough normal matter to make that many stars. The data of Labbe et al. fall in this region, which should be impossible.

The data only fall a little way into the excluded region, so maybe it doesn’t look that bad, but the real situation is more dire. Star formation is very inefficient, but the shaded region assumes that all the available material has been converted into stars. A more realistic expectation is closer to the gray line (ε = 0.1), not the hard limit where all the available material has been magically turned into stars with a cosmic snap of the fingers.
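The arithmetic behind the excluded region is simple: a halo of mass M_halo brings at most f_b × M_halo of normal matter, and only a fraction ε of that becomes stars. A minimal sketch using the stellar mass quoted above, with the efficiencies from the figure (the closing comment is the Boylan-Kolchin argument in compressed form):

```python
f_b = 0.16        # cosmic baryon fraction in LCDM
M_star = 8.5e10   # stellar mass of object 19424 [M_sun], from above

# Minimum halo mass needed to supply that many stars:
for eps in (1.0, 0.1):   # eps=1 is the hard limit; eps=0.1 the gray line
    M_halo = M_star / (f_b * eps)
    print(f"eps={eps}: M_halo > {M_halo:.1e} M_sun")
# eps=1   -> ~5e11 M_sun; eps=0.1 -> ~5e12 M_sun.
# Halos that massive should be vanishingly rare at z ~ 10 in LCDM.
```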

Indeed, I would argue that the real efficiency ε is likely lower than 0.1, as it is locally. That already ran into problems with precursors of the JWST result, so we’ve been under pressure to tweak this free parameter upwards for some time. Turning it up to eleven is just the inevitable consequence of needing to get more stars to form in the first big halos to appear sooner than the theory naturally predicts.

So, does this spell doom for LCDM? I doubt it. There are too many uncertainties at present. It is an intriguing result, but it will take a lot of follow-up work to sort out. I expect some of these candidate high redshift galaxies will fall by the wayside, and turn out to be objects at lower redshift. How many, and how that impacts the basic result, remains to be determined.

After years of testing LCDM, it would be ironic if it could be falsified by this one simple (expensive, technologically amazing) observation. Still, it is something important to watch, as it is at least conceivable that we could measure a stellar mass density that is impossibly high. Whither then?

These are early days.

JWST Twitter Bender

I went on a bit of a twitter bender yesterday about the early claims about high mass galaxies at high redshift, which went on long enough I thought I should share it here.


For those watching the astro community freak out about bright, high redshift galaxies being detected by JWST, some historical context in an amusing anecdote…

The 1998 October conference was titled “After the dark ages, when galaxies were young (the universe at 2 < z < 5).” That right there tells you what we were expecting. Redshift 5 was high – when the universe was a mere billion years old. Before that, not much going on (dark ages).

This was when the now famous SN Ia results corroborating the acceleration of the expansion rate predicted by concordance LCDM were shiny and new. Many of us already strongly suspected we needed to put the Lambda back in cosmology; the SN results sealed the deal.

One of the many lines of evidence leading to the rehabilitation of Lambda – previously anathema – was that we needed a bit more time to get observed structures to form. One wants the universe to be older than its contents, an off-and-on problem with globular clusters forever.

A natural question that arises is just how early do galaxies form? The horizon of z=7 came up in discussion at lunch, with those of us who were observers wondering how we might access that (JWST being the answer long in the making).

Famed simulator Carlos Frenk was there, and assured us not to worry. He had already done LCDM simulations, and knew the timing.

“There is nothing above redshift 7.”

He also added “don’t quote me on that,” which I’ve respected until now, but I think the statute of limitations has expired.

Everyone present immediately pulled out their wallet and chipped in $5 to endow the “7-up” prize for the first persuasive detection of an object at or above redshift seven.

A committee was formed to evaluate claims that might appear in the literature, composed of Carlos, Vera Rubin, and Bruce Partridge. They made it clear that they would require a high standard of evidence: at least two well-identified lines; no dropouts or photo-z’s.

That standard wasn’t met for over a decade, with z=6.96 being the record holder for a while. The 7-up prize was entirely tongue in cheek, and everyone forgot about it. Marv Leventhal had offered to hold the money; I guess he ended up pocketing it.

I believe the winner of the 7-up prize should have been Nial Tanvir for GRB090423 at z~8.2, but I haven’t checked if there might be other credible claims, and I can’t speak for the committee.

At any rate, I don’t think anyone would now seriously dispute that there are galaxies at z>7. The question is how big do they get, how early? And the eternal mobile goalpost, what does LCDM really predict?

Carlos was not wrong. There is no hard cutoff, so I won’t quibble about arbitrary boundaries like z=7. It takes time to assemble big galaxies, and LCDM does make a reasonably clear prediction about the timeline for that to occur. Basically, they shouldn’t be all that big that soon.

Here is a figure adapted from the thesis Jay Franck wrote here 5 years ago using Spitzer data (round points). It shows the characteristic brightness (Schechter M*) of galaxies as a function of redshift. The data diverge from the LCDM prediction (squares) as redshift increases.

The divergence happens because real galaxies are brighter (more stellar mass has assembled into a single object) than predicted by the hierarchical timeline expected in LCDM.

Remarkably, the data roughly follow the green line, which is an L* galaxy magically put in place at the inconceivably high redshift of z=10. Galaxies seem to have gotten big impossibly early. This is why you see us astronomers flipping our lids at the JWST results. Can’t happen.

Except that it can, and was predicted to do so by Bob Sanders a quarter century ago: “Objects of galaxy mass are the first virialized objects to form (by z=10) and larger structure develops rapidly.”

The reason is MOND. After decoupling, the baryons find themselves bereft of radiation support and suddenly deep in the low acceleration regime. Structure grows fast and becomes nonlinear almost immediately. It’s as if there is tons more dark matter than we infer nowadays.

I refereed that paper, and was a bit disappointed that Bob had beat me to it: I was doing something similar at the time, with similar results. Instead of it being hard to form structure quickly, as in LCDM, rapid structure formation is practically impossible to avoid in MOND.

He beat me to it, so I abandoned writing that paper. No need to say the same thing twice! Didn’t think we’d have to wait so long to test it.

I’ve reviewed this many times. Most recently in January, in anticipation of JWST, on my blog.

See also http://astroweb.case.edu/ssm/mond/LSSinMOND.html… and the references therein. For a more formal review, see A Tale of Two Paradigms: the Mutual Incommensurability of LCDM and MOND. Or Modified Newtonian Dynamics (MOND): Observational Phenomenology and Relativistic Extensions. Or Modified Newtonian Dynamics as an Alternative to Dark Matter.

How many times does it have to be said?

But you get the point. Every time you see someone describe the big galaxies JWST is seeing as unexpected, what they mean is unexpected in LCDM. It doesn’t surprise me at all. It is entirely expected in MOND, and was predicted a priori.

The really interesting thing to me, though, remains what LCDM really predicts. I already see people rationalizing excuses. I’ve seen this happen before. Many times. That’s why the field is in a rut.

Progress towards the dark land.

So are we gonna talk our way out of it this time? I’m no longer interested in how; I’m sure someone will suggest something that will gain traction no matter how unsatisfactory.

Special pleading.

The only interesting question is if LCDM makes a prediction here that can’t be fudged. If it does, then it can be falsified. If it doesn’t, it isn’t science.

Experimentalist with no clue what he has signed up for about to find out how hard it is to hunt down an invisible target.

But can we? Is LCDM subject to falsification? Or will we yet again gaslight ourselves into believing that we knew it all along?

LZ: another non-detection

Just as I was leaving for a week’s vacation, the dark matter search experiment LZ reported its first results. Now that I’m back, I see that I didn’t miss anything. Here is their figure of merit:

The latest experimental limits on WIMP dark matter from LZ (arXiv:2207.03764). The parameter space above the line is excluded. Note the scale on the y-axis, bearing in mind that the original expectation was for a cross section around 10⁻³⁹ cm², well above the top edge of this graph.

LZ is a merger of two previous experiments compelled to grow still bigger in the never-ending search for dark matter. It contains “seven active tonnes of liquid xenon,” which is an absurd amount, being a substantial fraction of the entire terrestrial supply. It all has to be chilled to the cryogenic temperatures at which xenon liquefies and purged of contaminants, including naturally radioactive isotopes that could mimic the sought-after signal of dark matter scattering off of xenon nuclei. It is a technological tour de force.

The technology is really fantastic. The experimentalists have accomplished amazing things in building these detectors. They have reached the target sensitivity, and then some. If WIMPs existed, they should have found them by now.

WIMPs have not been discovered. As the experiments have improved, the theorists have been obliged to repeatedly move the goalposts. The original (1980s) expectation for the interaction cross-section was 10⁻³⁹ cm². That was quickly excluded, but more careful (1990s) calculations suggested perhaps more like 10⁻⁴² cm². This was also excluded experimentally. By the late 2000s, the “prediction” had migrated to 10⁻⁴⁶ cm². This has also now been excluded, so the goalposts have been moved to 10⁻⁴⁸ cm². This migration has been driven entirely by the data; there is nothing miraculous about a WIMP with this cross section.

As remarkable a technological accomplishment as experiments like LZ are, they are becoming the definition of insanity: repeating the same action but expecting a different result.

For comparison, consider the LIGO detection of gravitational waves. A large team of scientists worked unspeakably hard to achieve the detection of a tiny effect. It took 40 years of failure before success was obtained. Until that point, it seemed much the same: repeating the same action but expecting a different result.

Except it wasn’t, because there was a clear expectation for the sensitivity that was required to detect gravitational waves. Once that sensitivity was achieved, they were detected. It wasn’t that simple of course, but close enough for our purposes: it took a long time to get where they were going, but they achieved success once they got there. Having a clear prediction is essential.

In the case of WIMP searches, there was also a clear prediction. The required sensitivity was achieved – long ago. Nothing was found, so the goalposts were moved – by a lot. Then the new required sensitivity was achieved, still without detection. Repeatedly.

It always makes sense to look harder for something you expect if at first you don’t succeed. But at some point, you have to give up: you ain’t gonna find it. This is disappointing, but we’ve all experienced this kind of disappointment at some point in our lives. The tricky part is deciding when to give up.

In science, the point to give up is when your hypothesis is falsified. The original WIMP hypothesis was falsified a long time ago. We keep it on life support with modifications, often obfuscating (to our students and to ourselves) that the WIMPs we’re talking about today are no longer the WIMPs we originally conceived.

I sometimes like to imagine the thought experiment of sending some of the more zealous WIMP advocates back in time to talk to their younger selves. What would they say? How would they respond to themselves? These are not people who like to be contradicted by anyone, even themselves, so I suspect it would go something like

Old scientist: “Hey, kid – I’m future you. This experiment you’re about to spend your life working on won’t detect what you’re looking for.”

Young scientist: “Uh huh. You say you’re me from the future, Mr. Credibility? Tell me: at what point do I go senile, you doddering old fool?”

Old scientist: “You don’t. It just won’t work out the way you think. On top of dark matter, there’s also dark energy…”

Young scientist: “What the heck is dark energy, you drooling crackpot?”

Old scientist: “The cosmological constant.”

Young scientist: “The cosmological constant! You can’t expect people to take you seriously talking about that rubbish. GTFO.”

That’s the polite version that doesn’t end in fisticuffs. It’s easy to imagine this conversation going south much faster. I know that if 1993 me had received a visit from 1998 me telling me that in five years I would have come to doubt WIMPs, and also would have demonstrated that the answer to the missing mass problem might not be dark matter at all, I… would not have taken it well.

That’s why predictions are important in science. They tell us when to change our mind. When to stop what we’re doing because it’s not working. When to admit that we were wrong, and maybe consider something else. Maybe that something else won’t prove correct. Maybe the next ten something elses won’t. But we’ll never find out if we won’t let go of the first wrong thing.

Some Outsider Perspective from Insiders

Avi Loeb has a nice recent post Recalculating Academia, in which he discusses some of the issues confronting modern academia. One of the reasons I haven’t written here for a couple of months is despondency over the same problems. If you’re here reading this, you’ll likely be interested in what he has to say.

I am not eager to write at length today, but I do want to amplify some of the examples he gives with my own experience. For example, he notes that there are

theoretical physicists who avoid the guillotine of empirical tests for half a century by dedicating their career to abstract conjectures, avoid the risk of being proven wrong while demonstrating mathematical virtuosity.

Avi Loeb

I recognize many kinds of theoretical physicists who fit this description. My first thought was string theory, which took off in the mid-80s when I was a grad student at Princeton, ground zero for that movement in the US. (The Russians indulged in this independently.) I remember a colloquium in which David Gross advocated the “theory of everything” with gratuitous fervor to a large audience of eager listeners quavering with anticipation; the atmosphere had the texture of religious revelation. It was captivating and convincing, up until the point near the end when he noted that experimental tests were many orders of magnitude beyond any experiment conceivable at the time. That… wasn’t physics to me. If this was the path the field was going down, I wanted no part of it. This was one of many factors that precipitated my departure from the toxic sludge that was grad student life in the Princeton physics department.

I wish I could say I had been proven wrong. Instead, decades later, physics has nothing to show for its embrace of string theory. There have been some impressive developments in mathematics stemming from it. Mathematics, not physics. And yet, there persists a large community of theoretical physicists who wander endlessly in the barren and practically infinite parameter space of multidimensional string theory. Maybe there is something relevant to physical reality there, or maybe it hasn’t been found because there isn’t. At what point does one admit that the objective being sought just ain’t there? [Death. For many people, the answer seems to be never. They keep repeating the same fruitless endeavor until they die.]

We do have new physics, in the form of massive neutrinos and the dark matter problem and the apparent acceleration of the expansion rate of the universe. What we don’t have is the expected evidence for supersymmetry, the crazy-bold yet comparatively humble first step on the road to string theory. If they had got even this much right, we should have seen evidence for it at the LHC, for example in the decay of the aptly named BS meson. If supersymmetric particles existed, they should provide many options for the meson to decay into, which otherwise has few options in the Standard Model of particle physics. This was a strong prediction of minimal supersymmetry, so much so that it was called the Golden Test of supersymmetry. After hearing this over and over in the ’80s and ’90s, I have not heard it again any time in this century. I’m not sure when the theorists stopped talking about this embarrassment, but I suspect it is long enough ago now that it will come as a surprise to younger scientists, even those who work in the field. Supersymmetry flunked the golden test, and it flunked it hard. Rather than abandon the theory (some did), we just stopped talking about it. There persists a large community of theorists who take supersymmetry for granted, and react with hostility if you question that Obvious Truth. They will tell you with condescension that only minimal supersymmetry is ruled out; there is an enormous parameter space still open for their imaginations to run wild, unbridled by experimental constraint. This is both true and pathetic.

Reading about the history of physics, I learned that there was a community of physicists who persisted believing in aether for decades after the Michelson-Morley experiment. After all, only some forms of aether were ruled out. This was true, at the time, but we don’t bother with that detail when teaching physics now. Instead, it gets streamlined to “aether was falsified by Michelson-Morley.” This is, in retrospect, true, and we don’t bother to mention those who pathetically kept after it.

The standard candidate for dark matter, the WIMP, is a supersymmetric particle. If supersymmetry is wrong, WIMPs don’t exist. And yet, there is a large community of particle physicists who persist in building ever bigger and better experiments designed to detect WIMPs. Funny enough, they haven’t detected anything. It was a good hypothesis, 38 years ago. Now it’s just a bad habit. The better ones tacitly acknowledge this, attributing their continuing efforts to the streetlight effect: you look where you can see.

Prof. Loeb offers another pertinent example:

When I ask graduating students at their thesis exam whether the cold dark matter paradigm will be proven wrong if their computer simulations will be in conflict with future data, they almost always say that any disagreement will indicate that they should add a missing ingredient to their theoretical model in order to “fix” the discrepancy.

Avi Loeb

This is indeed the attitude. So much so that no additional ingredient seems too absurd if it is what we need to save the phenomenon. Feedback is the obvious example in my own field, as that (or the synonyms “baryon physics” or “gastrophysics”) is invoked to explain away any and all discrepancies. It sounds simple, since feedback is a real effect that does happen, but this single word does a lot of complicated work under the hood. There are many distinct kinds of feedback: stellar winds, UV radiation from massive stars, supernovae when those stars explode, X-rays from compact sources like neutron stars, and relativistic jets from supermassive black holes at the centers of galaxies. These are the examples of feedback that I can think of off the top of my head; there are probably more. All of these things have perceptible, real-world effects on the relevant scales, with, for example, stars blowing apart the dust and gas of their stellar cocoons after they form. This very real process has bugger all to do with what feedback is invoked to do on galactic scales. Usually, supernovae are blamed by theorists for any and all problems in dwarf galaxies, while observers tell me that stellar winds do most of the work in disrupting star forming regions. Confronted with this apparent discrepancy, the usual answer is that it doesn’t matter how the energy is input into the interstellar medium, just that it is. Yet we can see profound differences between stellar winds and supernova explosions, so this does not inspire confidence in the predictive power of theories that generically invoke feedback to explain away problems that wouldn’t be there in a healthy theory.

This started a long time ago. I had already lost patience with this unscientific attitude to the point that I dubbed it the

Spergel Principle: “It is better to postdict than to predict.”

McGaugh 1998

This continues to go on and has now done so for so long that generations of students seem to think that this is how science is supposed to be done. If asked about hypothesis testing and whether a theory can be falsified, many theorists will first look mystified, then act put out. Why would you even ask that? (One does not question the paradigm.) The minority of better ones then rally to come up with some reason to justify that yes, what they’re talking about can be falsified, so it does qualify as physics. But those goalposts can always be moved.

A good example of moving goalposts is the cusp-core problem. When I first encountered this in the mid to late ’90s, I tried to figure a way out of it, but failed. So I consulted one of the very best theorists, Simon White. When I asked him what he thought would constitute a falsification of cold dark matter, he said cusps: “cusps have to be there” [in the center of a dark matter halo]. Flash forward to today, when nobody would accept that as a falsification of cold dark matter: it can be fixed by feedback. Which would be fine, if it were true, which isn’t really clear. At best it provides a post facto explanation for an unpredicted phenomenon without addressing the underlying root cause, that the baryon distribution is predictive of the dynamics.

This is like putting a band-aid on a Tyrannosaurus. It’s already dead and fossilized. And if it isn’t, well, you got bigger problems.

Another disease common to theory is avoidance. A problem is first ignored, then the data are blamed for showing the wrong thing, then they are explained in a way that may or may not be satisfactory. Either way, it is treated as something that had been expected all along.

In a parallel to this gaslighting, I’ve noticed that it has become fashionable of late to describe unsatisfactory explanations as “natural.” Saying that something can be explained naturally is a powerful argument in science. The traditional meaning is that ok, we hadn’t contemplated this phenomenon before it surprised us, but if we sit down and work it out, it makes sense. The “making sense” part means that an answer falls out of a theory easily when the right question is posed. If you need to run gazillions of supercomputer CPU hours of a simulation with a bunch of knobs for feedback to get something that sorta kinda approximates reality but not really, your result does not qualify as natural. It might be right – that’s a more involved adjudication – but it doesn’t qualify as natural, and the current fad of abusing this term does not inspire confidence that the results of such simulations might somehow be right. It just makes me suspect the theorists are fooling themselves.

I haven’t even talked about astroparticle physicists or those who engage in fantasies about the multiverse. I’ll just close by noting that Popper’s criterion for falsification was intended to distinguish between physics and metaphysics. That’s not the same as right or wrong, but physics is subject to experimental test while metaphysics is the stuff of late night bull sessions. The multiverse is manifestly metaphysical. Cool to think about, has lots of implications for philosophy and religion, but not physics. Even Gross has warned against treading down the garden path of the multiverse. (Tell me that you’re warning others not to make the same mistakes you made without admitting you made mistakes.)

There are a lot of scientists who would like to do away with Popper, or any requirement that physics be testable. These are inevitably the same people whose fancy turns to metascapes of mathematically beautiful if fruitless theories, and want to pass off their metaphysical ramblings as real physics. Don’t buy it.

When you have eliminated the impossible…

In previous posts, I briefly described some of the results that provoked a crisis of faith in the mid-1990s. Up until that point, I was an ardent believer in the cold dark matter paradigm. But it no longer made sense as an explanation for galaxy dynamics. It didn’t just not make sense, it seemed strewn with self-contradictions, all of which persist to this day.

Amidst this crisis of faith, there came a chance meeting in Middle-Earth: Moti Milgrom visited Cambridge, where I was a postdoc at the time, and gave a talk. I almost didn’t go to this talk because it had modified gravity in the title and who wanted to waste their time listening to that nonsense? I had yet to make any connection between the self-contradictions the data posed for dark matter and something as dire as an entirely different paradigm.

Despite my misgivings, I did go to Milgrom’s talk. Not knowing that I was there or what I worked on, he casually remarked on some specific predictions for low surface brightness galaxies. These sounded like what I was seeing, in particular the things that were most troublesome for the dark matter interpretation. I became interested.

Long story short, it is a case in which, had MOND not already existed, we would have had to invent it. As Sherlock Holmes famously put it

When you have eliminated the impossible, whatever remains, however improbable, must be the truth.

Sir Arthur Conan Doyle

Modified Newtonian Dynamics

There is one and only one theory that predicted in advance the observations described above: the Modified Newtonian Dynamics (MOND) introduced by Milgrom (1983a,b,c). MOND is an extension of Newtonian theory (Milgrom, 2020). It is not a generally covariant theory, so is not, by itself, a complete replacement for General Relativity. Nevertheless, it makes unique, testable predictions within its regime of applicability (McGaugh, 2020).

The basic idea of MOND is that the force law is modified at an acceleration scale, a0. For large accelerations, g ≫ a0, everything is normal and Newtonian: g = gN, where gN is the acceleration predicted by the observed luminous mass distribution, obtained by solving the Poisson equation. At low accelerations, the effective acceleration tends towards the limit

g → √(a0 gN) for g ≪ a0 (5)

(Bekenstein & Milgrom, 1984; Milgrom, 1983c). This limit is called the deep MOND regime in contrast to the Newtonian regime at high accelerations. The two regimes are smoothly connected by an interpolation function μ(g/a0) that is not specified (Milgrom, 1983c).

The motivation to make an acceleration-based modification is to explain flat rotation curves (Bosma, 1981; Rubin et al., 1978) in a way that also gives a steep Tully-Fisher relation similar to that which is observed (Aaronson et al., 1979). A test particle in a circular orbit around a point mass Mp in the deep MOND regime (eq. (5)) will experience a centripetal acceleration

Vc²/R = √(a0GMp/R²). (6)

Note that the term for the radius R cancels out, so eq. (6) reduces to

Vc⁴ = a0GMp (7)

which the reader will recognize as the Baryonic Tully-Fisher relation

Mb = A Vf⁴ (8)

with A = ζ/(a0G) where ζ is a geometrical factor of order unity.

This simple math explains the flatness of rotation curves. This is not a prediction; it was an input that motivated the theory, as it motivated dark matter. Unlike dark matter, in which rotation curves might rise or fall, in MOND the rotation curves of isolated galaxies must tend towards asymptotic flatness.

MOND also explains the Tully-Fisher relation. Indeed, there are several distinct aspects to this prediction. That the relation exists at all is a strong prediction. Fundamentally, the Baryonic Tully-Fisher Relation (BTFR) is a relation between the baryonic mass of a galaxy and its flat rotation speed. There is no dark matter involved: Vf is not a property of a dark matter halo, but of the galaxy itself.

One MOND prediction is the slope of the BTFR: the power law scaling M ∝ V^x has x = 4 exactly. While the infrared data of Aaronson et al. (1979) suggested such a slope, the exact value was not well constrained at that time. It was not until later that Tully-Fisher was empirically recognized as a relation driven by baryonic mass (McGaugh et al., 2000), as anticipated by MOND. Moreover, the slope is only four when a good measurement of the flat rotation velocity is available (Verheijen, 2001; McGaugh, 2005, 2012); common proxies like the line-width only crudely approximate the result and typically return shallower slopes (e.g., Zaritsky et al., 2014), as do samples of limited dynamic range (e.g., Pizagno et al., 2007). The latter are common in the literature: selection effects strongly favor bright galaxies, and the majority of published Tully-Fisher relations are dominated by high mass galaxies (M > 10¹⁰ M☉). Consequently, the behavior of the Baryonic Tully-Fisher relation remains somewhat controversial to this day (e.g., Mancera Piña et al., 2019; Ogle et al., 2019). This appears to be entirely a matter of data quality (McGaugh et al., 2019). The slope of the relation is indistinguishable from 4 when a modicum of quality control is imposed (Lelli et al., 2016b; McGaugh, 2005, 2012; Schombert et al., 2020; Stark et al., 2009; Trachternach et al., 2009). Indeed, only a slope of four successfully predicted the rotation speeds of low mass galaxies (Giovanelli et al., 2013; McGaugh, 2011).

Another aspect of the Tully-Fisher relation is its normalization. This is set by fundamental constants: Newton’s constant, G, and the acceleration scale of MOND, a0. For ζ = 0.8, A = 50 M☉ km⁻⁴ s⁴. However, there is no theory that predicts the value of a0, which has to be set by the data. Moreover, this scale is distance-dependent, so the precise value of a0 varies with adjustments to the distance scale. For this reason, in part, the initial estimate of a0 = 2 × 10⁻¹⁰ m s⁻² of Milgrom (1983a) was a bit high. Begeman et al. (1991) used the best data then available to obtain a0 = 1.2 × 10⁻¹⁰ m s⁻². The value of Milgrom’s acceleration constant has not varied meaningfully since then (Famaey and McGaugh, 2012; Li et al., 2018; McGaugh, 2011; McGaugh et al., 2016; Sanders and McGaugh, 2002). This is a consistency check, but not a genuine prediction.
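These numbers are easy to verify. A quick sketch that checks A = ζ/(a0G) and then applies eq. (7) to an arbitrary illustrative baryonic mass (the 6 × 10¹⁰ M☉ input is a placeholder, not a measurement):

```python
G = 6.674e-11     # Newton's constant [m^3 kg^-1 s^-2]
a0 = 1.2e-10      # Milgrom's constant [m s^-2], Begeman et al. (1991)
Msun = 1.989e30   # solar mass [kg]
zeta = 0.8        # geometric factor of order unity

# BTFR normalization, converted from kg m^-4 s^4 to M_sun km^-4 s^4:
A = zeta / (a0 * G) / Msun * 1e12
print(f"A = {A:.0f} M_sun km^-4 s^4")   # ~50, as quoted in the text

# Flat rotation speed predicted by eq. (7) for an illustrative mass:
M_b = 6e10 * Msun
V_f = (a0 * G * M_b) ** 0.25 / 1e3      # [km/s]
print(f"V_f = {V_f:.0f} km/s")          # ~176 km/s
```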

An important consequence of MOND is that the Tully-Fisher relation is absolute: it should have no dependence on size or surface brightness (Milgrom, 1983a). The mass of baryons is the only thing that sets the flat amplitude of the rotation speed. It matters not at all how those baryons are distributed. MOND was the only theory to correctly predict this in advance of the observation (McGaugh and de Blok, 1998b). The fine-tuning problem that we face conventionally is imposed by this otherwise unanticipated result.

The absolute nature of the Tully-Fisher relation in MOND further predicts that it has no physical residuals whatsoever. That is to say, scatter around the relation can only be caused by observational errors and scatter in the mass-to-light ratios of the stars. The latter is an irreducible unknown: we measure the luminosity produced by the stars in a galaxy, but what we need to know is the mass of those stars. The conversion between them can never be perfect, and inevitably introduces some scatter into the relation. Nevertheless, we can make our best effort to account for known sources of scatter. Between scatter expected from observational uncertainties and that induced by variations in the mass-to-light ratio, the best data are consistent with the prediction of zero intrinsic scatter (McGaugh, 2005, 2012; Lelli et al., 2016b, 2019). Of course, it is impossible to measure zero, but it is possible to set an upper limit on the intrinsic scatter that is very tight by extragalactic standards (<6%; Lelli et al., 2019). This leaves very little room for variations beyond the inevitable impact of the stellar mass-to-light ratio. The scatter is no longer entirely accounted for when lower quality data are considered (McGaugh, 2012), but this is expected in astronomy: lower quality data inevitably admit systematic uncertainties that are not readily accounted for in the error budget.
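The bookkeeping behind such an upper limit is a quadrature subtraction: intrinsic scatter is whatever remains after the known sources are removed. A minimal sketch with illustrative placeholder numbers (not the published values):

```python
import math

sigma_obs = 0.11   # total observed scatter [dex] (illustrative)
sigma_err = 0.08   # scatter expected from measurement errors [dex]
sigma_ml  = 0.07   # scatter from stellar mass-to-light variations [dex]

# Known sources subtract in quadrature; the remainder bounds the
# intrinsic scatter of the relation:
residual = sigma_obs**2 - sigma_err**2 - sigma_ml**2
sigma_int = math.sqrt(residual) if residual > 0 else 0.0
print(f"intrinsic scatter <= {sigma_int:.3f} dex")
```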

Milgrom (1983a) made a number of other specific predictions. In MOND, the acceleration expected for kinematics follows from the surface density of baryons. Consequently, low surface brightness means low acceleration. Interpreted in terms of conventional dynamics, the prediction is that the ratio of dynamical mass to light, Mdyn/L should increase as surface brightness decreases. This happens both globally — LSB galaxies appear to be more dark matter dominated than HSB galaxies (see Fig. 4(b) of McGaugh and de Blok, 1998a), and locally — the need for dark matter sets in at smaller radii in LSB galaxies than in HSB galaxies (Figs. 3 and 14 of McGaugh and de Blok, 1998b; Famaey and McGaugh, 2012, respectively).

One may also test this prediction by plotting the rotation curves of galaxies binned by surface brightness: acceleration should scale with surface brightness. It does (Figs. 4 and 16 of McGaugh and de Blok, 1998b; Famaey and McGaugh, 2012, respectively). This observation has been confirmed by near-infrared data. The systematic variation of color coded surface brightness is already obvious with optical data, as in Fig. 15 of Famaey and McGaugh (2012), but these suffer some scatter from variations in the stellar mass-to-light ratio. These practically vanish with near-infrared data, which provide such a good tracer of the surface mass density of stars that the equivalent plot is a near-perfect rainbow (Fig. 3 of both McGaugh et al., 2019; McGaugh, 2020). The data strongly corroborate the prediction of MOND that acceleration follows from baryonic surface density.

The central density relation (Fig. 6, Lelli et al., 2016c) was also predicted by MOND (Milgrom, 2016). Both the shape and the amplitude of the correlation are correct. Moreover, the surface density Σ at which the data bend follows directly from the acceleration scale of MOND: a0 = GΣ. This surface density also corresponds to the stability limit for disks (Brada & Milgrom, 1999; Milgrom, 1989). The scale we had to insert by hand in dark matter models is a consequence of MOND.
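The correspondence is easy to check: converting Σ = a0/G into conventional astronomical units lands at the surface density scale where the data bend. A quick sketch:

```python
G = 6.674e-11    # [m^3 kg^-1 s^-2]
a0 = 1.2e-10     # [m s^-2]
Msun = 1.989e30  # [kg]
pc = 3.086e16    # [m]

sigma = a0 / G                       # characteristic surface density [kg m^-2]
sigma_astro = sigma * pc**2 / Msun   # convert to [M_sun pc^-2]
print(f"a0/G = {sigma_astro:.0f} M_sun/pc^2")   # ~860, of order the
# ~1000 M_sun/pc^2 scale where the need for dark matter sets in (Fig. 6)
```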

Since MOND is a force law, the entirety of the rotation curve should follow from the baryonic mass distribution. The stellar mass-to-light ratio can modulate the amplitude of the stellar contribution to the rotation curve, but not its shape, which is specified by the observed distribution of light. Consequently, there is rather limited freedom in fitting rotation curves.

Example fits are shown in Fig. 8. The procedure is to construct Newtonian mass models by numerically solving the Poisson equation to determine the gravitational potential that corresponds to the observed baryonic mass distribution. Indeed, it is important to make a rigorous solution of the Poisson equation in order to capture details in the shape of the mass distribution (e.g., the wiggles in Fig. 8). Common analytic approximations like the exponential disk assume these features out of existence. Building proper mass models involves separate observations for the stars, conducted at optical or near-infrared wavelengths, and the gas of the interstellar medium, which is traced by radio wavelength observations. It is sometimes necessary to consider separate mass-to-light ratios for the stellar bulge and disk components, as there can be astrophysical differences between these distinct stellar populations (Baade, 1944). This distinction applies in any theory.

Fig. 8. Example rotation curve fits. MOND fits (heavy solid lines: Li et al., 2018) to the rotation curves of a bright, star-dominated galaxy (UGC 2953, left panel) and a faint, gas-dominated galaxy (DDO 64, right panel). The thin solid lines show the Newtonian expectation, which is the sum of the atomic gas (dotted lines), stellar disk (dashed lines), and stellar bulge (dash-dotted line; present only in UGC 2953). Note the different scales: UGC 2953 is approximately 400 times more massive than DDO 64.

The gravitational potential of each baryonic component is represented by the circular velocity of a test particle in Fig. 8. The amplitude of the rotation curve of the mass model for each stellar component scales as the square root of its mass-to-light ratio. There is no corresponding mass-to-light ratio for the gas of the interstellar medium, as there is a well-understood relation between the observed flux at 21 cm and the mass of hydrogen atoms that emit it (Draine, 2011). Consequently, the line for the gas components in Fig. 8 is practically fixed.
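Given the component circular velocities from the mass model, the MOND prediction follows by summing the Newtonian contributions and boosting with an interpolation function. A minimal sketch; the ν-function below is one common choice (e.g., McGaugh et al., 2016), since MOND itself does not specify the interpolation, and the mass-to-light ratios are illustrative fit parameters:

```python
import numpy as np

a0 = 1.2e-10  # Milgrom's constant [m s^-2]

def mond_rotation_curve(R_kpc, V_gas, V_disk, V_bulge=0.0,
                        ML_disk=0.5, ML_bulge=0.7):
    """Predict a rotation curve from baryonic components (all in km/s).

    V_gas, V_disk, V_bulge are Newtonian circular velocities of each
    component; the stellar ones scale with the square root of M/L.
    """
    # Newtonian baryonic contribution (velocities add in quadrature):
    V_N2 = V_gas**2 + ML_disk * V_disk**2 + ML_bulge * V_bulge**2
    R = np.asarray(R_kpc) * 3.086e19      # kpc -> m
    g_N = V_N2 * 1e6 / R                  # Newtonian acceleration [m s^-2]
    # One common interpolation function (McGaugh et al., 2016):
    nu = 1.0 / (1.0 - np.exp(-np.sqrt(g_N / a0)))
    return np.sqrt(nu * g_N * R) / 1e3    # predicted circular speed [km/s]
```

In the Newtonian regime (g_N ≫ a0) the boost ν → 1 and the baryons alone set the curve; in the deep MOND regime ν → √(a0/g_N) and eq. (5) is recovered.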

In addition to the mass-to-light ratio, there are two “nuisance” parameters that are sometimes considered in MOND fits: distance and inclination. These are known from independent observations, but of course these have some uncertainty. Consequently, the best MOND fit sometimes occurs for slightly different values of the distance and inclination, within their observational uncertainties (Begeman et al., 1991; de Blok & McGaugh, 1998; Sanders, 1996).

Distance matters because it sets the absolute scale. The further a galaxy, the greater its mass for the same observed flux. The distances to individual galaxies are notoriously difficult to measure. Though usually not important, small changes to the distance can occasionally have powerful effects, especially in gas rich galaxies. Compare, for example, the fit to DDO 154 by Li et al. (2018) to that of Ren et al. (2019).

Inclinations matter because we must correct the observed velocities for the inclination of each galaxy as projected on the sky. The inclination correction is V = Vobs/sin(i), so is small at large inclinations (edge-on) but large at small inclinations (face-on). For this reason, dynamical analyses often impose an inclination limit. This is an issue in any theory, but MOND is particularly sensitive since M ∝ V⁴, so any errors in the inclination are amplified to the fourth power (see Fig. 2 of de Blok & McGaugh, 1998). Worse, inclination estimates can suffer systematic errors (de Blok & McGaugh, 1998; McGaugh, 2012; Verheijen, 2001): a galaxy seen face-on may have an oval distortion that makes it look more inclined than it is, but it can’t be more face-on than face-on.
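A quick sketch of that fourth-power amplification, with illustrative inclination values:

```python
import math

def mass_bias(i_true_deg, i_assumed_deg):
    """M_inferred / M_true when the wrong inclination is assumed.

    V = V_obs/sin(i) and M ~ V^4, so an inclination error enters
    the inferred mass to the fourth power.
    """
    return (math.sin(math.radians(i_true_deg)) /
            math.sin(math.radians(i_assumed_deg))) ** 4

# A 5-degree error is benign near edge-on, severe near face-on:
print(f"{mass_bias(80, 75):.2f}")   # ~1.08: mass ~8% high
print(f"{mass_bias(30, 25):.2f}")   # ~1.96: mass ~2x high
```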

MOND fits will fail if either the distance or inclination is wrong. Such problems cannot be discerned in fits with dark matter halos, which have ample flexibility to absorb the imparted variance (see Fig. 6 of de Blok & McGaugh, 1998). Consequently, a fit with a dark matter halo will not fail if the distance happens to be wrong; we just won’t notice it.

MOND generally fits rotation curves well (Angus et al., 2012, 2015; Begeman et al., 1991; de Blok & McGaugh, 1998; Famaey and McGaugh, 2012; Gentile et al., 2010, 2011; Haghi et al., 2016; Hees et al., 2016; Kent, 1987; Li et al., 2018; Milgrom, 1988; Sánchez-Salcedo et al., 2013; Sanders, 1996, 2019; Sanders and McGaugh, 2002; Sanders and Verheijen, 1998; Swaters et al., 2010). There are of course exceptions (e.g., NGC 2915; Li et al., 2018). This is to be expected, as there are always some misleading data, especially in astronomy where it is impossible to control for systematic effects in the same manner that is possible in closed laboratories. It is easily forgotten that this type of analysis assumes circular orbits in a static potential, a condition that many spiral galaxies appear to have achieved to a reasonable approximation but which certainly will not hold in all cases.

The best-fit mass-to-light ratios found in MOND rotation curve fits can be checked against independent stellar population models. There is no guarantee that this procedure will return plausible values for the stellar mass-to-light ratio. Nevertheless, MOND fits recover the amplitude that is expected for stellar populations, the expected variation with color, and the band-dependent scatter (e.g., Fig. 28 of Famaey and McGaugh, 2012). Indeed, to a good approximation, the rotation curve can be predicted directly from near-infrared data (McGaugh, 2020; Sanders and Verheijen, 1998) modulo only the inevitable scatter in the mass-to-light ratio. This is a spectacular success of the paradigm that is not shared by dark matter fits (de Blok et al., 2003; de Blok & McGaugh, 1997; Kent, 1987).

Gas rich galaxies provide an even stronger test. When gas dominates the mass budget, the mass-to-light ratio of the stars ceases to have much leverage on the fit. There is no fitting parameter for gas equivalent to the mass-to-light ratio for stars: the gas mass follows directly from the observations. This enables MOND to predict the locations of such galaxies in the Baryonic Tully-Fisher plane (McGaugh, 2011) and essentially their full rotation curves (Sanders, 2019) with no free parameters (McGaugh, 2020).

It should be noted that the acceleration scale a0 is kept fixed when fitting rotation curves. If one allows a0 to vary, both it and the mass-to-light ratio spread over an unphysically large range of values (Li et al., 2018). The two are highly degenerate, causing such fits to be meaningless (Li et al., 2021): the data do not have the power to constrain multiple parameters per galaxy.

Table 2 lists the successful predictions of MOND that are discussed here. A more comprehensive list is given by Famaey and McGaugh (2012) and McGaugh (2020) who also discuss some of the problems posed for dark matter. MOND has had many predictive successes beyond rotation curves (e.g., McGaugh and Milgrom, 2013a,b; McGaugh, 2016) and has inspired successful predictions in cosmology (e.g., Sanders, 1998; McGaugh, 1999, 2000; Sanders, 2001; McGaugh, 2015, 2018). In this context, it makes sense to associate LSB galaxies with low density fluctuations in the initial conditions, thereby recovering the success of DD while its ills are cured by the modified force law. Galaxy formation in general is likely to proceed hierarchically but much more rapidly than in ΛCDM (Sanders, 2001; Stachniewicz and Kutschera, 2001), providing a natural explanation for both the age of stars in elliptical galaxies and allowing for a subsequent settling time for the disks of spiral galaxies (Wittenburg et al., 2020).

Prediction                                  Observation
Tully-Fisher Relation:
  Slope = 4                                 +
  No size or surface brightness residuals   +
Mdyn/L depends on surface brightness        +
Central density relation                    +
Rotation curve fits                         +
Stellar population mass-to-light ratios     +
Mb alone specifies Vf                       +

Table 2. Predictions of MOND (a “+” indicates the observation matches the prediction).

The expert cosmologist may object that there is a great deal more data that must be satisfied. These have been reviewed elsewhere (Bekenstein, 2006; Famaey and McGaugh, 2012; McGaugh, 2015; Sanders and McGaugh, 2002) and are beyond the scope of this discussion. Here I note only that my experience has been that reports of MOND’s falsification are greatly exaggerated. Indeed, it has a great deal more explanatory power for a wider variety of phenomena than is generally appreciated (McGaugh and de Blok, 1998a,b).

The most serious, though certainly not the only, outstanding challenge to MOND is the dynamics of clusters of galaxies (Angus et al., 2008; Sanders and McGaugh, 2002). Contrary to the case in most individual galaxies and some groups of galaxies (Milgrom, 2018, 2019), MOND typically falls short of correcting the mass discrepancy in rich clusters by a factor of ~2 in mass. This can be taken as completely fatal, or as being remarkably close by the standards of astrophysics. Which option one chooses seems to be mostly a matter of confirmation bias: those who are quick to dismiss MOND are happy to spot their own models a factor of two in mass, and even to assert that it is natural to do so (e.g., Ludlow et al., 2017). MOND is hardly alone in suffering problems with clusters of galaxies, which also present problems for ΛCDM (e.g., Angus & McGaugh, 2008; Asencio et al., 2021; Meneghetti et al., 2020).

A common fallacy seems to be that any failing of MOND is automatically considered to be support for ΛCDM. This is seldom the case. More often than not, observations that are problematic for MOND are also problematic for ΛCDM. We do not perceive them as such because we are already convinced that non-baryonic dark matter must exist. From that perspective, any problem encountered by ΛCDM is a mere puzzle that will inevitably be solved, while any problem encountered by MOND is a terminal failure of an irredeemably blasphemous hypothesis. This speaks volumes about human nature but says nothing about how the universe works.


The plain fact is that MOND made many a priori predictions that subsequently came true. This is the essence of the scientific method. LCDM and MOND are largely incommensurate, but whenever I have been able to make a straight comparison, MOND has been the more successful theory. So what am I supposed to say? That it is wrong? Perhaps it is, but that doesn’t make dark matter right. Rather, the predictive successes of MOND must be teaching us something. The field will not progress until these are incorporated into mainstream thinking.

Cosmic whack-a-mole

The fine-tuning problem encountered by dark matter models that I talked about last time is generic. The knee-jerk reaction of most workers seems to be “let’s build a more sophisticated model.” That’s reasonable – if there is any hope of recovery. The attitude is that dark matter has to be right so something has to work out. This fails to even contemplate the existential challenge that the fine-tuning problem imposes.

Perhaps I am wrong to be pessimistic, but my concern is well informed by years upon years trying to avoid this conclusion. Most of the claims I have seen to the contrary are just specialized versions of the generic models I had already built: they contain the same failings, but these go unrecognized because the presumption is that something has to work out, so people are often quick to declare “close enough!”

In my experience, fixing one thing in a model often breaks something else. It becomes a game of cosmic whack-a-mole. If you succeed in suppressing the scatter in one relation, it pops out somewhere else. A model that seems like it passes the test you built it to pass flunks as soon as you confront it with another test.

Let’s consider a few examples.


Squeezing the toothpaste tube

Our efforts to evade one fine-tuning problem often lead to another. This has been my general experience in many efforts to construct viable dark matter models. It is like squeezing a tube of toothpaste: every time we smooth out the problems in one part of the tube, we simply squeeze them into a different part. There are many published claims to solve this problem or that, but they frequently fail to acknowledge (or notice) that the purported solution to one problem creates another.

One example is provided by Courteau and Rix (1999). They invoke dark matter domination to explain the lack of residuals in the Tully-Fisher relation. In this limit, Mb/R ​≪ ​MDM/R and the baryons leave no mark on the rotation curve. This can reconcile the model with the Tully-Fisher relation, but it makes a strong prediction. It is not just the flat rotation speed that is the same for galaxies of the same mass, but the entirety of the rotation curve, V(R) at all radii. The stars are just convenient tracers of the dark matter halo in this limit; the dynamics are entirely dominated by the dark matter. The hypothesized solution fixes the problem that is addressed, but creates another problem that is not addressed, in this case the observed variation in rotation curve shape.

The limit of complete dark matter domination is not consistent with the shapes of rotation curves. Galaxies of the same baryonic mass have the same flat outer velocity (Tully-Fisher), but the shapes of their rotation curves vary systematically with surface brightness (de Blok & McGaugh, 1996; Tully and Verheijen, 1997; McGaugh and de Blok, 1998a,b; Swaters et al., 2009, 2012; Lelli et al., 2013, 2016c). High surface brightness galaxies have steeply rising rotation curves while LSB galaxies have slowly rising rotation curves (Fig. 6). This systematic dependence of the inner rotation curve shape on the baryon distribution excludes the SH hypothesis in the limit of dark matter domination: the distribution of the baryons clearly has an impact on the dynamics.

Fig. 6. Rotation curve shapes and surface density. The left panel shows the rotation curves of two galaxies, one HSB (NGC 2403, open circles) and one LSB (UGC 128, filled circles) (de Blok & McGaugh, 1996; Verheijen and de Blok, 1999; Kuzio de Naray et al., 2008). These galaxies have very nearly the same baryonic mass (~ 10¹⁰ M☉), and asymptote to approximately the same flat rotation speed (~ 130 km s⁻¹). Consequently, they are indistinguishable in the Tully-Fisher plane (Fig. 4). However, the inner shapes of the rotation curves are readily distinguishable: the HSB galaxy has a steeply rising rotation curve while the LSB galaxy has a more gradual rise. This is a general phenomenon, as illustrated by the central density relation (right panel: Lelli et al., 2016c) where each point is one galaxy; NGC 2403 and UGC 128 are highlighted as open points. The central dynamical mass surface density (Σdyn) measured by the rate of rise of the rotation curve (Toomre, 1963) correlates with the central surface density of the stars (Σ0) measured by their surface brightness. The line shows 1:1 correspondence: no dark matter is required near the centers of HSB galaxies. The need for dark matter appears below 1000 M☉ pc⁻² and grows systematically greater to lower surface brightness. This is the origin of the statement that LSB galaxies are dark matter dominated.

A more recent example of this toothpaste tube problem for SH-type models is provided by the EAGLE simulations (Schaye et al., 2015). These are claimed (Ludlow et al., 2017) to explain one aspect of the observations, the radial acceleration relation (McGaugh et al., 2016), but fail to explain another, the central density relation (Lelli et al., 2016c) seen in Fig. 6. This was called the ‘diversity’ problem by Oman et al. (2015), who note that the rotation velocity at a specific, small radius (2 kpc) varies considerably from galaxy to galaxy observationally (Fig. 6), while simulated galaxies show essentially no variation, with only a small amount of scatter. This diversity problem is exactly the same problem that was pointed out before [compare Fig. 5 of Oman et al. (2015) to Fig. 14 of McGaugh and de Blok (1998a)].

There is no single, universally accepted standard galaxy formation model, but a common touchstone is provided by Mo et al. (1998). Their base model has a constant ratio of luminous to dark mass md [their assumption (i)], which provides a reasonable description of the sizes of galaxies as a function of mass or rotation speed (Fig. 7). However, this model predicts the wrong slope (3 rather than 4) for the Tully-Fisher relation. This is easily remedied by making the luminous mass fraction proportional to the rotation speed (md ∝ Vf), which then provides an adequate fit to the Tully-Fisher relation. This has the undesirable effect of destroying the consistency of the size-mass relation. We can have one or the other, but not both.
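The slope argument can be checked in a few lines. The sketch below assumes only the virial scaling Mhalo ∝ V³ with arbitrary normalizations; it is an illustration of the scaling, not a fit to data:

```python
import numpy as np

V = np.logspace(1.5, 2.5, 50)     # rotation speeds in km/s
M_halo = V**3                     # virial scaling; normalization arbitrary

# Assumption (i): constant luminous fraction md, so Mb inherits the slope of 3.
Mb_const = 0.05 * M_halo
# Variable fraction md ∝ Vf steepens the slope to 4.
Mb_var = 0.05 * (V / 100.0) * M_halo

for Mb, label in [(Mb_const, "constant md"), (Mb_var, "md ∝ Vf")]:
    slope = np.polyfit(np.log10(V), np.log10(Mb), 1)[0]
    print(f"{label}: Tully-Fisher slope = {slope:.1f}")
# prints slopes of 3.0 and 4.0; the md ∝ Vf fix is what breaks
# the size-speed relation shown in Fig. 7
```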

Fig. 7. Galaxy size (as measured by the exponential disk scale length, left) and mass (right) as a function of rotation velocity. The latter is the Baryonic Tully-Fisher relation; the data are the same as in Fig. 4. The solid lines are Mo et al. (1998) models with constant md (their equations 12 and 16). This is in reasonable agreement with the size-speed relation but not the BTFR. The latter may be fit by adopting a variable md ​∝ ​Vf (dashed lines), but this ruins agreement with the size-speed relation. This is typical of dark matter models in which fixing one thing breaks another.

This failure of the Mo et al. (1998) model provides another example of the toothpaste tube problem. By fixing one problem, we create another. The only way forward is to consider more complex models with additional degrees of freedom.

Feedback

It has become conventional to invoke ‘feedback’ to address the various problems that afflict galaxy formation theory (Bullock & Boylan-Kolchin, 2017; De Baerdemaker and Boyd, 2020). It goes by other monikers as well, variously being called ‘gastrophysics’ for gas phase astrophysics, or simply ‘baryonic physics’ for any process that might intervene between the relatively simple (and calculable) physics of collisionless cold dark matter and messy observational reality (which is entirely illuminated by the baryons). This proliferation of terminology obfuscates the boundaries of the subject and precludes a comprehensive discussion.

Feedback is not a single process, but rather a family of distinct processes. The common feature of different forms of feedback is the deposition of energy from compact sources into the surrounding gas of the interstellar medium. This can, at least in principle, heat gas and drive large-scale winds, either preventing gas from cooling and forming too many stars, or ejecting it from a galaxy outright. This in turn might affect the distribution of dark matter, though the effect is weak: one must move a lot of baryons for their gravity to impact the dark matter distribution.

There are many kinds of feedback, and many devils in the details. Massive, short-lived stars produce copious amounts of ultraviolet radiation that heats and ionizes the surrounding gas and erodes interstellar dust. These stars also produce strong winds through much of their short (~ 10 Myr) lives, and ultimately explode as Type II supernovae. These three mechanisms each act in a distinct way on different time scales. That’s just the feedback associated with massive stars; there are many other mechanisms (e.g., Type Ia supernovae are distinct from Type II supernovae, and Active Galactic Nuclei are a completely different beast entirely). The situation is extremely complicated. While the various forms of stellar feedback are readily apparent on the small scales of stars, it is far from obvious that they have the desired impact on the much larger scales of entire galaxies.

For any one kind of feedback, there can be many substantially different implementations in galaxy formation simulations. Independent numerical codes do not generally return compatible results for identical initial conditions (Scannapieco et al., 2012): there is no consensus on how feedback works. Among the many different computational implementations of feedback, at most one can be correct.

Most galaxy formation codes do not resolve the scale of single stars where stellar feedback occurs. They rely on some empirically calibrated, analytic approximation to model this ‘sub-grid physics’ — which is to say, they don’t simulate feedback at all. Rather, they simulate the accumulation of gas in one resolution element, then follow some prescription for what happens inside that unresolved box. This provides ample opportunity for disputes over the implementation and effects of feedback. For example, feedback is often cited as a way to address the cusp-core problem — or not, depending on the implementation (e.g., Benítez-Llambay et al., 2019; Bose et al., 2019; Di Cintio et al., 2014; Governato et al., 2012; Madau et al., 2014; Read et al., 2019). High resolution simulations (Bland-Hawthorn et al., 2015) indicate that the gas of the interstellar medium is less affected by feedback effects than assumed by typical sub-grid prescriptions: most of the energy is funneled through the lowest density gas — the course of least resistance — and is lost to the intergalactic medium without much impacting the galaxy in which it originates.

From the perspective of the philosophy of science, feedback is an auxiliary hypothesis invoked to patch up theories of galaxy formation. Indeed, since there are many distinct flavors of feedback that are invoked to carry out a variety of different tasks, feedback is really a suite of auxiliary hypotheses. This violates parsimony to an extreme and brutal degree.

This concern for parsimony is not specific to any particular feedback scheme; it is not just a matter of which feedback prescription is best. The entire approach is to invoke as many free parameters as necessary to solve any and all problems that might be encountered. There is little doubt that such models can be constructed to match the data, even data that bear little resemblance to the obvious predictions of the paradigm (McGaugh and de Blok, 1998a; Mo et al., 1998). So the concern is not whether ΛCDM galaxy formation models can explain the data; it is that they can’t not.


One could go on at much greater length about feedback and its impact on galaxy formation. This is pointless. It is a form of magical thinking to expect that the combined effects of numerous complicated feedback effects are going to always add up to looking like MOND in each and every galaxy. It is also the working presumption of an entire field of modern science.

Two Hypotheses

OK, basic review is over. Shit’s gonna get real. Here I give a short recounting of the primary reason I came to doubt the dark matter paradigm. This is entirely conventional – my concern about the viability of dark matter stems from a contradiction within its own context. It had nothing to do with MOND, which I was blissfully ignorant of when I ran head-long into this problem in 1994. Most of the community chooses to remain blissfully ignorant, which I understand: it’s way more comfortable. It is also why the field has remained mired in the ’90s, with all the apparent progress since then being nothing more than the perpetual reinvention of the same square wheel.


To make a completely generic point that does not depend on the specifics of dark matter halo profiles or the details of baryonic assembly, I discuss two basic hypotheses for the distribution of disk galaxy size at a given mass. These broad categories I label SH (Same Halo) and DD (Density begets Density) following McGaugh and de Blok (1998a). In both cases, galaxies of a given baryonic mass are assumed to reside in dark matter halos of a corresponding total mass. Hence, at a given halo mass, the baryonic mass is the same, and variations in galaxy size follow from one of two basic effects:

  • SH: variations in size follow from variations in the spin of the parent dark matter halo.
  • DD: variations in surface brightness follow from variations in the density of the dark matter halo.

Recall that at a given luminosity, size and surface brightness are not independent, so variation in one corresponds to variation in the other. Consequently, we have two distinct ideas for why galaxies of the same mass vary in size. In SH, the halo may have the same density profile ρ(r), and it is only variations in angular momentum that dictate variations in the disk size. In DD, variations in the surface brightness of the luminous disk are reflections of variations in the density profile ρ(r) of the dark matter halo. In principle, one could have a combination of both effects, but we will keep them separate for this discussion, and note that mixing them defeats the virtues of each without curing their ills.

The SH hypothesis traces back to at least Fall and Efstathiou (1980). The notion is simple: variations in the size of disks correspond to variations in the angular momentum of their host dark matter halos. The mass destined to become a dark matter halo initially expands with the rest of the universe, reaching some maximum radius before collapsing to form a gravitationally bound object. At the point of maximum expansion, the nascent dark matter halos torque one another, inducing a small but non-zero net spin in each, quantified by the dimensionless spin parameter λ (Peebles, 1969). One then imagines that as a disk forms within a dark matter halo, it collapses until it is centrifugally supported: λ → 1 from some initially small value (typically λ ​≈ ​0.05, Barnes & Efstathiou, 1987, with some modest distribution about this median value). The spin parameter thus determines the collapse factor and the extent of the disk: low spin halos harbor compact, high surface brightness disks while high spin halos produce extended, low surface brightness disks.

The distribution of primordial spins is fairly narrow, and does not correlate with environment (Barnes & Efstathiou, 1987). The narrow distribution was invoked as an explanation for Freeman’s Law: the small variation in spins from halo to halo resulted in a narrow distribution of disk central surface brightness (van der Kruit, 1987). This association, while apparently natural, proved to be incorrect: when one goes through the mathematics to transform spin into scale length, even a narrow distribution of initial spins predicts a broad distribution in surface brightness (Dalcanton, Spergel, & Summers, 1997; McGaugh and de Blok, 1998a). Indeed, it predicts too broad a distribution: to prevent the formation of galaxies much higher in surface brightness than observed, one must invoke a stability criterion (Dalcanton, Spergel, & Summers, 1997; McGaugh and de Blok, 1998a) that precludes the existence of very high surface brightness disks. While it is physically quite reasonable that such a criterion should exist (Ostriker and Peebles, 1973), the observed surface density threshold does not emerge naturally, and must be inserted by hand. It is an auxiliary hypothesis invoked to preserve SH. Once done, size variations and the trend of average size with mass work out in reasonable quantitative detail (e.g., Mo et al., 1998).
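The mapping from spin to surface brightness can be sketched numerically. Assuming, purely for illustration, a disk scale length Rd ∝ λ at fixed mass (so that the central surface density scales as Σ0 ∝ λ⁻²), the logarithmic scatter doubles in the transformation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Lognormal spin distribution with median 0.05 (Barnes & Efstathiou, 1987);
# the width of 0.5 in ln(λ) is an illustrative choice.
lam = rng.lognormal(mean=np.log(0.05), sigma=0.5, size=100_000)

Sigma0 = lam**-2.0  # central surface density at fixed disk mass, if Rd ∝ λ

print(f"scatter in log10(λ):  {np.std(np.log10(lam)):.2f} dex")
print(f"scatter in log10(Σ0): {np.std(np.log10(Sigma0)):.2f} dex")
# the predicted surface density distribution is twice as broad (in dex)
# as the spin distribution that produced it
```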

Angular momentum conservation must hold for an isolated galaxy, but the assumption made in SH is stronger: baryons conserve their share of the angular momentum independently of the dark matter. It is considered a virtue that this simple assumption leads to disk sizes that are about right. However, this assumption is not well justified. Baryons and dark matter are free to exchange angular momentum with each other, and are seen to do so in simulations that track both components (e.g., Book et al., 2011; Combes, 2013; Klypin et al., 2002). There is no guarantee that this exchange is equitable, and in general it is not: as baryons collapse to form a small galaxy within a large dark matter halo, they tend to lose angular momentum to the dark matter. This is a one-way street that runs in the wrong direction, with the final destination uncomfortably invisible: most of the angular momentum winds up sequestered in the unobservable dark matter. Worse still, if we impose rigorous angular momentum conservation among the baryons, the result is a disk with a completely unrealistic surface density profile (van den Bosch, 2001a). It then becomes necessary to pick and choose which baryons manage to assemble into the disk and which are expelled or otherwise excluded, thereby solving one problem by creating another.

Early work on LSB disk galaxies led to a rather different picture. Compared to the previously known population of HSB galaxies around which our theories had been built, the LSB galaxy population has a younger mean stellar age (de Blok & van der Hulst, 1998; McGaugh and Bothun, 1994), a lower content of heavy elements (McGaugh, 1994), and a systematically higher gas fraction (McGaugh and de Blok, 1997; Schombert et al., 1997). These properties suggested that LSB galaxies evolve more gradually than their higher surface brightness brethren: they convert their gas into stars over a much longer timescale (McGaugh et al., 2017). The obvious culprit for this difference is surface density: lower surface brightness galaxies have less gravity, hence less ability to gather their diffuse interstellar medium into dense clumps that could form stars (Gerritsen and de Blok, 1999; Mihos et al., 1999). It seemed reasonable to ascribe the low surface density of the baryons to a correspondingly low density of their parent dark matter halos.

One way to think about a region in the early universe that will eventually collapse to form a galaxy is as a so-called top-hat over-density. The mass density Ωm → 1 ​at early times, irrespective of its current value, so a spherical region (the top-hat) that is somewhat over-dense early on may locally exceed the critical density. We may then consider this finite region as its own little closed universe, and follow its evolution with the Friedmann equations with Ω ​> ​1. The top-hat will initially expand along with the rest of the universe, but will eventually reach a maximum radius and recollapse. When that happens depends on the density. The greater the over-density, the sooner the top-hat will recollapse. Conversely, a lesser over-density will take longer to reach maximum expansion before recollapsing.
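This evolution is easy to follow explicitly. A toy version, treating the top-hat as a matter-only closed universe and working in units of the Hubble time (all numbers illustrative):

```python
import numpy as np
from scipy.integrate import quad

def tophat_collapse(omega):
    """Maximum expansion factor and collapse time (in units of 1/H0) for a
    top-hat treated as a closed mini-universe with density parameter omega > 1.
    Friedmann equation: (da/dt)^2 = H0^2 * (omega/a - (omega - 1))."""
    a_max = omega / (omega - 1.0)  # turnaround, where da/dt = 0
    t_turn, _ = quad(lambda a: (omega / a - (omega - 1.0)) ** -0.5, 0.0, a_max)
    return a_max, 2.0 * t_turn     # collapse takes twice the turnaround time

for omega in (1.1, 1.5, 3.0):
    a_max, t_c = tophat_collapse(omega)
    print(f"Ω = {omega:3.1f}: expands by {a_max:5.2f}, recollapses at t ≈ {t_c:6.2f}/H0")
# the denser the top-hat, the sooner it turns around and recollapses
```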

Everything about LSB galaxies suggested that they were lower density, late-forming systems. It therefore seemed quite natural to imagine a distribution of over-densities and corresponding collapse times for top-hats of similar mass, and to associate LSB galaxies with the lesser over-densities (Dekel and Silk, 1986; McGaugh, 1992). More recently, some essential aspects of this idea have been revived under the moniker of “assembly bias” (e.g. Zehavi et al., 2018).

The work that informed the DD hypothesis was based largely on photometric and spectroscopic observations of LSB galaxies: their size and surface brightness, color, chemical abundance, and gas content. DD made two obvious predictions that had not yet been tested at that juncture. First, late-forming halos should reside preferentially in low density environments. This is a generic consequence of Gaussian initial conditions: big peaks defined on small (e.g., galaxy) scales are more likely to be found in big peaks defined on large (e.g., cluster) scales, and vice-versa. Second, the density of the dark matter halo of an LSB galaxy should be lower than that of an equal mass halo containing an HSB galaxy. This predicts a clear signature in their rotation speeds, which should be lower for lower density.

The prediction for the spatial distribution of LSB galaxies was tested by Bothun et al. (1993) and Mo et al. (1994). The test showed the expected effect: LSB galaxies were less strongly clustered than HSB galaxies. They are clustered: both galaxy populations follow the same large scale structure, but HSB galaxies adhere more strongly to it. In terms of the correlation function, the LSB sample available at the time had a correlation length r0 about half that of comparison HSB samples (Mo et al., 1994). The effect was even more pronounced on the smallest scales (<2 Mpc: Bothun et al., 1993), leading Mo et al. (1994) to construct a model that successfully explained both small and large scale aspects of the spatial distribution of LSB galaxies simply by associating them with dark matter halos that lacked close interactions with other halos. This was strong corroboration of the DD hypothesis.

One way to test the prediction of DD that LSB galaxies should rotate more slowly than HSB galaxies was to use the Tully-Fisher relation (Tully and Fisher, 1977) as a point of reference. Originally identified as an empirical relation between optical luminosity and the observed line-width of single-dish 21 cm observations, more fundamentally it turns out to be a relation between the baryonic mass of a galaxy (stars plus gas) and its flat rotation speed: the Baryonic Tully-Fisher relation (BTFR: McGaugh et al., 2000). This relation is a simple power law of the form

Mb = A Vf⁴ (equation 1)

with A ≈ 50 M☉ km⁻⁴ s⁴ (McGaugh, 2005).
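Equation (1) inverts trivially to give the rotation speed expected for a given baryonic mass. A quick numerical check with the normalization above:

```python
A = 50.0  # M_sun km^-4 s^4 (McGaugh, 2005)

def vf_from_mb(Mb):
    """Flat rotation speed (km/s) predicted by the BTFR for Mb in solar masses."""
    return (Mb / A) ** 0.25

for Mb in (1e9, 1e10, 6e10):
    print(f"Mb = {Mb:.0e} M_sun  ->  Vf ≈ {vf_from_mb(Mb):.0f} km/s")
# gives ~67, ~119, and ~186 km/s; the last is a roughly Milky Way-like mass
```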

Aaronson et al. (1979) provided a straightforward interpretation for a relation of this form. A test particle orbiting a mass M at a distance R will have a circular speed V

V² = GM/R (equation 2)

where G is Newton’s constant. If we square this, a relation like the Tully-Fisher relation follows:

V⁴ = (GM/R)² ∝ MΣ (equation 3)

where we have introduced the surface mass density Σ = M/R². The Tully-Fisher relation M ∝ V⁴ is recovered if Σ is constant, exactly as expected from Freeman’s Law (Freeman, 1970).

LSB galaxies, by definition, have central surface brightnesses (and corresponding stellar surface densities Σ0) that are less than the Freeman value. Consequently, DD predicts, through equation (3), that LSB galaxies should shift systematically off the Tully-Fisher relation: lower Σ means lower velocity. The predicted effect is not subtle (Fig. 4). For the range of surface brightness that had become available, the predicted shift should have stood out like the proverbial sore thumb. It did not (Hoffman et al., 1996; McGaugh and de Blok, 1998a; Sprayberry et al., 1995; Zwaan et al., 1995). This had an immediate impact on galaxy formation theory: compare Dalcanton et al. (1995, who predict a shift in Tully-Fisher with surface brightness) with Dalcanton et al. (1997b, who do not).
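The magnitude of that predicted shift follows directly from equation (3): at fixed mass, V ∝ Σ^(1/4). A back-of-the-envelope check:

```python
import math

# At fixed baryonic mass, equation (3) gives V ∝ Σ^(1/4), so a factor f
# drop in surface density predicts a velocity lower by f^(1/4).
for f in (10.0, 100.0):
    print(f"Σ lower by {f:5.0f}x -> V lower by {f**0.25:.2f}x "
          f"({0.25 * math.log10(f):.2f} dex)")
# shifts of 0.25-0.5 dex in velocity: enormous compared to the observed
# tightness of the BTFR
```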

Fig. 4. The Baryonic Tully-Fisher relation and residuals. The top panel shows the flat rotation velocity of galaxies in the SPARC database (Lelli et al., 2016a) as a function of the baryonic mass (stars plus gas). The sample is restricted to those objects for which both quantities are measured to better than 20% accuracy. The bottom panel shows velocity residuals around the solid line in the top panel as a function of the central surface density of the stellar disks. Variations in the stellar surface density predict variations in velocity along the dashed line. These would translate to shifts illustrated by the dotted lines in the top panel, with each dotted line representing a shift of a factor of ten in surface density. The predicted dependence on surface density is not observed (Courteau & Rix, 1999; McGaugh and de Blok, 1998a; Sprayberry et al., 1995; Zwaan et al., 1995).

Instead of the systematic variation of velocity with surface brightness expected at fixed mass, there was none. Indeed, there is no hint of a second parameter dependence. The relation is incredibly tight by the standards of extragalactic astronomy (Lelli et al., 2016b): baryonic mass and the flat rotation speed are practically interchangeable.

The above derivation is overly simplistic. The radius at which we should make a measurement is ill-defined, and the surface density is dynamical: it includes both stars and dark matter. Moreover, galaxies are not spherical cows: one needs to solve the Poisson equation for the observed disk geometry of LTGs, and account for the varying radial contributions of luminous and dark matter. While this can be made to sound intimidating, the numerical computations are straightforward and rigorous (e.g., Begeman et al., 1991; Casertano & Shostak, 1980; Lelli et al., 2016a). It still boils down to the same sort of relation (modulo geometrical factors of order unity), but with two mass distributions: one for the baryons Mb(R), and one for the dark matter MDM(R). Though the dark matter is more massive, it is also more extended. Consequently, both components can contribute non-negligibly to the rotation over the observed range of radii:

V²(R) = GM/R = G(Mb/R + MDM/R), (equation 4)

where for clarity we have omitted* geometrical factors. The only absolute requirement is that the baryonic contribution should begin to decline once the majority of baryonic mass is encompassed. It is when rotation curves persist in remaining flat past this point that we infer the need for dark matter.
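The balancing act in equation (4) can be made concrete with a toy decomposition. Everything below is illustrative (a Plummer sphere standing in for the baryons, an NFW halo for the dark matter, parameter values invented for the example), not a fit to any real galaxy:

```python
import numpy as np

G = 4.301e-6  # Newton's constant in kpc (km/s)^2 / M_sun

def v_baryons(r, Mb=5e10, a=5.0):
    """Toy spherical stand-in for the baryons (Plummer sphere, scale a in kpc):
    the contribution rises, peaks, then declines in Keplerian fashion."""
    M_enc = Mb * r**3 / (r**2 + a**2) ** 1.5
    return np.sqrt(G * M_enc / r)

def v_halo(r, rho_s=3e6, r_s=20.0):
    """NFW halo: M(<r) = 4π ρs rs³ [ln(1+x) − x/(1+x)] with x = r/rs."""
    x = r / r_s
    M_enc = 4 * np.pi * rho_s * r_s**3 * (np.log1p(x) - x / (1 + x))
    return np.sqrt(G * M_enc / r)

r = np.array([2.0, 5.0, 10.0, 20.0, 40.0])   # kpc
v_tot = np.hypot(v_baryons(r), v_halo(r))    # components add in quadrature
for ri, vb, vh, vt in zip(r, v_baryons(r), v_halo(r), v_tot):
    print(f"r = {ri:4.0f} kpc: Vb = {vb:5.1f}, V_DM = {vh:5.1f}, V = {vt:5.1f} km/s")
# keeping V(r) flat requires the halo term to rise just as the baryonic
# term falls -- the 'conspiracy' discussed below
```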

A recurrent problem in testing galaxy formation theories is that they seldom make ironclad predictions; I attempt a brief summary in Table 1. SH represents a broad class of theories with many variants. By construction, the dark matter halos of galaxies of similar stellar mass are similar. If we associate the flat rotation velocity with halo mass, then galaxies of the same mass have the same circular velocity, and the problem posed by Tully-Fisher is automatically satisfied.

Table 1. Predictions of DD and SH for LSB galaxies.

Observation                  DD   SH
Evolutionary rate            +    +
Size distribution            +    +
Clustering                   +    X
Tully-Fisher relation        X    ?
Central density relation     +    X

While it is common to associate the flat rotation speed with the dark matter halo, this is a half-truth: the observed velocity is a combination of baryonic and dark components (eq. (4)). It is thus a rather curious coincidence that rotation curves are as flat as they are: the Keplerian decline of the baryonic contribution must be precisely balanced by an increasing contribution from the dark matter halo. This fine-tuning problem was dubbed the “disk-halo conspiracy” (Bahcall & Casertano, 1985; van Albada & Sancisi, 1986). The solution offered for the disk-halo conspiracy was that the formation of the baryonic disk has an effect on the distribution of the dark matter. As the disk settles, the dark matter halo responds through a process commonly referred to as adiabatic compression that brings the peak velocities of disk and dark components into alignment (Blumenthal et al., 1986). Some rearrangement of the dark matter halo in response to the change of the gravitational potential caused by the settling of the disk is inevitable, so this seemed a plausible explanation.
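A minimal sketch of that compression, in the approximation of Blumenthal et al. (1986): circular orbits conserve the product r·M(&lt;r), and dark matter shells are assumed not to cross. The profiles and numbers here are invented for illustration:

```python
import numpy as np
from scipy.optimize import brentq

f_b = 0.15  # baryon fraction initially mixed with the dark matter (assumed)

def M_init(r, M_tot=1e12, r_s=20.0):
    """Initial total mass profile (NFW shape, normalized to M_tot at 10 r_s)."""
    x = r / r_s
    return M_tot * (np.log1p(x) - x / (1 + x)) / (np.log1p(10.0) - 10.0 / 11.0)

def M_b_final(r, Mb=1.5e11, a=5.0):
    """Final baryonic mass profile after the disk settles (Plummer toy model)."""
    return Mb * r**3 / (r**2 + a**2) ** 1.5

def contracted_radius(r_i):
    """Solve r_f [M_b(r_f) + (1 - f_b) M_init(r_i)] = r_i M_init(r_i) for r_f."""
    Mi = M_init(r_i)
    f = lambda r_f: r_f * (M_b_final(r_f) + (1.0 - f_b) * Mi) - r_i * Mi
    return brentq(f, 1e-3, r_i)

for r_i in (5.0, 10.0, 20.0):
    print(f"dark matter shell at {r_i:4.1f} kpc contracts to {contracted_radius(r_i):.1f} kpc")
# the settling of the baryons drags the inner halo inward, steepening it
```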

The observation that LSB galaxies obey the Tully-Fisher relation greatly compounds the fine-tuning (McGaugh and de Blok, 1998a; Zwaan et al., 1995). The amount of adiabatic compression depends on the surface density of stars (Sellwood and McGaugh, 2005b): HSB galaxies experience greater compression than LSB galaxies. This should enhance the predicted shift between the two in Tully-Fisher. Instead, the amplitude of the flat rotation speed remains unperturbed.

The generic failings of dark matter models were discussed at length by McGaugh and de Blok (1998a). The same problems have been encountered by others. For example, Fig. 5 shows model galaxies formed in a dark matter halo with identical total mass and density profile but with different spin parameters (van den Bosch, 2001b). Variations in the assembly and cooling history were also considered, but these make little difference and are not relevant here. The point is that smaller (larger) spin parameters lead to more (less) compact disks that contribute more (less) to the total rotation, exactly as anticipated from variations in the term Mb/R in equation (4). The nominal variation is readily detectable, and stands out prominently in the Tully-Fisher diagram (Fig. 5). This is exactly the same fine-tuning problem that was pointed out by Zwaan et al. (1995) and McGaugh and de Blok (1998a).

What I describe as a fine-tuning problem is not portrayed as such by van den Bosch (2000) and van den Bosch and Dalcanton (2000), who argued that the data could be readily accommodated in the dark matter picture. The difference is between accommodating the data once known, and predicting it a priori. The dark matter picture is extraordinarily flexible: one is free to distribute the dark matter as needed to fit any data that evince a non-negative mass discrepancy, even data that are wrong (de Blok & McGaugh, 1998). It is another matter entirely to construct a realistic model a priori; in my experience it is quite easy to construct models with plausible-seeming parameters that bear little resemblance to real galaxies (e.g., the low-spin case in Fig. 5). A similar conundrum is encountered when constructing models that can explain the long tidal tails observed in merging and interacting galaxies: models with realistic rotation curves do not produce realistic tidal tails, and vice-versa (Dubinski et al., 1999). The data occupy a very narrow sliver of the enormous volume of parameter space available to dark matter models, a situation that seems rather contrived.

Fig. 5. Model galaxy rotation curves and the Tully-Fisher relation. Rotation curves (left panel) for model galaxies of the same mass but different spin parameters λ from van den Bosch (2001b, see his Fig. 3). Models with lower spin have more compact stellar disks that contribute more to the rotation curve (V² = GM/R; R being smaller for the same M). These models are shown as square points on the Baryonic Tully-Fisher relation (right) along with data for real galaxies (grey circles: Lelli et al., 2016b) and a fit thereto (dashed line). Differences in the cooling history result in modest variation in the baryonic mass at fixed halo mass as reflected in the vertical scatter of the models. This is within the scatter of the data, but variation due to the spin parameter is not.

Both DD and SH predict residuals from Tully-Fisher that are not observed. I consider this to be an unrecoverable failure for DD, which was my hypothesis (McGaugh, 1992), so I worked hard to salvage it. I could not. For SH, Tully-Fisher might be recovered in the limit of dark matter domination, which requires further consideration.


I will save the further consideration for a future post, as that can take infinite words (there are literally thousands of ApJ papers on the subject). The real problem that rotation curve data pose generically for the dark matter interpretation is the fine-tuning required between baryonic and dark matter components – the balancing act explicit in the equations above. This, by itself, constitutes a practical falsification of the dark matter paradigm.

Without going into interesting but ultimately meaningless details (maybe next time), the only way to avoid this conclusion is to choose to be unconcerned with fine-tuning. If you choose to say fine-tuning isn’t a problem, then it isn’t a problem. Worse, many scientists don’t seem to understand that they’ve even made this choice: it is baked into their assumptions. There is no risk of questioning those assumptions if one never stops to think about them, much less worry that there might be something wrong with them.

Much of the field seems to have sunk into a form of scientific nihilism. The attitude I frequently encounter when I raise this issue boils down to “Don’t care! Everything will magically work out! LA LA LA!”


*Strictly speaking, eq. (4) only holds for spherical mass distributions. I make this simplification here to emphasize the fact that both mass and radius matter. This essential scaling persists for any geometry: the argument holds in complete generality.

Galaxy Formation – a few basics

Galaxies are gravitationally bound condensations of stars and gas in a mostly empty, expanding universe. The tens of billions of solar masses of baryonic material that comprise the stars and gas of the Milky Way now reside mostly within a radius of 20 kpc. At the average density of the universe, the equivalent mass fills a spherical volume with a comoving radius a bit in excess of 1 Mpc. This is a large factor by which a protogalaxy must collapse, starting from the very smooth (~ 1 part in 10⁵) initial condition at z = 1090 observed in the CMB (Planck Collaboration et al., 2018). Dark matter — in particular, non-baryonic cold dark matter — plays an essential role in speeding this process along.
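The comoving radius quoted above follows from straightforward arithmetic. A rough check, using round numbers for the Milky Way’s baryonic mass and the cosmic baryon density (both assumed here for illustration):

```python
import numpy as np

M_sun = 1.989e30              # kg
Mpc = 3.086e22                # m
Mb = 6e10 * M_sun             # Milky Way baryons, ~6e10 M_sun (assumed)
rho_crit = 8.6e-27            # kg/m^3, for H0 ≈ 68 km/s/Mpc
Omega_b = 0.049               # cosmic baryon density parameter
rho_b = Omega_b * rho_crit    # mean baryon density today

R = (3.0 * Mb / (4.0 * np.pi * rho_b)) ** (1.0 / 3.0) / Mpc
print(f"comoving radius ≈ {R:.1f} Mpc")
print(f"linear collapse factor to 20 kpc ≈ {R * 1000.0 / 20.0:.0f}")
# about 1.3 Mpc, i.e. a collapse by a factor of ~65 in radius
```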

The mass-energy of the early universe is initially dominated by the radiation field. The baryons are held in thrall to the photons until the expansion of the universe turns the tables and matter becomes dominant. Exactly when this happens depends on the mass density (Peebles, 1980); for our purposes it suffices to realize that the baryonic components of galaxies cannot begin to form until well after the time of the CMB. However, since CDM does not interact with photons, it is not subject to this limitation. The dark matter can begin to form structures — dark matter halos — that form the scaffolding of future structure. Essential to the ΛCDM galaxy formation paradigm is that the dark matter halos form first, seeding the subsequent formation of luminous galaxies by providing the potential wells into which baryons can condense once free from the radiation field.

The theoretical expectation for how dark matter halos form is well understood at this juncture. Numerical simulations of cold dark matter — mass that interacts only through gravity in an expanding universe — show that quasi-spherical dark matter halos form with a characteristic ‘NFW’ (e.g., Navarro et al., 1997) density profile. These have a ‘cuspy’ inner density profile in which the density of dark matter increases towards the center approximately as a power law, ρ(r → 0) ~ r⁻¹. At larger radii, the density profile falls off as ρ(r) ~ r⁻³. The centers of these halos are the density peaks around which galaxies can form.
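For reference, the NFW profile and its limiting slopes in a few lines:

```python
def rho_nfw(x):
    """NFW density in units of ρs, with x = r/rs: ρ = ρs / [x (1 + x)²]."""
    return 1.0 / (x * (1.0 + x) ** 2)

def log_slope(x):
    """Logarithmic slope d(ln ρ)/d(ln r) for the NFW profile."""
    return -(1.0 + 2.0 * x / (1.0 + x))

for x in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"r/rs = {x:6.2f}: ρ/ρs = {rho_nfw(x):9.3e}, slope = {log_slope(x):5.2f}")
# the slope runs from -1 (the inner cusp) to -3 (the outer falloff)
```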

The galaxies that we observe are composed of stars and gas: normal baryonic matter. The theoretical expectation for how baryons behave during galaxy formation is not well understood (Scannapieco et al., 2012). This results in a tremendous and long-standing disconnect between theory and observation. We can, however, stipulate a few requirements as to what needs to happen. Dark matter halos must form first; the baryons fall into these halos afterwards. Dark matter halos are observed to extend well beyond the outer edges of visible galaxies, so baryons must condense to the centers of dark matter halos. This condensation may proceed through both the hierarchical merging of protogalactic fragments (a process that has a proclivity to form ETGs) and the more gentle accretion of gas into rotating disks (a requirement to form LTGs). In either case, some fraction of the baryons form the observed, luminous component of a galaxy at the center of a CDM halo. This condensation of baryons necessarily affects the dark matter gravitationally, with the net effect of dragging some of it towards the center (Blumenthal et al., 1986; Dubinski, 1994; Gnedin et al., 2004; Sellwood and McGaugh, 2005a), thus compressing the dark matter halo from its initial condition as indicated by dark matter-only simulations like those of Navarro et al. (1997). These processes must all occur, but do not by themselves suffice to explain real galaxies.

Galaxies formed in models that consider only the inevitable effects described above suffer many serious defects. They tend to be too massive (Abadi et al., 2003; Benson et al., 2003), too small (the angular momentum catastrophe: Katz, 1992; Steinmetz, 1999; D’Onghia et al., 2006), have systematically too large bulge-to-disk ratios (the bulgeless galaxy problem: D’Onghia and Burkert, 2004; Kormendy et al., 2010), have dark matter halos with too much mass at small radii (the cusp-core problem: Moore et al., 1999b; Kuzio de Naray et al., 2008, 2009; de Blok, 2010; Kuzio de Naray and McGaugh, 2014), and have the wrong over-all mass function (the over-cooling problem, e.g., Benson, 2010), also known locally as the missing satellite problem (Klypin et al., 1999; Moore et al., 1999a). This long list of problems has kept the field of galaxy formation a lively one: there is no risk of it becoming a victim of its own success through the appearance of one clearly-correct standard model.

Historical threads of development

Like last time, this is a minimalist outline of the basics that are relevant to our discussion. A proper history of this field would be much longer. Indeed, I rather doubt it would be possible to write a coherent text on the subject, which means different things to different scientists.

Entering the 1980s, options for galaxy formation were frequently portrayed as a dichotomy between monolithic galaxy formation (Eggen et al., 1962) and the merger of protogalactic fragments (Searle and Zinn, 1978). The basic idea of monolithic galaxy formation is that the initial ~ 1 Mpc cloud of gas that would form the Milky Way experienced dissipational collapse in one smooth, adiabatic process. This is effective at forming the disk, with only a tiny bit of star formation occurring during the collapse phase to provide the stars of the ancient, metal-poor stellar halo. In contrast, the Galaxy could have been built up by the merger of smaller protogalactic fragments, each with their own life as smaller galaxies prior to merging. The latter is more natural to the emergence of structure from the initial conditions observed in the CMB, where small lumps condense more readily than large ones. Indeed, this effectively forms the basis of the modern picture of hierarchical galaxy formation (Efstathiou et al., 1988).

Hierarchical galaxy formation is effective at forming bulges and pressure-supported ETGs, but is anathema to the formation of orderly disks. Dynamically cold disks are fragile and prefer to be left alone: the high rate of merging in the hierarchical ΛCDM model tends to destroy the dynamically cold state in which most spirals are observed to exist (Abadi et al., 2003; Peebles, 2020; Toth and Ostriker, 1992). Consequently, there have been some rather different ideas about galaxy formation: if one starts from the initial conditions imposed by the CMB, hierarchical galaxy formation is inevitable. If instead one works backwards from the observed state of galaxy disks, the smooth settling of gaseous disks in relatively isolated monoliths seems more plausible.

In addition to different theoretical notions, our picture of the galaxy population was woefully incomplete. An influential study by Freeman (1970) found that 28 of three dozen spirals shared very nearly the same central surface brightness. This was generalized into a belief that all spirals had the same (high) surface brightness, and came to be known as Freeman’s Law. Ultimately this proved to be a selection effect, as pointed out early by Disney (1976) and Allen and Shu (1979). However, it was not until much later (McGaugh et al., 1995a) that this became widely recognized. In the meantime, the prevailing assumption was that Freeman’s Law held true (e.g., van der Kruit, 1987) and all spirals had practically the same surface brightness. In particular, it was the central surface brightness of the disk component of spiral galaxies that was thought to be universal, while bulges and ETGs varied in surface brightness. Variation in the disk component of LTGs was thought to be restricted to variations in size, which led to variations in luminosity at fixed surface brightness.

Consequently, most theoretical effort was concentrated on the bright objects in the high-mass (M > 10¹⁰ M☉) clump in Fig. 2. Some low mass dwarf galaxies were known to exist, but were considered to be insignificant because they contained little mass. Low surface brightness galaxies violated Freeman’s Law, so were widely presumed not to exist, or to be at most a rare curiosity (Bosma & Freeman, 1993). A happy consequence of this unfortunate state of affairs was that as observations of diffuse LSB galaxies were made, they forced then-current ideas about galaxy formation into a regime that they had not anticipated, and which many could not accommodate.

The similarity and difference between high surface brightness (HSB) and LSB galaxies is illustrated by Fig. 3. Both are rotationally supported, late type disk galaxies. Both show spiral structure, though it is more prominent in the HSB. More importantly, both systems are of comparable linear diameter. They exist roughly at opposite ends of a horizontal line in Fig. 2. Their differing stellar masses stem from the surface density of their stars rather than their linear extent — exactly the opposite of what had been inferred from Freeman’s Law. Any model of galaxy formation and evolution must account for the distribution of size (or surface brightness) at a given mass as well as the number density of galaxies as a function of mass. Both aspects of the galaxy population remain problematic to this day.

Fig. 3. High and low surface brightness galaxies. NGC 7757 (left) and UGC 1230 (right) are examples of high and low surface brightness galaxies, respectively. These galaxies are about the same distance away and span roughly the same physical diameter. The chief difference is in the surface brightness, which follows from the separation between stars (McGaugh et al., 1995b). Note that the intensity scale of these images is not identical; the contrast has been increased for the LSB galaxy so that it appears as more than a smudge.

Throughout my thesis work, my spouse joked that my LSB galaxy images looked like bug splots on the telescope. You can see more of them here. And a few more here. And lots more on Jim Schombert’s web pages, here and here and here.

Primer on Galaxy Properties

When we look up at the sky, we see stars. Stars are the building blocks of galaxies; we can see the stellar disk of the galaxy in which we live as the vault of the Milky Way arching across the sky. When we look beyond the Milky Way, we see galaxies. Just as stars are the building blocks of galaxies, galaxies are the building blocks of the universe. One can no more hope to understand cosmology without understanding galaxies than one can hope to understand galaxies without understanding stars.

Here I give a very brief primer on basic galaxy properties. This is a subject on which entire textbooks are written, so what I say here is necessarily very incomplete. It is a bare minimum to go on for the ensuing discussion.

Galaxy Properties

Cosmology entered the modern era when Hubble (1929) resolved the debate over the nature of spiral nebulae by measuring the distance to Andromeda, establishing that vast stellar systems — galaxies — exist external to and coequal with the Milky Way. Galaxies are the primary type of object observed when we look beyond the confines of our own Milky Way: they are the building blocks of the universe. Consequently, galaxies and cosmology are intertwined: it is impossible to understand one without the other.

Here I sketch a few essential facts about the properties of galaxies. This is far from a comprehensive list (see, for example Binney & Tremaine, 1987) and serves only to provide a minimum framework for the subsequent discussion. The properties of galaxies are often cast in terms of morphological type, starting with Hubble’s tuning fork diagram. The primary distinction is between Early Type Galaxies (ETGs) and Late Type Galaxies (LTGs), which is a matter of basic structure. ETGs, also known as elliptical galaxies, are three dimensional, ellipsoidal systems that are pressure supported: there is more kinetic energy in random motions than in circular motions, a condition described as dynamically hot. The orbits of stars are generally eccentric and oriented randomly with respect to one another, filling out the ellipsoidal shape seen in projection on the sky. LTGs, including spiral and irregular galaxies, are thin, quasi-two dimensional, rotationally supported disks. The majority of their stars orbit in the same plane in the same direction on low eccentricity orbits. The lion’s share of kinetic energy is invested in circular motion, with only small random motions, a condition described as dynamically cold. Examples of early and late type galaxies are shown in Fig. 1.

Fig. 1. Galaxy morphology. These examples show an early type elliptical galaxy (NGC 3379, left), and two late type disk galaxies: a face-on spiral (NGC 628, top right), and an edge-on disk galaxy (NGC 891, bottom right). Elliptical galaxies are quasi-spherical, pressure supported stellar systems that tend to have predominantly old stellar populations, usually lacking young stars or much in the way of the cold interstellar gas from which they might form. In contrast, late type galaxies (spirals and irregulars) are thin, rotationally supported disks. They typically contain a mix of stellar ages and cold interstellar gas from which new stars continue to form. Interstellar dust is also present, being most obvious in the edge-on case. Images from Palomar Observatory, Caltech.

Finer distinctions in morphology can be made within the broad classes of early and late type galaxies, but the basic structural and kinematic differences suffice here. The disordered motion of ETGs is a natural consequence of violent relaxation (Lynden-Bell, 1967) in which a stellar system reaches a state of dynamical equilibrium from a chaotic initial state. This can proceed relatively quickly from a number of conceivable initial conditions, and is a rather natural consequence of the hierarchical merging of sub-clumps expected from the Gaussian initial conditions indicated by observations of the CMB (White, 1996). In contrast, the orderly rotation of dynamically cold LTGs requires a gentle settling of gas into a rotationally supported disk. It is essential that disk formation occur in the gaseous phase, as gas can dissipate and settle to the preferred plane specified by the net angular momentum of the system. Once stars form, their orbits retain a memory of their initial state for a period typically much greater than the age of the universe (Binney & Tremaine, 1987). Consequently, the bulk of the stars in the spiral disk must have formed there after the gas settled.

In addition to the dichotomy in structure, ETGs and LTGs also differ in their evolutionary history. ETGs tend to be ‘red and dead,’ which is to say, dominated by old stars. They typically lack much in the way of recent star formation, and are often devoid of the cold interstellar gas from which new stars can form. Most of their star formation happened in the early universe, and may have involved the merger of multiple protogalactic fragments. Irrespective of these details, massive ETGs appeared early in the universe (Steinhardt et al., 2016), and for the most part seem to have evolved passively since (Franck and McGaugh, 2017).

Again in contrast, LTGs have on-going star formation in interstellar media replete with cold atomic and molecular gas. They exhibit a wide range in stellar ages, from newly formed stars to ancient stars dating to near the beginning of time. Old stars seem to be omnipresent, famously occupying globular clusters but also present in the general disk population. This implies that the gaseous disk settled fairly early, though accretion may continue over a long timescale (van den Bergh, 1962; Henry and Worthey, 1999). Old stars persist in the same orbital plane as young stars (Binney & Merrifield, 1998), which precludes much subsequent merger activity, as the chaos of merging distorts orbits. Disks can be over-heated (Toth and Ostriker, 1992) and transformed by interactions between galaxies (Toomre and Toomre, 1972), even turning into elliptical galaxies during major mergers (Barnes & Hernquist, 1992).

Aside from its morphology, an obvious property of a galaxy is its mass. Galaxies exist over a large range of mass, with a type-dependent characteristic stellar mass of 5 × 10¹⁰ M☉ for disk dominated systems (the Milky Way is very close to this mass: Bland-Hawthorn & Gerhard, 2016) and 10¹¹ M☉ for elliptical galaxies (Moffett et al., 2016). Above this characteristic mass, the number density of galaxies declines sharply, though individual galaxies exceeding a few × 10¹¹ M☉ certainly exist. The number density of galaxies increases gradually to lower masses, with no known minimum. The gradual increase in numbers does not compensate for the decrease in mass: integrating over the distribution, one finds that most of the stellar mass is in bright galaxies close to the characteristic mass.

Galaxies have a characteristic size and surface brightness. The same amount of stellar mass can be concentrated in a high surface brightness (HSB) galaxy, or spread over a much larger area in a low surface brightness (LSB) galaxy. For the purposes of this discussion, it suffices to assume that the observed luminosity is proportional to the mass of stars that produces the light. Similarly, the surface brightness measures the surface density of stars. Of the three observable quantities of luminosity, size, and surface brightness, only two are independent: the luminosity is the product of the surface brightness and the area over which it extends. The area scales as the square of the linear size.
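This bookkeeping recurs throughout the discussion, so it is worth one explicit example: the same luminosity can come from a compact HSB disk or a diffuse LSB one (values arbitrary):

```python
import math

def luminosity(Sigma, R):
    """L = <Σ> × area, with area ∝ R² (arbitrary units)."""
    return Sigma * math.pi * R**2

print(luminosity(Sigma=100.0, R=1.0))  # compact, high surface brightness
print(luminosity(Sigma=1.0, R=10.0))   # diffuse, low surface brightness
# identical luminosities: 10x larger size compensates 100x lower Σ
```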

The distribution of size and mass of galaxies is shown in Fig. 2. This figure spans the range from tiny dwarf irregular galaxies containing ‘only’ a few hundred thousand stars to giant spirals composed of hundreds of billions of stars with half-light radii ranging from hundreds of parsecs to tens of kpc. The upper boundaries represent real, physical limits on the sizes and masses of galaxies. Bright objects are easy to see; if still higher mass galaxies were common, they would be readily detected and cataloged. In contrast, the lower boundaries are set by the limits of observational sensitivity (“selection effects”): galaxies that are physically small or low in surface brightness are difficult to detect and are systematically under-represented in galaxy catalogs (Allen & Shu, 1979; Disney, 1976; McGaugh et al., 1995a).

Fig. 2. Galaxy size and mass. The radius that contains half of the light is plotted against the stellar mass. Galaxies exist over many decades in mass, and exhibit a considerable variation in size at a given mass. Early and late type galaxies are demarcated with different symbols, as noted. Lines illustrate tracks of constant stellar surface density. The data for ETGs are from the compilation of Dabringhausen and Fellhauer (2016) augmented by dwarf Spheroidal (dSph) galaxies in the Local Group compiled by Lelli et al. (2017). Ultra-diffuse galaxies (UDGs: van Dokkum et al., 2015; Mihos et al., 2015, ​× ​and +, respectively) have unsettled kinematic classifications at present, but most seem likely to be pressure supported ETGs. The bulk of the data for LTGs is from the SPARC database (Lelli et al., 2016a), augmented by cases that are noteworthy for their extremity in mass or surface brightness (Brunker et al., 2019; Dalcanton, Spergel, Gunn, et al., 1997; de Blok et al., 1995; McGaugh and Bothun, 1994; Mihos et al., 2018; Rhode et al., 2013; Schombert et al., 2011). The gas content of these star-forming systems adds a third axis, illustrated crudely here by whether an LTG is made more of stars or gas (filled and open symbols, respectively).

Individual galaxies can be early type or late type, high mass or low mass, large or small in linear extent, high or low surface brightness, gas poor or gas rich. No one of these properties is completely predictive of the others: the correlations that do exist tend to have lots of intrinsic scatter. The primary exception to this appears to involve the kinematics. Massive galaxies are fast rotators; low mass galaxies are slow rotators. This Tully-Fisher relation (Tully and Fisher, 1977) is one of the strongest correlations in extragalactic astronomy (Lelli et al., 2016b). It is thus necessary to simultaneously explain both the chaotic diversity of galaxy properties and the orderly nature of their kinematics (McGaugh et al., 2019).

Galaxies do not exist in isolation. Rather than being randomly distributed throughout the universe, they tend to cluster together: the best place to find a galaxy is in the proximity of another galaxy (Rubin, 1954). A common way to quantify the clustering of galaxies is the two-point correlation function ξ(r) (Peebles, 1980). This measures the excess probability of finding a galaxy within a distance r of a reference galaxy relative to a random distribution. The observed correlation function is well approximated as a power law whose slope and normalization vary with galaxy population. ETGs are more clustered than LTGs, having a longer correlation length: r0 ≈ 9 Mpc for red galaxies vs. ~ 5 Mpc for blue galaxies (Zehavi et al., 2011). Here we will find this quantity to be of interest for comparing the distribution of high and low surface brightness galaxies.
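To see what those correlation lengths mean in practice, here is the excess probability implied at a few separations, assuming the canonical power-law slope γ ≈ 1.8:

```python
def xi(r, r0, gamma=1.8):
    """Two-point correlation function ξ(r) = (r/r0)^(-γ)."""
    return (r / r0) ** (-gamma)

for r in (1.0, 5.0, 10.0):  # separations in Mpc
    print(f"r = {r:4.1f} Mpc: ξ_red = {xi(r, 9.0):6.1f}, ξ_blue = {xi(r, 5.0):5.1f}")
# red (early type) galaxies show a larger excess at every separation;
# they cluster more strongly than blue (late type) galaxies
```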


Galaxies are sometimes called island universes. That is partly a hangover from pre-Hubble times during which it was widely believed that the Milky Way contained everything: it was one giant island universe embedded in an indefinite but otherwise empty void. We know that’s not true now – there are lots of stellar systems of similar size to the Milky Way – but they often seem to stand alone even if they are clustered in non-random ways.

For example, here is the spiral galaxy NGC 7757, an island unto itself.

NGC 7757 from the digitized sky survey (© 1994, Association of Universities for Research in Astronomy, Inc).

NGC 7757 is a high surface brightness spiral. It is easy to spot amongst the foreground stars of the Milky Way. In contrast, there are strong selection effects against low surface brightness galaxies, like UGC 1230:

UGC 1230 from the digitized sky survey (© 1994, Association of Universities for Research in Astronomy, Inc).

The LSB galaxy is rather harder to spot. Even when noticed, it doesn’t seem as important as the HSB galaxy. This, in a nutshell, is the history of selection effects in galaxy surveys, which are inevitably biased towards the biggest and the brightest. Advances in detectors (especially the CCD revolution of the 1980s) helped open our eyes to the existence of these LSB galaxies, and allowed us to measure their physical properties. Doing so provided a stringent test of galaxy formation theories, which have scrambled to catch up ever since.