Progressive Approximations in Mass Modeling

I have said I wasn’t going to attempt to teach an entire graduate course on galaxy dynamics in this forum, and I’m not. But I can give some pointers for those who want to try it for themselves. It also provides some useful context for fans of Deur’s approach.

The go-to textbook for this topic is Galactic Dynamics by Binney & Tremaine. The first edition was published in 1987, conveniently when I switched to grad school in astronomy. It was already a deep and well-developed field at that time; this is a compendium of considerable scientific knowledge.

Fun story: a colleague in a joint physics & astronomy department once complained to me that she wanted to develop a course in galaxy dynamics, which is a staple of graduate programs in astronomy & astrophysics. However, there was a certain senior colleague who objected, saying that since it was astronomy, it couldn’t possibly be a rigorous course worthy of a full semester graduate course. This is a casual bias that astronomers often encounter when talking to physicists, many of whom have attitudes about the subject that were trapped in amber sometime in the Jurassic. I suggested that she walk into his office and drop a copy of Galactic Dynamics on his desk from on high, as (1) it would make a hefty impact, and (2) no one who so much as skims this book could persist in this toxic attitude.

She later reported that she had done this, and it had worked.

Galactic Dynamics is not a starter book. It is the textbook we use when teaching the graduate course that this is not. A useful how-to guide for the specific material I’ll discuss here is provided by Federico Lelli. In brief, to model the gravitational potential of an observed distribution of matter, we can make one of the following series of approximations:

This is a slide I sometimes use to introduce mass modeling in science talks as a reminder for expert audiences.

All science is an approximation at some level. The most crude approximation we can employ here is to imagine that all of the mass resides at a central point. In this limit, the potential is simply

V² = GM/R

where V is the orbital speed of a test particle on a circular orbit, G is Newton’s constant, M is the mass, and R is the distance from the point mass. Galaxies are not point masses, so this is a terrible approximation, as can be seen by the divergent V ∝ R^(−1/2) behavior as R → 0 (the dotted line above).

The next bad approximation one can make is a spherical cow: assume the mass is distributed in a sphere that is projected as the image we see on the sky. This at least incorporates the fact that the mass is not all concentrated at a point, so

V² = GM(R)/R

acknowledges that the mass M is spread out as a function of radius. This is a spherical cow. Since we cannot see dark matter, we almost always assume it to be a spherical cow.

For the luminous disk of a spiral galaxy, a common approximation is the so-called exponential disk:

Σ(R) = Σ₀ e^(−R/Rd)

where Σ₀ is the central surface density of stars and Rd is the scale length of the disk – the characteristic size over which the surface brightness declines exponentially. This can be integrated by parts to obtain an expression for the enclosed mass M(R), which I leave as an exercise for the eager reader. This provides a handy analytic formula, the rotation curve of which is illustrated above by the dashed line.
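To check your work on the exercise: integrating by parts gives M(R) = 2πΣ₀Rd²[1 − e^(−R/Rd)(1 + R/Rd)]. Here is a minimal sketch in Python; the Σ₀ and Rd values are illustrative placeholders, not a fit to any particular galaxy.

```python
import numpy as np

G = 4.301e-6  # Newton's constant in kpc (km/s)^2 / Msun

def M_enclosed(R, Sigma0, Rd):
    """Mass enclosed within radius R for an exponential disk:
    integrating 2*pi*R'*Sigma(R') by parts gives
    M(R) = 2*pi*Sigma0*Rd^2 * [1 - exp(-x)*(1 + x)], with x = R/Rd."""
    x = R / Rd
    return 2 * np.pi * Sigma0 * Rd**2 * (1 - np.exp(-x) * (1 + x))

def V_sphere(R, Sigma0, Rd):
    """Spherical-cow rotation curve, V^2 = G*M(R)/R."""
    return np.sqrt(G * M_enclosed(R, Sigma0, Rd) / R)

# Illustrative numbers (placeholders, not a fit to any real galaxy):
Sigma0 = 5e8  # Msun / kpc^2
Rd = 3.0      # kpc
R = np.linspace(0.5, 20, 100)  # kpc
V = V_sphere(R, Sigma0, Rd)    # km/s
```

The curve rises, peaks, and declines toward the Keplerian point-mass limit at large R, where nearly all the mass is enclosed.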

Spiral galaxies are fairly thin when seen edge-on, so the spherical cow is not a great approximation. In a classic paper, Freeman (1970) solved the Poisson equation for the case of a razor-thin exponential disk, where one meets modified Bessel functions of the first and second kind (the I₀K₀ − I₁K₁ combination above). These must be evaluated numerically, but one can make a tabulation for use with any choice of disk mass and scale length. Such a thin disk is illustrated by the grey line above for a choice of stellar mass and scale length appropriate to NGC 6946.
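Freeman’s razor-thin result is easy to evaluate with standard Bessel routines. A sketch assuming SciPy is available; the disk parameters are placeholders, not the NGC 6946 numbers behind the figure.

```python
import numpy as np
from scipy.special import i0, i1, k0, k1

G = 4.301e-6  # Newton's constant in kpc (km/s)^2 / Msun

def V_thin_disk(R, Sigma0, Rd):
    """Freeman (1970) razor-thin exponential disk rotation curve:
    V^2(R) = 4*pi*G*Sigma0*Rd * y^2 * [I0(y)*K0(y) - I1(y)*K1(y)],
    with y = R/(2*Rd)."""
    y = R / (2.0 * Rd)
    return np.sqrt(4 * np.pi * G * Sigma0 * Rd * y**2 *
                   (i0(y) * k0(y) - i1(y) * k1(y)))

# Illustrative values (placeholders):
R = np.linspace(0.1, 15, 500)  # kpc
V = V_thin_disk(R, 5e8, 3.0)   # Sigma0 in Msun/kpc^2, Rd in kpc
# The thin-disk curve peaks near R ~ 2.2 scale lengths,
# then declines slowly toward the Keplerian limit.
```

The well-known feature of this curve is its peak near 2.2 disk scale lengths.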

The spiral galaxy NGC 6946, aka the fireworks galaxy.

Spiral galaxies are not razor thin of course. We only see a projected image on the sky, so for a galaxy like NGC 6946, we may have a good measurement of its azimuthally averaged light (and presumably stellar mass) distribution Σ(R), but we have no idea how thick it is. Here, we have to make an educated guess based on observations of edge-on galaxies. A ballpark average is R:z = 8:1, but some galaxies are thicker and others thinner, so this becomes an approximation with an associated uncertainty. This uncertainty cannot be unambiguously eliminated; it is one of the known unknowns that comprise the inevitable systematic errors in astronomy. Fortunately, allowing for a finite thickness only takes the harsh edge off of the thin disk case, and the assumption one chooses makes little difference to the result (compare the lines labeled thick and thin above).

The exponential disk formula Σ(R) is an azimuthal average over an image like that of NGC 6946. This approximation captures none of the spiral structure: it only tells us about the average rate at which the surface brightness falls off. It also imposes a smooth shape on that fall off that our eyes can see is not necessarily a great approximation. So the next level of approximation is to solve the Poisson equation numerically for the observed surface brightness profile, Σ(R), not just the exponential approximation thereto. This is the blue line in the bottom right graph above.
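The full disk-geometry Poisson solution used for mass models like SPARC takes more machinery than fits here, but the first step – going from a tabulated Σ(R) to an enclosed mass numerically rather than via the exponential formula – can be sketched under the spherical-cow assumption and validated against the analytic exponential result:

```python
import numpy as np

def M_from_profile(R, Sigma):
    """Cumulative mass from a tabulated surface density profile:
    M(<R) = integral of 2*pi*R'*Sigma(R') dR', via the trapezoidal rule."""
    integrand = 2 * np.pi * R * Sigma
    dM = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(R)
    return np.concatenate(([0.0], np.cumsum(dM)))

# Validate against the analytic exponential-disk result:
R = np.linspace(0.0, 30.0, 3000)  # kpc
Sigma0, Rd = 5e8, 3.0             # Msun/kpc^2, kpc (placeholders)
Sigma = Sigma0 * np.exp(-R / Rd)
M_num = M_from_profile(R, Sigma)
M_exact = 2 * np.pi * Sigma0 * Rd**2 * (1 - np.exp(-R / Rd) * (1 + R / Rd))
```

In practice one feeds in the observed profile instead of the smooth exponential, which is exactly where the two approaches begin to differ.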

There are important differences between using the numerical solution for the observed light distribution and the exponential disk approximation. This has been known since the 1980s, but the analytic expression is so convenient that people need an occasional reminder not to trust it too much. Jerry Sellwood felt the need to provide this reminder in 1999:

Small apparent differences in the shape of the mass profile (left) correspond to pronounced differences in the rotation curve (right). I chose the example of NGC 6946 in part because the exponential approximation for it is pretty good. Nevertheless, the details matter, so the best practice is to build numerical mass models, as we did for SPARC.

Building numerical mass models is tractable for external galaxies, where we can see the entire light distribution. It is not possible for our own Milky Way, since we are located within it and cannot see it as a whole. Consequently, the vast majority of Milky Way models rely on the exponential approximation; so far as I’m aware, I’m the only one who has built a model that attempts to get beyond this.

Numerical mass models are still an approximation. We’re assuming that the gravitational potential is static and azimuthally symmetric. Taking the next step would require abandoning these assumptions to model the spiral arms. The Poisson equation can handle that, but it becomes dicey because the arms rotate with some pattern speed (generally unknown) and may grow or dissolve or reform on some unknown timescale. The potential at any given point is time variable even in equilibrium, so we need not just a numerical solution but a live numerical simulation to keep track of it. That can be done, but it has to be done on a case by case basis, and the answer will depend somewhat on additional assumptions that have to be introduced to run the simulation, like specifying a dark matter halo.

One can generalize further to consider the full 3D potential, e.g., to allow for asymmetry in the z-direction as well as in azimuth. One can further imagine non-equilibrium processes, such as external perturbations. There is good evidence that the Milky Way suffers both of these effects, the passage of the Large Magellanic Cloud being one obvious and apparently large perturbation. So we are in the awkward position that the Gaia data now oblige us to consider the entire run of possible effects through non-equilibrium processes in a mass distribution that is not completely symmetric in any of the three spatial dimensions, but for the main mass component we are stuck with the inadequate approximation of an exponential disk.

Geometry appears to play a crucial role in the approach of Deur to the acceleration discrepancy problem. The essential claim is that the discrepancy correlates with flattening, with highly flattened systems like spirals evincing the classic discrepancy while spherical systems like E0 galaxies show none. Big if true!

A useful plot appears on slide 44:

Some measure of the discrepancy as a function of apparent ellipticity.

This is the one example shown that goes into the plot of many determinations of the slope a on the following slide. It being the only one, it is the only thing I have to evaluate without chasing down every other case. Looking at this, I am not inclined to do so.

At first it looks persuasive: the best fit slope is clear. There is no reason why the discrepancy should depend on the projected ellipticity of a triaxial 3D blob of stars, so this must be telling us something important. I’d be on board with that if it were true, but I’ve seen too many non-correlations masquerading as correlations to believe this one. The fitted slope is strongly influenced by the one point at large ellipticity; absent that, a slope of zero works fine. Mostly what I see here is a lot of scatter, which is normal in extragalactic astronomy. Since there are only a few points at high and low ellipticity, we don’t know what would happen if we went out and got more data. But I bet that what would happen is that the high ellipticity points would wind up looking like those in the middle: a big blob of scatter, with no significant correlation.

I’d kinda like to be wrong about this one, so I won’t even get into the theory side, which I find sorta compelling but ultimately unpersuasive. Why are gravitons confined to a disk? What happens way far out? Surely the flatness of the disk at tens of kpc is not dictating the flatness at 1000 kpc.

Surely.

Why’d it have to be MOND?

I want to take another step back in perspective from the last post to say a few words about what the radial acceleration relation (RAR) means and what it doesn’t mean. Here it is again:

The Radial Acceleration Relation over many decades. The grey region is forbidden – there cannot be less acceleration than caused by the observed baryons. The entire region above the diagonal line (yellow) is accessible to dark matter models as the sum of baryons and however much dark matter the model prescribes. MOND is the blue line.

This information was not available when the dark matter paradigm was developed. We observed excess motion, like flat rotation curves, and inferred the existence of extra mass. That was perfectly reasonable given the information available at the time. It is not now: we need to reassess as we learn more.

There is a clear organization to the data at both high and low acceleration. No objective observer with a well-developed physical intuition would look at this and think “dark matter.” The observed behavior does not follow from one force law plus some arbitrary amount of invisible mass. That could do literally anything in the yellow region above, and beyond the bounds of the plot, both upwards and to the left. Indeed, there is no obvious reason why the data don’t fall all over the place. One of the lingering, niggling concerns is the 5:1 ratio of dark matter:baryons – why is it in the same ballpark, when it could be pretty much anything? Why should the data organize in terms of acceleration? There is no reason for dark matter to do this.

Plausible dark matter models have been predicted to do a variety of things – things other than what we observe. The problem for dark matter is that real objects only occupy a tiny line through the vast region available to them in the plot above. This is a fine-tuning problem: why do the data reside only where they do when they could be all over the place? I recognized this as a problem for dark matter before I became aware$ of MOND. That it turns out that the data follow the line uniquely predicted* by MOND is just chef’s kiss: there is a fine-tuning problem for dark matter because MOND is the effective force law.

The argument against dark matter is that the data could reside anywhere in the yellow region above, but don’t. The argument against MOND is that a small portion of the data fall a little off the blue line. Arguing that such objects, be they clusters of galaxies or particular individual galaxies, falsify MOND while ignoring the fine-tuning problem faced by dark matter is a case of refusing to see the forest for a few outlying trees.%

So to return to the question posed in the title of this post, I don’t know why it had to be MOND. That’s just what we observe. Pretending dark matter does the same thing is a false presumption.


$I’d heard of MOND only vaguely, and, like most other scientists in the field, had paid it no mind until it reared its ugly head in my own data.

*I talk about MOND here because I believe in giving credit where credit is due. MOND predicted this; no other theory did so. Dark matter theories did not predict this. My dark matter-based galaxy formation theory did not predict this. Other dark matter-based galaxy formation theories (including simulations) continue to fail to explain this. Other hypotheses of modified gravity also did not predict what is observed. Who+ ordered this?

Modified Dynamics. Very dangerous. You go first.

Many people in the field hate MOND, often with an irrational intensity that has the texture of religion. It’s not as if I woke up one morning and decided to like MOND – sometimes I wish I had never heard of it – but disliking a theory doesn’t make it wrong, and ignoring it doesn’t make it go away. MOND and only MOND predicted the observed RAR a priori. So far, MOND and only MOND provides a satisfactory explanation thereof. We might not like it, but there it is in the data. We’re not going to progress until we get over our fear of MOND and cope with it. Imagining that it will somehow fall out of simulations with just the right baryonic feedback prescription is a form of magical thinking, not science.

MOND. Why’d it have to be MOND?

+Milgrom. Milgrom ordered this.


%I expect many cosmologists would argue the same in reverse for the cosmic microwave background (CMB) and other cosmological constraints. I have some sympathy for this. The fit to the power spectrum of the CMB seems too good to be an accident, and it points to the same parameters as other constraints. Well, mostly – the Hubble tension might be a clue that things could unravel, as if they haven’t already. The situation is not symmetric – where MOND predicted what we observe a priori with a minimum of assumptions, LCDM is an amalgam of one free parameter after another after another: dark matter and dark energy are, after all, auxiliary hypotheses we invented to save FLRW cosmology. When they don’t suffice, we invent more. Feedback is a single word that represents a whole Pandora’s box of extra degrees of freedom, and we can invent crazier things as needed. The result is a Frankenstein’s monster of a cosmology that we all agree is the same entity, but when we examine it closely the pieces don’t fit, and one cosmologist’s LCDM is not really the same as that of the next. They just seem to agree because they use the same words to mean somewhat different things. Simply agreeing that there has to be non-baryonic dark matter has not helped us conjure up detections of the dark matter particles in the laboratory, or given us the clairvoyance to explain# what MOND predicted a priori. So rather than agree that dark matter must exist because cosmology works so well, I think the appearance of working well is a chimera of many moving parts. Rather, cosmology, as we currently understand it, works if and only if non-baryonic dark matter exists in the right amount. That requires a laboratory detection to confirm.

#I have a disturbing lack of faith that a satisfactory explanation can be found.

The Radial Acceleration Relation starting from high accelerations

In the previous post, we discussed how lensing data extend the Radial Acceleration Relation (RAR) seen in galaxy kinematics to very low accelerations. Let’s zoom out now, and look at things at higher accelerations and from a historical perspective.

This all started with Kepler’s Laws of Planetary Motion, which are explained by Newton’s Universal Gravitation – the inverse square law gbar = GM/r² is exactly what is needed to explain the observed centripetal acceleration, gobs = V²/r. It also explains the surface gravity of the Earth. Indeed, it was the famous falling apple that is reputed to have given Newton the epiphany that it was the same force that made the apple fall to the ground that made the Moon circle the Earth that made the planets revolve around the sun.
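As a quick sanity check of those two formulas, one can plug in standard values for the Earth’s orbit:

```python
# Standard values: the Sun's gravitational parameter, 1 au, and
# Earth's mean orbital speed.
GM_sun = 1.32712440018e20  # m^3 / s^2
r = 1.495978707e11         # m (1 au)
V = 29780.0                # m / s

g_bar = GM_sun / r**2  # Newtonian prediction, GM/r^2
g_obs = V**2 / r       # observed centripetal acceleration, V^2/r
# Both are ~5.9e-3 m/s^2; they agree to better than 0.1%.
```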

The inverse square law holds over more than six decades of observed acceleration in the solar system, from the one gee we feel here on the surface of the Earth to the outskirts patrolled by Neptune.

Planetary motion in the radial acceleration plane. The dotted line is Newton’s inverse square law of universal gravity.*

The inverse square force law is what it takes to make the planetary data line up. A different force law would give a line with a different slope in this plot. No force law at all would give chaos, with planets all over the place in this plot, if, say, the solar system were run by a series of deferents and epicycles as envisioned for Ptolemaic cosmologies. In such a system, there is no reason to expect the organization seen above. It would require considerable contrivance to make it so.

Newtonian gravity and General Relativity are exquisitely well-tested in the solar system. There are also some very precise tests at higher accelerations that GR passes with flying colors. The story at lower accelerations is another matter. The most remote solar system probes we’ve launched are the Voyager and Pioneer missions. These probe down to ~10⁻⁶ m/s/s; below that is uncharted territory.

The RAR extended from high solar system accelerations to the much lower accelerations typical of galaxies – note the change in scale. Some early rotation curves (of NGC 55, NGC 801, NGC 2403, NGC 2841, & UGC 2885) are shown as lines. These probed an entirely new regime of acceleration. The departures of these lines from the dotted line are the flat rotation curves indicating the acceleration discrepancy/need for dark matter. This discrepancy was clear by the end of the 1970s, but the amplitude of the discrepancy then was modest.

Galaxies (and extragalactic data in general) probe an acceleration range that is unprecedented from the perspective of solar system tests. General Relativity has passed so many precise tests that the usual presumption is that it applies at all scales. But it is an assumption that it applies to scales where it hasn’t been tested. Galaxies and cosmology pose such a test. That we need to invoke dark matter to save the phenomenon would be interpreted as a failure if we had set out to test the theory rather than assume it applied.

It was clear from flat rotation curves that something extra was needed. However, when we invented the dark matter paradigm, it was not clear that the data were organized in terms of acceleration. As the data continued to improve, it became clear that the vast majority of galaxies adhered to a single, apparently universal+ radial acceleration relation. What had been a hint of systematic behavior in early data became clean and clear. The data did not exhibit the scatter that was expected from a sum of a baryonic disk and a non-baryonic dark matter halo – there is no reason that these two distinct components should sum to the single effective force law that is observed.

The RAR with modern data for both early (red triangles) and late (cyan circles) morphological types. The blue line is the prediction of MOND: there is a transition at an acceleration scale to a force law that is universal but no longer inverse-square.

The observed force law happened to already have a name: MOND. If it had been something else, then we could have claimed to discover something new. But instead we were obliged to admit that the unexpected thing we had found had in fact been predicted by Milgrom.
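For reference, the blue line in these plots can be written as a simple function that interpolates between the two regimes – the fitting form published with the RAR (McGaugh et al. 2016), gobs = gbar/(1 − e^(−√(gbar/a₀))) with a₀ ≈ 1.2×10⁻¹⁰ m/s². A sketch:

```python
import numpy as np

a0 = 1.2e-10  # m/s^2, the characteristic acceleration scale

def g_rar(g_bar):
    """RAR fitting function: g_obs = g_bar / (1 - exp(-sqrt(g_bar/a0))).
    g_bar >> a0: g_obs -> g_bar (Newtonian inverse-square regime).
    g_bar << a0: g_obs -> sqrt(g_bar * a0) (deep-MOND regime)."""
    g_bar = np.asarray(g_bar, dtype=float)
    return g_bar / (1.0 - np.exp(-np.sqrt(g_bar / a0)))

g_bar = np.logspace(-13, -8, 11)  # m/s^2, galaxy-scale accelerations
g_obs = g_rar(g_bar)
```

One function of one variable, with a single scale a₀, covers the entire range of the plot.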

This predictive power now extends to much lower accelerations. Again, only MOND got this prediction right in advance.

The RAR as above, extended by weak gravitational lensing observations. These follow the prediction of MOND as far as they are credible.

The data could have done many different things here. It could have continued along the dotted line, in which case we’d have need for no dark matter or modified gravity. It could have scattered all over the place – this is the natural expectation of dark matter theories, as there is no reason to expect the gravitational potential of the dominant dark matter halo to be dictated by the distribution of baryons. One expects that not to happen. Yet the data evince the exceptional degree of organization seen above.

It requires considerable contrivance to explain the RAR with dark matter. No viable explanation yet exists, despite many unconvincing claims to this effect. I have worked more on trying to explain this in terms of dark matter than I have on MOND, and all I can tell you is what doesn’t work. Every explanation I’ve seen so far is a special case of a model I had previously considered and rejected as obviously unworkable. At this point, I don’t see how dark matter can ever plausibly do what the data require.

I worry that dark matter has become an epicycle theory. We’re sure it is right, so whatever we observe, no matter how awkward or unexpected, must be what it does. But what if it is wrong, and it does not exist? How do we ever disabuse ourselves of the notion that there is invisible mass once we’ve convinced ourselves that there has to be?

Of course, MOND has its own problems. Clusters of galaxies are systems$ for which it persistently fails to explain the amplitude of the observed acceleration discrepancy. So let’s add those to the plot as well:

As above, with clusters of galaxies added (x: Sanders 2003; +: Li et al. 2023).

So: do clusters violate the RAR, or follow it? I’d say yes and yes – the offset, though modest in amplitude in this depiction, is statistically significant. But there is also a similar scaling with acceleration; only the amplitude is off. The former makes no sense in MOND; the latter makes no sense in terms of dark matter, which did not predict a RAR at all.

Clusters are the strongest evidence against MOND. Just being evidence against MOND doesn’t automatically make it evidence in favor of dark matter. I often pose myself the question: which theory requires me to disbelieve the least amount of data? When I first came to the problem, I was shocked to find that the answer was clearly MOND. Since then, it has gone back and forth, but rather than a clear answer emerging, what has happened is more a divergence of different lines of evidence: that which favors the standard cosmology is incommensurate with that which favors MOND. This leads to considerable cognitive dissonance.

One way to cope with cognitive dissonance is to engage with a problem from different perspectives. If I put on a MOND hat, I worry about the offset seen above for clusters. If I put on a dark matter hat, I worry about the same kind of offset for every system that is not a rich cluster of galaxies. Most critics of MOND seem unconcerned about this problem for dark matter, so how much should a critic of dark matter worry about it in MOND?


*For the hyper-pedantic: the eccentricity of each orbit causes the exact location of each planet in the first plot to oscillate up and down along the dotted line. The extent of this oscillation is smaller than the size of each symbol with the exception of Mercury, which has a relatively high eccentricity (but nowhere near enough to reach Venus).

+There are a few exceptions, of course – there are always exceptions in astronomy. The issue is whether these are physically meaningful, or the result of systematic uncertainties or non-equilibrium processes. The claimed discrepancies range from dubious to unconvincing to obviously wrong.

$I’ve heard some people criticize MOND because the centroid of the lensing signal does not peak around the gas in the Bullet cluster. This assumes that the gas represents the majority of the baryons. We know this is not the case, and that there is some missing mass in clusters. Whatever it is, it is clearly more centrally concentrated than the gas, so we don’t expect the lensing signal to peak where the gas is. All the Bullet cluster teaches us is that whatever this stuff is, it is collisionless. So this particular complaint is a logical fallacy of the red herring and/or straw man variety, born of not understanding MOND well enough to criticize it accurately. Why bother to do that when you come to the problem already sure that MOND is wrong? I understand this line of thought extraordinarily well, because that’s the attitude I started with, and I’ve seen it repeated by many colleagues. The difference is that I bothered to educate myself.

A personal note – I will be on vacation next week, so won’t be quick to respond to comments.

Clusters of galaxies ruin everything

A common refrain I hear is that MOND works well in galaxies, but not in clusters of galaxies. The oft-unspoken but absolutely intended implication is that we can therefore dismiss MOND and never speak of it again. That’s silly.

Even if MOND is wrong, that it works as well as it does is surely telling us something. I would like to know why that is. Perhaps it has something to do with the nature of dark matter, but we need to engage with it to make sense of it. We will never make progress if we ignore it.

Like the seventeenth century cleric Paul Gerhardt, I’m a stickler for intellectual honesty:

“When a man lies, he murders some part of the world.”

Paul Gerhardt

I would extend this to ignoring facts. One should not only be truthful, but also as complete as possible. It does not suffice to be truthful about things that support a particular position while eliding unpleasant or unpopular facts* that point in another direction. By ignoring the successes of MOND, we murder a part of the world.

Clusters of galaxies are problematic in different ways for different paradigms. Here I’ll recap three ways in which they point in different directions.

1. Cluster baryon fractions

An unpleasant fact for MOND is that it does not suffice to explain the mass discrepancy in clusters of galaxies. When we apply Milgrom’s formula to galaxies, it explains the discrepancy that is conventionally attributed to dark matter. When we apply MOND to clusters, it comes up short. This has been known for a long time; here is a figure from the review Sanders & McGaugh (2002):

Figure 10 from Sanders & McGaugh (2002): (Left) the Newtonian dynamical mass of clusters of galaxies within an observed cutoff radius (rout) vs. the total observable mass in 93 X-ray-emitting clusters of galaxies (White et al. 1997). The solid line corresponds to Mdyn = Mobs (no discrepancy). (Right) the MOND dynamical mass within rout vs. the total observable mass for the same X-ray-emitting clusters. From Sanders (1999).

The Newtonian dynamical mass exceeds what is seen in baryons (left). There is a missing mass problem in clusters. The inference is that the difference is made up by dark matter – presumably the same non-baryonic cold dark matter that we need in cosmology.

When we apply MOND, the data do not fall on the line of equality as they should (right panel). There is still excess mass. MOND suffers a missing baryon problem in clusters.

The common line of reasoning is that MOND still needs dark matter in clusters, so why consider it further? The whole point of MOND is to do away with the need of dark matter, so it is terrible if we need both! Why not just have dark matter?

This attitude was reinforced by the discovery of the Bullet Cluster. You can “see” the dark matter.

An artistic rendition of data for the Bullet Cluster. Pink represents hot X-ray emitting gas, blue the mass concentration inferred through gravitational lensing, and the optical image shows many galaxies. There are two clumps of galaxies that collided and passed through one another, getting ahead of the gas which shocked on impact and lags behind as a result. The gas of the smaller “bullet” subcluster shows a distinctive shock wave.

Of course, we can’t really see the dark matter. What we see is that the mass required by gravitational lensing observations exceeds what we see in normal matter: this is the same discrepancy that Zwicky first noticed in the 1930s. The important thing about the Bullet Cluster is that the mass is associated with the location of the galaxies, not with the gas.

The baryons that we know about in clusters are mostly in the gas, which outweighs the stars by roughly an order of magnitude. So we might expect, in a modified gravity theory like MOND, that the lensing signal would peak up on the gas, not the stars. That would be true, if the gas we see were indeed the majority of the baryons. We already knew from the first plot above that this is not the case.

I use the term missing baryons above intentionally. If one already believes in dark matter, then it is perfectly reasonable to infer that the unseen mass in clusters is the non-baryonic cold dark matter. But there is nothing about the data for clusters that requires this. There is also no reason to expect every baryon to be detected. So the unseen mass in clusters could just be ordinary matter that does not happen to be in a form we can readily detect.

I do not like the missing baryon hypothesis for clusters in MOND. I struggle to imagine how we could hide the required amount of baryonic mass, which is comparable to or exceeds the gas mass. But we know from the first figure that such a component is indicated. Indeed, the Bullet Cluster falls at the top end of the plots above, being one of the most massive objects known. From that perspective, it is perfectly ordinary: it shows the same discrepancy every other cluster shows. So the discovery of the Bullet was neither here nor there to me; it was just another example of the same problem. Indeed, it would have been weird if it hadn’t shown the same discrepancy that every other cluster showed. That it does so in a nifty visual is, well, nifty, but so what? I’m more concerned that the entire population of clusters shows a discrepancy than that this one nifty case does so.

The one new thing that the Bullet Cluster did teach us is that whatever the missing mass is, it is collisionless. The gas shocked when it collided, and lags behind the galaxies. Whatever the unseen mass is, it passed through unscathed, just like the galaxies. Anything with mass separated by lots of space will do that: stars, galaxies, cold dark matter particles, hard-to-see baryonic objects like brown dwarfs or black holes, or even massive [potentially sterile] neutrinos. All of those are logical possibilities, though none of them make a heck of a lot of sense.

As much as I dislike the possibility of unseen baryons, it is important to keep the history of the subject in mind. When Zwicky discovered the need for dark matter in clusters, the discrepancy was huge: a factor of a thousand. Some of that was due to having the distance scale wrong, but most of it was due to seeing only stars. It wasn’t until 40 some years later that we started to recognize that there was intracluster gas, and that it outweighed the stars. So for a long time, the mass ratio of dark to luminous mass was around 70:1 (using a modern distance scale), and we didn’t worry much about the absurd size of this number; mostly we just cited it as evidence that there had to be something massive and non-baryonic out there.

Really there were two missing mass problems in clusters: a baryonic missing mass problem, and a dynamical missing mass problem. Most of the baryons turned out to be in the form of intracluster gas, not stars. So the 70:1 ratio changed to 7:1. That’s a big change! It brings the ratio down from a silly number to something that is temptingly close to the universal baryon fraction of cosmology. Consequently, it becomes reasonable to believe that clusters are fair samples of the universe. All the baryons have been detected, and the remaining discrepancy is entirely due to non-baryonic cold dark matter.

That’s a relatively recent realization. For decades, we didn’t recognize that most of the normal matter in clusters was in an as-yet unseen form. There had been two distinct missing mass problems. Could it happen again? Have we really detected all the baryons, or are there still more lurking there to be discovered? I think it unlikely, but fifty years ago I would also have thought it unlikely that there would have been more mass in intracluster gas than in stars in galaxies. I was ten years old then, but it is clear from the literature that no one else was seriously worried about this at the time. Heck, when I first read Milgrom’s original paper on clusters, I thought he was engaging in wishful thinking to invoke the X-ray gas as possibly containing a lot of the mass. Turns out he was right; it just isn’t quite enough.

All that said, I nevertheless think the residual missing baryon problem MOND suffers in clusters is a serious one. I do not see a reasonable solution. Unfortunately, as I’ve discussed before, LCDM suffers an analogous missing baryon problem in galaxies, so pick your poison.

It is reasonable to imagine in LCDM that some of the missing baryons on galaxy scales are present in the form of warm/hot circum-galactic gas. We’ve been looking for that for a while, and have had some success – at least for bright galaxies where the discrepancy is modest. But the problem gets progressively worse for lower mass galaxies, so it is a bold presumption that the check-sum will work out. There is no indication (beyond faith) that it will, and the fact that it gets progressively worse for lower masses is a direct consequence of the data for galaxies looking like MOND rather than LCDM.

Consequently, both paradigms suffer a residual missing baryon problem. One is seen as fatal while the other is barely seen.

2. Cluster collision speeds

A novel thing the Bullet Cluster provides is a way to estimate the speed at which its subclusters collided. You can see the shock front in the X-ray gas in the picture above. The morphology of this feature is sensitive to the speed and other details of the collision. In order to reproduce it, the two subclusters had to collide head-on, in the plane of the sky (practically all the motion is transverse), and fast. I mean, really fast: nominally 4700 km/s. That is more than the virial speed of either cluster, and more than you would expect from dropping one object onto the other. How likely is this to happen?

There is now an enormous literature on this subject, which I won’t attempt to review. It was recognized early on that the high apparent collision speed was unlikely in LCDM. The chances of observing the Bullet Cluster even once in an LCDM universe range from merely unlikely (~10%) to completely absurd (< 3 × 10⁻⁹). Answers this varied follow from which aspects of both observation and theory are considered, and from the annoying fact that the distribution of collision speed probabilities plummets like a stone, so that slightly different estimates of the “true” collision speed make a big difference to the inferred probability. What the “true” gravitationally induced collision speed is remains somewhat uncertain because the hydrodynamics of the gas plays a role in shaping the shock morphology. There is a long debate about this which bores me; it boils down to it being easy to explain a few hundred extra km/s but hard to get up to the extra 1000 km/s that is needed.
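The steepness of that distribution matters more than it might seem. A toy tail distribution shows how modestly different estimates of the collision speed translate into wildly different probabilities. Note that the Gaussian form and the 1000 km/s velocity scale here are purely illustrative assumptions, not taken from any of the cited studies:

```python
import math

# Toy illustration: a steep probability tail means that small shifts in
# the inferred "true" collision speed change the likelihood of observing
# a Bullet-like system by orders of magnitude. The Gaussian tail and the
# 1000 km/s scale are hypothetical, chosen only to show the effect.
sigma = 1000.0  # km/s, hypothetical pairwise-velocity dispersion

def p_exceed(v):
    """Probability of a relative speed above v for a Gaussian tail."""
    return math.erfc(v / (sigma * math.sqrt(2.0)))

for v in (2500.0, 3500.0, 4500.0):
    print(f"P(>{v:.0f} km/s) = {p_exceed(v):.2e}")
```

Each extra 1000 km/s costs another factor of forty or more in probability, which is why the debate over a few hundred km/s of hydrodynamical boost matters so much to the inferred likelihood.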

At its simplest, we can imagine the two subclusters forming in the early universe, initially expanding apart along with the Hubble flow like everything else. At some point, their mutual attraction overcomes the expansion, and the two start to fall together. How fast can they get going in the time allotted?

The Bullet Cluster is one of the most massive systems in the universe, so there is lots of dark mass to accelerate the subclusters towards each other. The object is less massive in MOND, even spotting it some unseen baryons, but the long-range force is stronger. Which effect wins?

Gary Angus wrote a code to address this simple question both conventionally and in MOND. Turns out, the longer range force wins this race. MOND is good at making things go fast. While the collision speed of the Bullet Cluster is problematic for LCDM, it is rather natural in MOND. Here is a comparison:

A reasonable answer falls out of MOND with no fuss and no muss. There is room for some hydrodynamical+ high jinks, but it isn’t needed, and the amount that is reasonable makes an already reasonable result more reasonable, boosting the collision speed from the edge of the observed band to pretty much smack in the middle. This is the sort of thing that keeps me puzzled: much as I’d like to go with the flow and just accept that it has to be dark matter that’s correct, it seems like every time there is a big surprise in LCDM, MOND just does it. Why? This must be telling us something.
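The flavor of the timing argument can be sketched with a crude free-fall integration. This is emphatically NOT the Angus calculation, which tracks the subclusters from the Hubble flow onward; here the masses, turnaround separation, and the "simple" MOND interpolation function are all hypothetical round numbers chosen for illustration:

```python
# Toy timing argument: two subclusters treated as point masses falling
# together from rest at turnaround. All numbers below are illustrative.
G  = 4.30091e-6   # Newton's constant in kpc (km/s)^2 / Msun
a0 = 3.7e3        # Milgrom's a0 ~ 1.2e-10 m/s^2, in (km/s)^2 / kpc

def infall_speed(M, r0, r_end, mond=False, dt=1e-3):
    """Relative speed (km/s) when the separation shrinks from r0 to
    r_end (kpc). The time step dt is in kpc/(km/s) ~ 0.978 Gyr."""
    r, v = r0, 0.0
    while r > r_end:
        g = G * M / r**2                        # Newtonian acceleration
        if mond:
            y = g / a0                          # boost via the "simple"
            g = g * (0.5 + (0.25 + 1.0 / y)**0.5)  # interpolation function
        v += g * dt
        r -= v * dt
    return v

v_newton = infall_speed(2e15, 5000.0, 1000.0)             # with dark matter
v_mond   = infall_speed(2e14, 5000.0, 1000.0, mond=True)  # baryons only
print(f"Newtonian: {v_newton:.0f} km/s   MOND: {v_mond:.0f} km/s")
```

In this stripped-down version the answer depends entirely on the assumed masses and turnaround separation; the real calculation has to follow how far apart the subclusters get in the time available before falling together, which is where MOND’s stronger long-range force pays off.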

3. Cluster formation times

Structure is predicted to form earlier in MOND than in LCDM. This is true for both galaxies and clusters of galaxies. In his thesis, Jay Franck found lots of candidate clusters at redshifts higher than expected. Even groups of clusters:

Figure 7 from Franck & McGaugh (2016). A group of four protocluster candidates at z = 3.5 that are proximate in space. The left panel is the sky association of the candidates, while the right panel shows their galaxy distribution along the LOS. The ellipses/boxes show the search volume boundaries (Rsearch = 20 cMpc, Δz ± 20 cMpc). Three of these (CCPC-z34-005, CCPC-z34-006, CCPC-z35-003) exist in a chain along the LOS stretching ≤120 cMpc. This may become a supercluster-sized structure at z = 0.

The cluster candidates at high redshift that Jay found are more common in the real universe than seen with mock observations made using the same techniques within the Millennium simulation. Their velocity dispersions are also larger than comparable simulated objects. This implies that the amount of mass that has assembled is larger than expected at that time in LCDM, or that speeds are boosted by something like MOND, or nothing has settled into anything like equilibrium yet. The last option seems most likely to me, but that doesn’t reconcile matters with LCDM, as we don’t see the same effect in the simulation.

MOND also predicts the early emergence of the cosmic web, which would explain the early appearance of very extended structures like the “big ring.” While some of these very large scale structures are probably not real, too many such things have been noted for all of them to be an illusion. The knee-jerk denial of all such structures reminds me of the shock cosmologists expressed at seeing quasars at redshifts as high as 4 (even 4.9! how can it be so?), or clusters at redshift 2, or the original CfA stickman, which surprised the bejeepers out of everybody in 1987. So many times I’ve been told that a thing can’t be true because it violates theoreticians’ preconceptions, only for it to prove to be true, ultimately becoming something the theorists expected all along.

Well, which is it?

So, as the title says, clusters ruin everything. The residual missing baryon problem that MOND suffers in clusters is both pernicious and persistent. It isn’t the outright falsification that many people presume it to be, but it sure don’t sit right. On the other hand, both the collision speeds of clusters (there are more examples now than just the Bullet Cluster) and the early appearance of clusters at high redshift are considerably more natural in MOND than in LCDM. So the data for clusters cut both ways. Taking the most obvious interpretation of the Bullet Cluster data, this one object falsifies both LCDM and MOND.

As always, the conclusion one draws depends on how one weighs the different lines of evidence. This is always an invitation to the bane of cognitive dissonance, accepting that which supports our pre-existing world view and rejecting the validity of evidence that calls it into question. That’s why we have the scientific method. It was application of the scientific method that caused me to change my mind: maybe I was wrong to be so sure of the existence of cold dark matter? Maybe I’m wrong now to take MOND seriously? That’s why I’ve set criteria by which I would change my mind. What are yours?


*In the discussion associated with a debate held at KITP in 2018, one particle physicist said “We should just stop talking about rotation curves.” Straight-up said it out loud! No notes, no irony, no recognition that the dark matter paradigm faces problems beyond rotation curves.

+There are now multiple examples of colliding cluster systems known. They’re a mess (Abell 520 is also called “the train wreck cluster”), so I won’t attempt to describe them all. In Angus & McGaugh (2008) we did note that MOND predicted that high collision speeds would be more frequent than in LCDM, and I have seen nothing to make me doubt that. Indeed, Xavier Hernandez pointed out to me that supersonic shocks like that of the Bullet Cluster are often observed, but basically never occur in cosmological simulations.

Discussion of Dark Matter and Modified Gravity

To start the new year, I provide a link to a discussion I had with Simon White on Phil Halper’s YouTube channel:

In this post I’ll say little that we don’t talk about, but will add some background and mildly amusing anecdotes. I’ll also try addressing the one point of factual disagreement. For the most part, Simon & I entirely agree about the relevant facts; what we’re discussing is the interpretation of those facts. It was a perfectly civil conversation, and I hope it can provide an example for how it is possible to have a positive discussion about a controversial topic+ without personal animus.

First, I’ll comment on the title, in particular the “vs.” This is not really Simon vs. me. This is a discussion between two scientists who are trying to understand how the universe works (no small ask!). We’ve been asked to advocate for different viewpoints, so one might call it “Dark Matter vs. MOND.” I expect Simon and I could swap sides and have an equally interesting discussion. One needs to be able to do that in order to not simply be a partisan hack. It’s not like MOND is my theory – I falsified my own hypothesis long ago, and got dragged reluctantly into this business for honestly reporting that Milgrom got right what I got wrong.

For those who don’t know, Simon White is one of the preeminent scholars working on cosmological computer simulations, having done important work on galaxy formation and structure formation, the baryon fraction in clusters, and the structure of dark matter halos (Simon is the W in NFW halos). He was a Reader at the Institute of Astronomy at the University of Cambridge where we overlapped (it was my first postdoc) before he moved on to become the director of the Max Planck Institute for Astrophysics where he was mentor to many people now working in the field.

That’s a very short summary of a long and distinguished career; Simon has done lots of other things. I highlight these works because they came up at some point in our discussion. Davis, Efstathiou, Frenk, & White are the “gang of four” that was mentioned; around Cambridge I also occasionally heard them referred to as the Cold Dark Mafia. The baryon fraction of clusters was one of the key observations that led from SCDM to LCDM.

The subject of galaxy formation runs throughout our discussion. It is always a fraught issue how things form in astronomy. It is one thing to understand how stars evolve, once made; making them in the first place is another matter. Hard as that is to do in simulations, galaxy formation involves the extra element of dark matter in an expanding universe. Understanding how galaxies come to be is essential to predicting anything about what they are now, at least in the context of LCDM*. Both Simon and I have worked on this subject our entire careers, in very much the same framework if from different perspectives – by which I mean he is a theorist who does some observational work while I’m an observer who does some theory, not LCDM vs. MOND.

When Simon moved to Max Planck, the center of galaxy formation work moved as well – it seemed like he took half of Cambridge astronomy with him. This included my then-office mate, Houjun Mo. At one point I refer to the paper Mo & I wrote on the clustering of low surface brightness galaxies and how I expected them to reside in late-forming dark matter halos**. I often cite Mo, Mao, & White as a touchstone of galaxy formation theory in LCDM; they subsequently wrote an entire textbook about it. (I was already warning them then that I didn’t think their explanations of the Tully-Fisher relation were viable, at least not when combined with the effect we have subsequently named the diversity of rotation curve shapes.)

When I first began to worry that we were barking up the wrong tree with dark matter, I asked myself what could falsify it. It was hard to come up with good answers, and I worried it wasn’t falsifiable. So I started asking other people what would falsify cold dark matter. Most did not answer. They often had a shocked look like they’d never thought about it, and would rather not***. It’s a bind: no one wants it to be false, but most everyone accepts that for it to qualify as physical science it should be falsifiable. So it was a question that always provoked a record-scratch moment in which most scientists simply freeze up.

Simon was one of the first to give a straight answer to this question without hesitation, circa 1999. At that point it was clear that dark matter halos formed central density cusps in simulations; so those “cusps had to exist” in the centers of galaxies. At that point, we believed that to mean all galaxies. The question was complicated by the large dynamical contribution of stars in high surface brightness galaxies, but low surface brightness galaxies were dark matter dominated down to small radii. So we thought these were the ideal place to test the cusp hypothesis.

We no longer believe that. After many attempts at evasion, cold dark matter failed this test; feedback was invoked, and the goalposts started to move. There is now a consensus among simulators that feedback in intermediate mass galaxies can alter the inner mass distribution of dark matter halos. Exactly how this happens depends on who you ask, but it is at least possible to explain the absence of the predicted cusps. This goes in the right direction to explain some data, but by itself does not suffice to address the thornier question of why the distribution of baryons is predictive of the kinematics even when the mass is dominated by dark matter. This is why the discussion focused on the lowest mass galaxies where there hasn’t been enough star formation to drive the feedback necessary to alter cusps. Some of these galaxies can be described as having cusps, but probably not all. Thinking only in those terms elides the fact that MOND has a better record of predictive success. I want to know why this happens; it must surely be telling us something important about how the universe works.

The one point of factual disagreement we encountered had to do with the mass profile of galaxies at large radii as traced by gravitational lensing. It is always necessary to agree on the facts before debating their interpretation, so we didn’t press this far. Afterwards, Simon sent a citation to what he was talking about: this paper by Wang et al. (2016). In particular, look at their Fig. 4:

Fig. 4 of Wang et al. (2016). The excess surface density inferred from gravitational lensing for galaxies in different mass bins (data points) compared to mock observations of the same quantity made from within a simulation (lines). Looks like excellent agreement.

This plot quantifies the mass distribution around isolated galaxies to very large scales. There is good agreement between the lensing observations and the mock observations made within a simulation. Indeed, one can see an initial downward bend corresponding to the outer part of an NFW halo (the “one-halo term”), then an inflection to different behavior due to the presence of surrounding dark matter halos (the “two-halo term”). This is what Simon was talking about when he said gravitational lensing was in good agreement with LCDM.

I was thinking of a different, closely related result. I had in mind the work of Brouwer et al. (2021), which I discussed previously. Very recently, Dr. Tobias Mistele has made a revised analysis of these data. That’s worthy of its own post, so I’ll leave out the details, which can be found in this preprint. The bottom line is in Fig. 2, which shows the radial acceleration relation derived from gravitational lensing around isolated galaxies:

The radial acceleration relation from weak gravitational lensing (colored points) extending existing kinematic data (grey points) to lower acceleration corresponding to very large radii (~ 1 Mpc). The dashed line is the prediction of MOND. Looks like excellent agreement.

This plot quantifies the radial acceleration due to the gravitational potential of isolated galaxies to very low accelerations. There is good agreement between the lensing observations and the extrapolation of the radial acceleration relation predicted by MOND. There are no features until extremely low acceleration, where there may be a hint of the external field effect. This is what I was talking about when I said gravitational lensing was in good agreement with MOND, and that the data indicated a single halo with an r⁻² density profile that extends far out where we ought to see the r⁻³ behavior of NFW.
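The distinction the lensing data probe can be made concrete. The logarithmic density slope of an NFW halo rolls from −1 at the center to −3 far outside the scale radius, while an isothermal (r⁻²) halo sits at −2 everywhere. A minimal sketch, with the sample radii chosen arbitrarily:

```python
def nfw_logslope(x):
    """d ln(rho) / d ln(r) for the NFW profile rho ∝ 1/[x (1+x)^2],
    where x = r / r_s is radius in units of the scale radius."""
    return -(1.0 + 2.0 * x / (1.0 + x))

# Inner regions look like r^-1; the outskirts steepen toward r^-3.
# An isothermal sphere would show slope -2 at every radius.
for x in (0.01, 1.0, 10.0, 100.0):
    print(f"r = {x:>6} r_s : slope = {nfw_logslope(x):.2f}")
```

So at the ~Mpc radii the weak lensing reaches, an NFW halo should be well into its r⁻³ regime; a persistent effective r⁻² profile is the MOND-like signature.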

The two plots above use the same method applied to the same kind of data. They should be consistent, yet they seem to tell a different story. This is the point of factual disagreement Simon and I had, so we let it be. No point in arguing about the interpretation when you can’t agree on the facts.

I do not know why these results differ, and I’m not going to attempt to solve it here. I suspect it has something to do with sample selection. Both studies rely on isolated galaxies, but how do we define that? How well do we achieve the goal of identifying isolated galaxies? No galaxy is an island; at some level, there is always a neighbor. But is it massive enough to perturb the lensing signal, or can we successfully define samples of galaxies that are effectively isolated, so that we’re only looking at the gravitational potential of that galaxy and not that of it plus some neighbors? Looks like there is some work left to do to sort this out.

Stepping back from that, we agreed on pretty much everything else. MOND as a fundamental theory remains incomplete. LCDM requires us to believe that 95% of the mass-energy content of the universe is something unknown and perhaps unknowable. Dark matter has become familiar as a term but remains a mystery so long as it goes undetected in the laboratory. Perhaps it exists and cannot be detected – this is a logical possibility – but that would be the least satisfactory result possible: we might as well resume counting angels on the head of a pin.

The community has been working on these issues for a long time. I have been working on this for a long time. It is a big problem. There is lots left to do.


+I get a lot of kill-the-messenger from people who are not capable of discussing controversial topics without personal animus. A lot of it comes, inevitably, from people who assume they know more about the subject than I do but actually know much less. It is really amazing how many scientists equate me as a person with MOND as a theory without bothering to do any fact-checking. This is logical fallacy 101.

*The predictions of MOND are insensitive to the details of galaxy formation. Though of course an interesting question, we don’t need that in order to make predictions. All we need is the mass distribution that the kinematics respond to – we don’t need to know how it got that way. This is like the solar system, where it suffices to know Newton’s laws to compute orbits; we don’t need to know how the sun and planets formed. In contrast, one needs to know how a galaxy was assembled in LCDM to have any hope of predicting what its distribution of dark matter is and then using that to predict kinematics.

**The ideas Mo & I discussed thirty years ago have reappeared in the literature under the designation “assembly bias.”

***It was often accompanied by “why would you even ask that?” followed by a pained, constipated expression when they realized that every physical theory has to answer that question.

Holiday Concordance


Screw the Earth and its smoking habit. The end of 2023 approaches, so let’s talk about the whole universe, which is its own special kind of mess.

As I’ve related before, our current cosmology, LCDM, was established over the course of the 1990s through a steady drip, drip, drip of results in observational cosmology – what Peebles calls the classic cosmological tests. There were many contributory results; I’m not going to attempt to go through them all. Important among them were the age problem, the realization that the mass density was lower than expected, and that there was more structure on large scales+ than predicted. These established LCDM in the mid-1990s as the “concordance model” – the most probable flavor of FLRW universe. Here is the key figure from Ostriker & Steinhardt depicting the then-allowed region of the density parameter and Hubble constant:

The addition of the cosmological constant to the standard model – replacing SCDM with LCDM – was a brain-wrenching ordeal. Lambda had long been anathema, and there was a region in which an open universe was possible, even reasonable (stripes over shade in the figure above). Moreover, this strange new LCDM made the seemingly inconceivable prediction that not only was the universe expanding [itself the older mind-bender brought to us by Hubble (and Slipher and Lemaître)], the expansion rate should be accelerating. This sounded like crazy talk at the time, so it was greeted with great rejoicing when corroborated by observations of Type Ia supernovae.

A further prediction that could distinguish LCDM from then-viable open models was the geometry of the universe. Open models have a negative curvature (k < 0, in which initially parallel light beams diverge) while the geometry in LCDM should be uniquely flat (k = 0, in which initially parallel light beams remain parallel forever). Uniqueness is important, as it makes for a strong prediction, such as the location of the first peak of the acoustic power spectrum of the cosmic microwave background. In LCDM, this location was predicted to be ℓ ≈ 200 with little flexibility. For viable open models, it was more like ℓ ≈ 800 with a great deal of flexibility. The interpretation of the supernova data relied heavily on the assumption of a flat geometry, so I recall breathing a sigh of relief* when ℓ ≈ 200 was clearly observed.

Where are we now? I decided to reconstruct the Ostriker & Steinhardt plot with modern data. Here it is, with the axes swapped for reasons unrelated to this post. Deal with it.

The concordance region (white space) in the mass density-expansion rate space where the allowed regions (colored bands) of many constraints intersect. Illustrated constraints include a direct measurement of the Hubble constant, the age of the universe, the cluster baryon fraction, and large scale structure. Also shown are the best-fit values from CMB fits labeled by their date of publication (WMAP in orange; Planck in yellow). These follow the green line of constant ΩmH0³; combinations of parameters along the line are tolerable but regions away from it are strongly excluded.

There is lots to be said here. First, note the scale. As the accuracy of data have improved, it has become possible to zoom in. My version of the figure is a wee postage stamp on that of Ostriker & Steinhardt. Nevertheless, the concordance region is in pretty much the same spot. Not exactly, of course; the biggest thing that has changed is that the age constraint is now completely incompatible with an open universe, so I haven’t bothered depicting it. Indeed, for the illustrated Hubble constant, the Hubble time (the age of a completely empty, “coasting” universe) is 13.4 Gyr. This is consistent with the illustrated age (13.80 ± 0.75 Gyr) only for Ωm ≈ 0, which is far off the left edge of the plot.
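The age arithmetic above is easy to check. The Hubble time is just 1/H0 (977.8 Gyr per unit of km/s/Mpc), and a flat LCDM universe has a closed-form age; the parameter combinations below are illustrative choices, not endorsements:

```python
import math

def hubble_time(H0):
    """Age of an empty, coasting universe in Gyr; H0 in km/s/Mpc."""
    return 977.8 / H0   # 1/H0 with unit conversion

def flat_lcdm_age(H0, Om):
    """Closed-form age of a flat LCDM universe in Gyr."""
    OL = 1.0 - Om       # flatness: Omega_Lambda = 1 - Omega_m
    return (2.0 / (3.0 * math.sqrt(OL))) \
        * math.asinh(math.sqrt(OL / Om)) * hubble_time(H0)

print(hubble_time(73.0))            # the 13.4 Gyr coasting age quoted above
print(flat_lcdm_age(73.0, 0.3))     # adding matter pulls the age below 13.4
print(flat_lcdm_age(67.4, 0.315))   # the Planck combination gives ~13.8
```

This is the squeeze: a high local H0 leaves a flat universe with a realistic matter density younger than the measured stellar ages unless Λ does a lot of work.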

Second, the CMB best-fit values follow a line of constant ΩmH0³. This is a deep trench in χ² space. The region outside this trench is strongly excluded – it’s kinda the grand canyon of cosmology. Even a little off, and you’re standing on the rim looking a long way down, knowing that a much better fit is only a short step away. Once you’re in the valley of χ², one must hunt along its bottom to find the true minimum. In the mid-’00s, a decade after Ostriker & Steinhardt, the best fit fell smack in the middle of the concordance region defined by completely independent data. It was this additional concordance that impressed me most, more than the detailed CMB fits themselves. This convinced the vast majority of scientists practicing in the field that it had to be LCDM and could only be LCDM and nothing but LCDM.
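The trench is easy to see in the numbers. Successive CMB best fits trade Ωm against H0 while keeping ΩmH0³ nearly constant; the (Ωm, H0) pairs below are round numbers approximating published WMAP and Planck fits, used only to illustrate the degeneracy:

```python
# The CMB degeneracy trench: best-fit (Om, H0) pairs from successive
# experiments share nearly the same value of Om * H0^3. The values here
# are round-number approximations to published fits, for illustration.
fits = {
    "WMAP":   (0.28,  70.0),
    "Planck": (0.315, 67.4),
}
for name, (Om, H0) in fits.items():
    print(f"{name:>6}: Om * H0^3 = {Om * H0**3:.3g}")
```

Quite different (Ωm, H0) pairs land within a percent of each other in this combination, which is why the fit can wander along the trench while everything off it is strongly excluded.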

Since that time, the best-fit CMB value has wandered down the trench, away from the concordance region. These are the results that changed, not everything else. This temporal variation suggests a systematic in the interpretation of the CMB data rather than in the local distance scale.

I recall being at a conference (the Bright & Dark Universe in Naples in 2017) when the latest Planck results were announced. There was a palpable sense in the audience of having been whacked by a blunt object, like walking into a closed door you thought was open. We’d been doing precision cosmology for a long time and had settled on an answer informed by lots of independent lines of evidence, but they were telling us the One True answer was off over there. Not crazy far, but not consistent with the concordance we had come to expect. Worse, they had these crazy tiny error bars – not only were they getting an answer outside the concordance region, it was in tension with pretty much everything else. Not strong tension, but enough to make us all uncomfortable if not outright object. Indeed, there was a definite vibe that people were afraid to object. Not terrified, but nervous. Worried about being on the wrong side of the community. I get it. I know a lot about that.

People are remarkably talented at refashioning the past. Over the past five years, the Planck best-fit parameters have come to be synonymous with LCDM: all else is moot. Young scientists can be forgiven for not realizing it was ever otherwise, just as they might have been taught that cosmic acceleration was discovered by the supernova experiments totally out of the blue. These are convenient oversimplifications that elide so many pertinent events as to be tantamount to gaslighting. We refashion the past until there was never a serious controversy, then it seems strange that some of us think there still is. Sorry, not so fast, there definitely is: if you use the Planck value of the Hubble constant to estimate distances to local galaxies, you will get it wrong%, along with all distance-dependent quantities.

I’m old enough to remember a time when there was a factor of two uncertainty in the Hubble constant (50 vs. 100) and the age constraint was the most accurate one in this plot. Thanks to genuine progress, the Hubble constant is now the more precise. Consequently, of all the data one could plot above, this is the choice that matters most to where the concordance region falls. If I adopt our own estimate (H0 = 75.1 ± 2.3 km/s/Mpc), then the concordance band gets wider and slides up a little but is basically the same as above. If instead I adopt the lowest highly accurate value, H0 = 69.8 ± 0.8 km/s/Mpc, the window slides down, but not enough to be consistent with the Planck results. Indeed, it stays to the left of the CMB constraint, becoming inconsistent with the mass density as well as the expansion rate.

Dang it, now I want to make that plot. Processing… OK, here it is:

As above, but with a lower measurement of H0. Only the range of statistical uncertainty is illustrated as a systematic uncertainty corresponds to a calibration error that slides H0 up and down – i.e., the exact situation being illustrated relative to the figure above. These two plots illustrate the range of outcomes that are possible from slightly discordant direct modern measurements of the Hubble constant; it is hard to go lower. Doing so doesn’t really help as it would just shift the tension from H0 to Ωm.

Yes, as I expected: the allowed range slides down but remains to the left of the green line. It is less inconsistent with the Planck H0, but that isn’t the only thing that matters. It is also inconsistent with the matter density. Indeed, it misses the CMB-allowed trench entirely. There is no allowed FLRW universe here.

These are only two parameters. Though arguably the most important, there are others, all of which matter to CMB fits. These are difficult to visualize simultaneously. We could, for starters, plot the baryon density as a third axis. If we did so, the concordance region would become a 3D object. It would also get squeezed, depending on what we think the baryon density actually is. Even restricting ourselves to the above-plotted constraints, there is some tension between the cluster baryon fraction and the large scale structure constraint along the new third axis. I’m sure I could find in the literature more or less consistent values; this way lies the madness of cherry-picking.

There are many other constraints that could be added here. I’ve tried to stay consistent with the spirit of the original plot without making it illegible by overburdening it with lots and lots of data that all say pretty much the same thing. Nor do I wish to engage in cherry-picking. There are so many results out there that I’m sure one could find some combination that slides the allowed box this way or that – but only a little.

Whenever I’ve taught cosmology, I’ve made it a class exercise$ to investigate diagrams like this, with each student choosing an observational constraint to explore and champion. As a result, I’ve seen many variations on the above plots over the years, but since I first taught it in 1999 they’ve always been consistent with pretty much the same concordance region. It often happens in these exercises that there is no concordance region at all: there are so many constraints that when you put them all together, nothing is left. We then debate which results to believe, or not, a process that has always been a part of the practice of cosmology.

We have painted ourselves into a corner. The usual interpretation is that we have painted ourselves into the correct corner: we live in this strange LCDM universe. It is also possible that there really is nothing left, the concordance window is closed, and we’ve falsified FLRW cosmology. That is a fate most fear to contemplate, and it seems less likely than mistakes in some discordant results, so we inevitably go down the path of cognitive dissonance, giving more credence to results that are consistent with our favorite set of LCDM parameters and less to those that do not. This is widely done without contemplating the possibility that the weird FLRW parameters we’ve ended up with are weird because they are just an approximation to some deeper theory.

So, as 2023 winds to an end, we [still] know pretty well what the parameters of cosmology are. While the tension between H0 = 67 and 73 km/s/Mpc is real, it seems like small beans compared to the successful isolation of a narrow concordance window. Sure beats arguing between 50 and 100! Even deciding which concordance window is right seems like a small matter compared to the deeper issues raised by LCDM: what is the cold dark matter? Does it really exist, or is it just a mythical entity we’ve invented for the convenient calculation of cosmic quantities? What the heck do we even mean by Lambda? Does the whole picture hang together so well that it must be correct? Or can it be falsified? Has it already been? How do we decide?

I’m sure we’ll be arguing over these questions for a long time to come.


+Structure formation is often depicted as a great success of cosmology, but it was the failure of the previous standard model, SCDM, to predict enough structure on large scales that led to its demise and its replacement by LCDM, which now faces a similar problem. The observer’s experience has consistently been that there is more structure in place earlier and on larger scales than had been anticipated before its observation.

*I believe in giving theories credit where credit is due. Putting on a cosmologist’s hat, the location of the first peak was a great success of LCDM. It was the amplitude of the second peak that came as a great surprise – unless you can take off the cosmology hat and don a MOND hat – then it was predicted. What is surprising from that perspective is the amplitude of the third peak, which makes more sense in LCDM. It seems impossible to some people that I can wear both hats without my head exploding, so they seem to simply assume I don’t think about it from their perspective when in reality it is the other way around.

%As adjudicated by galaxies with distances known from direct measurements provided by Cepheids or the tip of the red giant branch or surface brightness fluctuations or geometric methods, etc., etc., etc.

$This is a great exercise, but only works if CMB results are excluded. There has to be some narrative suspense: will the various disparate lines of evidence indeed line up? Since CMB fits constrain all parameters simultaneously, and brook no dissent, they suck the joy away from everything else in the sky and drain all interest in the debate.

Full speed in reverse!


People have been asking me about comments in a recent video by Sabine Hossenfelder. I have not watched it, but the quote I’m asked about is “the higher the uncertainty of the data, the better MOND seems to work” with the implication that this might mean that MOND is a systematic artifact of data interpretation. I believe, because they consulted me about it, that the origin of this claim emerged from recent work by Sabine’s student Maria Khelashvili on fitting the SPARC data.

Let me address the point about data interpretation first. Fitting the SPARC data had exactly nothing to do with attracting my attention to MOND. Detailed MOND fits to these data are not particularly important in the overall scheme of things, as I’ll discuss in excruciating detail below. Indeed, these data didn’t even exist until relatively recently.

It may, at this juncture in time, surprise some readers to learn that I was once a strong advocate for cold dark matter. I was, like many of its current advocates, rather derisive of alternatives, the most prominent at the time being baryonic dark matter. What attracted my attention to MOND was that it made a priori predictions that were corroborated, quite unexpectedly, in my data for low surface brightness galaxies. These results were surprising in terms of dark matter then and to this day remain difficult to understand. After a lot of struggle to save dark matter, I realized that the best we could hope to do with dark matter was to contrive a model that reproduced after the fact what MOND had predicted a priori. That can never be satisfactory.

So – I changed my mind. I admitted that I had been wrong to be so completely sure that the solution to the missing mass problem had to be some new form of non-baryonic dark matter. It was not easy to accept this possibility. It required lengthy and tremendous effort to admit that Milgrom had got right something that the rest of us had got wrong. But he had – his predictions came true, so what was I supposed to say? That he was wrong?

Perhaps I am wrong to take MOND seriously? I would love to be able to honestly say it is wrong so I can stop having this argument over and over. I’ve stipulated the conditions whereby I would change my mind to again believe that dark matter is indeed the better option. These conditions have not been met. Few dark matter advocates have answered the challenge to stipulate what could change their minds.

People seem to have become obsessed with making fits to data. That’s great, but it is not fundamental. Making a priori predictions is fundamental, and has nothing to do with fitting data. By construction, the prediction comes before the data. Perhaps this is one way to distinguish between incremental and revolutionary science. Fitting data is incremental science that seeks the best version of an accepted paradigm. Successful predictions are the hallmark of revolutionary science that make one take notice and say, hey, maybe something entirely different is going on.

One of the predictions of MOND is that the RAR should exist. It was not expected in dark matter. As a quick review of the history, here is the RAR as it was known in 2004 and now (as of 2016):

The radial acceleration relation constructed from data available in 2004 and that from 2016.

The big improvement provided by SPARC was a uniform estimate of the stellar mass surface density of galaxies based on Spitzer near-infrared data. These are what are used to construct the x-axis: gbar is what Newton predicts for the observed mass distribution. SPARC was a vast improvement over the optical data we had previously, to the point that the intrinsic scatter is negligibly small: the observed scatter can be attributed to the various uncertainties and the expected scatter in stellar mass-to-light ratios. The latter never goes away, but did turn out to be at the low end of the range we expected. It could easily have looked worse, as it did in 2004, even if the underlying physical relation was perfect.
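For concreteness, a functional form commonly used to describe the RAR (the fitting function of McGaugh, Lelli, & Schombert 2016) maps the Newtonian baryonic acceleration gbar to the observed gobs with a single acceleration scale a0. A minimal sketch, with an approximate value of a0 from the literature:

```python
import math

A0 = 1.2e-10  # m/s^2; approximate acceleration scale from the literature

def g_obs(g_bar, a0=A0):
    """RAR fitting function: g_obs -> g_bar at high acceleration (Newtonian
    regime), and g_obs -> sqrt(g_bar * a0) in the low-acceleration limit,
    where the mass discrepancy appears."""
    return g_bar / (1.0 - math.exp(-math.sqrt(g_bar / a0)))

# High acceleration: essentially Newtonian, ratio ~ 1
print(g_obs(1e-8) / 1e-8)    # ~1.0001
# Low acceleration: large mass discrepancy, ratio ~ sqrt(a0 / g_bar)
print(g_obs(1e-12) / 1e-12)  # ~11.5
```

The single scale a0 is what makes the relation predictive: given only the baryonic mass distribution, the kinematics follow with no per-galaxy freedom beyond the stellar mass-to-light ratio.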

Negligibly small intrinsic scatter is the best one can hope to find. The issue now is the fit quality to individual galaxies (not just the group plot above). We already know MOND fits rotation curve data. The claim that appears in Dr. Hossenfelder’s video boils down to dark matter providing better fits. This would be important if it told us something about nature. It does not. All it teaches us about is the hazards of fitting data for which the errors are not well behaved.

While SPARC provides a robust estimate of gbar, gobs is based on a heterogeneous set of rotation curves drawn from a literature spanning decades. The error bars on these rotation curves have not been estimated in a uniform way, so we cannot blindly fit the data with our favorite software tool and expect that to teach us something about physical reality. I find myself having to say this to physicists over and over and over and over and over again: you cannot trust astronomical error bars to behave as Gaussian random variables the way one would like and expect in a controlled laboratory setting.

Astronomy is not conducted in a controlled laboratory. It is an observational science. We cannot put the entire universe in a box and control all the variables. We can hope to improve the data and approach this ideal, but right now we’re nowhere near it. These fitting analyses assume that we are.

Screw it. I really am sick of explaining this over and over, so I’m just going to cut & paste verbatim what I told Hossenfelder & Khelashvili by email when they asked. This is not the first time I’ve written an email like this, and I’m sure it won’t be the last.


Excruciating details: what I said to Hossenfelder & Khelashvili about the perils of rotation curve fitting on 22 September 2023, in response to their request for comments on the draft of the relevant paper:

First, the work of Desmond is a good place to look for an opinion independent of mine. 

Second, in my experience, the fit quality you find is what I’ve found before: DM halos with a constant density core consistently give the best fits in terms of chi^2, then MOND, then NFW. The success of cored DM halos happens because it is an extremely flexible fitting function: the core radius and core density can be traded off to fit any dog’s leg, and is highly degenerate with the stellar M*/L. NFW works less well because it has a less flexible shape. But both work because they have more parameters [than MOND].

Third, statistics will not save us here. I once hoped that the BIC would sort this out, but having gone down that road, I believe the BIC does not penalize models sufficiently for adding free parameters. You allude to this at the end of section 3.2. When you go from MOND (with fixed a0 it has only one parameter, M*/L, to fit to account for everything) to a dark matter halo (which has at a minimum 3 parameters: M*/L plus two to describe the halo), you gain an enormous amount of freedom – the volume of possible parameter space grows enormously. But the BIC just says if you had 20 degrees of freedom before, now you have 22, which does not remotely represent the gain in flexibility: some free parameters are more equal than others. MOND fits and DM halo fits are not the same beast; we can’t compare them this way any more than we can compare apples and snails.
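The arithmetic behind this complaint is easy to check: the BIC penalty grows only as k·ln(n), so going from one free parameter to three barely moves the criterion even when the extra parameters buy a real reduction in chi^2. A minimal sketch with made-up numbers (not a fit to any real galaxy):

```python
import math

def bic(chi2, k, n):
    # BIC = chi^2 + k * ln(n) for Gaussian errors, up to an additive constant
    return chi2 + k * math.log(n)

n = 20  # hypothetical number of points in a rotation curve

# Suppose a 3-parameter halo fit shaves a bit off chi^2 by chasing
# wiggles that a 1-parameter MOND fit (M*/L only) cannot reach.
bic_mond = bic(25.0, k=1, n=n)  # one parameter: M*/L
bic_halo = bic(20.0, k=3, n=n)  # three parameters: M*/L plus two halo parameters

print(round(bic_mond, 2))  # 28.0
print(round(bic_halo, 2))  # 28.99
```

The two models come out statistically indistinguishable, even though the halo model explored a vastly larger parameter volume to get there; that volume is exactly what the k·ln(n) penalty fails to capture.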

Worse, to do this right requires that the uncertainties be real random errors. They are not. SPARC provides homogeneous mass models based on near-IR observations of the stellar mass distribution. Those should be OK to the extent that near-IR light == stellar mass. That is a decent mapping, but not perfect. Consequently, we expect the occasional galaxy to misbehave. UGC 128 is a case where the MOND fit was great with optical data then became terrible with near-IR data. The absolute difference in the data is not great, but in terms of the formal chi^2 it is. So is that a failure of the model, or of the data to represent what we want it to represent?

This happens all the time in astronomy. Here, we want to know the circular velocity of a test particle in the gravitational potential predicted by the baryonic mass distribution. We never measure either of those quantities. What we measure is the (i) stellar light distribution and the (ii) Doppler velocities of gas. We assume we can map stellar light to stellar mass and Doppler velocity to orbital speed, but no mass model is perfect, nor is any patch of observed gas guaranteed to be on a purely circular orbit. These are known unknowns: uncertainties that we know are real but we cannot easily quantify. These assumptions that we have to make to do the analysis dominate over the random errors in many cases. We also assume that galaxies are in dynamical equilibrium, but 20% of spirals show gross side-to-side asymmetries, and at least 50% mild ones. So what is the circular motion in those cases? (F579-1 is a good example)

While SPARC is homogeneous in its photometry, it is extremely heterogeneous in its rotation curve measurements. We’re working on fixing that, but it’ll take a while. Consequently, as you note, some galaxies have little constraining power while others appear to have lots. That’s because many of the rotation curve velocity uncertainties are either grossly over- or underestimated. To see this, plot the cumulative distribution of chi^2 for any of your models (or see the CDFs published by Li et al. 2018 for the RAR and Li et al. 2020 for dark matter halos of many flavors. So many, I can’t recall how many CDFs we published.) Anyway, for a good model, chi^2 is always close to one, so the CDF should go up sharply and reach one quickly – there shouldn’t be many cases with very low chi^2 or very high chi^2. Unfortunately, rotation curve data do not do this for any type of model. There are always way too many cases with chi^2 << 1 and also too many with chi^2 >> 1. One might conclude that all models are unacceptable – or that the error bars are Messed Up. I think the second option is the case. If so, then this sort of analysis will always have the power to mislead.
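This diagnostic is easy to run on any set of fits. A minimal sketch, assuming you have a list of reduced chi^2 values from your own fitting exercise (the numbers below are made up for illustration):

```python
# Empirical CDF of reduced chi^2 values from a set of galaxy fits.
# For a statistically well-behaved model with honest Gaussian error bars,
# the curve should rise steeply near chi^2_nu = 1. A pile-up at
# chi^2_nu << 1 and a long tail at chi^2_nu >> 1 both signal that the
# quoted uncertainties are not real random errors.
chi2_nu = [0.05, 0.2, 0.4, 0.9, 1.0, 1.1, 1.3, 2.5, 8.0, 30.0]  # made-up values

def ecdf(values, x):
    """Fraction of fits with reduced chi^2 <= x."""
    return sum(v <= x for v in values) / len(values)

print(ecdf(chi2_nu, 1.0))   # 0.5 -> half the fits already below chi^2_nu = 1
print(ecdf(chi2_nu, 2.0))   # 0.7
print(ecdf(chi2_nu, 10.0))  # 0.9 -> 10% of fits with chi^2_nu > 10: suspect errors
```

A real analysis would compare this empirical curve to the theoretical chi^2 CDF for the appropriate degrees of freedom; the point here is only that both tails carry the warning signs described above.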

I insert Fig. 1 from Li et al. (2020) so you don’t have to go look it up. The CDF of a statistically good model would rise sharply, being an almost vertical line at chi^2 = 1. No model of any flavor does that. That’s in large part because the uncertainties on some rotation curves are too large, while those on others are too small. The greater flexibility of dark matter models makes them incrementally better than MOND for the cases with error bars that are too small – hence the corollary statement that “the higher the uncertainty of the data, the better MOND seems to work.” This happens because dark matter models are allowed to chase bogus outliers with tiny error bars in a way that MOND cannot. That doesn’t make dark matter better, it just makes it easier to fool.

A key thing to watch out for is the outsized effects of a few points with tiny error bars. Among galaxies with high chi^2, what often happens is that there is one point with a tiny error bar that does not agree with any of the rest of the data for any smoothly continuous rotation curve. Fitting programs penalize a model for missing this point by many sigma, so will do anything they can to make it better. So what happens is that if you let a0 vary with a flat prior, it will go to some very silly values in order to buy a tiny improvement in chi^2. Formally, that’s a better fit, so you say OK, a0 has to vary. But if you plot the fitted RCs with fixed and variable a0, you will be hard pressed to see the difference. Chi^2 is different, sure, but both will have chi^2 >> 1, so a lousy fit either way, and we haven’t really gained anything meaningful from allowing for the greater fitting freedom. Really it is just that one point that is Wrong even though it has a tiny error bar – which you can see relative to the other points, never mind the model. Dark matter halos have more flexibility from the beginning, so this is less obvious for them even though the same thing happens.
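The pathology is easy to reproduce numerically. A minimal sketch with invented residuals: five points consistent with a smooth curve at the 5 km/s level, plus one discrepant point claiming a 1 km/s error bar.

```python
# One discrepant point with a tiny error bar can dominate the total chi^2,
# which is what drags a0 (or the halo parameters) to silly values.
# Made-up rotation-curve residuals (observed - model) in km/s, and sigmas:
residuals = [3.0, -2.0, 4.0, -3.0, 2.0, 15.0]  # last point disagrees with the rest
sigmas    = [5.0,  5.0, 5.0,  5.0, 5.0,  1.0]  # ...but claims a 1 km/s error bar

contrib = [(r / s) ** 2 for r, s in zip(residuals, sigmas)]
total = sum(contrib)
print(round(total, 2))                # 226.68
print(round(contrib[-1] / total, 2))  # 0.99 -> one point carries ~99% of chi^2
```

Any fitter minimizing this chi^2 will contort the model to appease that single point, while the five well-behaved points have almost no say.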

So that’s another big point – what is the prior for a dark matter halo? [Your] Table 1 allows V200 and C200 to be pretty much anything. So yes, you will find a fit from that range. For Burkert halos, there is no prior, since these do not emerge from any theory – they’re just a flexible French curve. For NFW halos, there is a prior from cosmology – see McGaugh et al (2007) among a zillion other possible references, including Li et al (2020). In any [L]CDM cosmology, the parameters V200 and C200 correlate – they are not independent. So a reasonable prior would be a Gaussian in log(C200) at a given V200 as specified by some simulation (Macciò et al; see Li et al 2020). Another prior is how V200 (or M200) relates to the observed baryonic mass (or stellar mass). This one is pretty dodgy. Originally, we expected a fixed ratio between baryonic and dark mass. So when I did this kind of analysis in the ’90s, I found NFW flunked hard compared to MOND. (I didn’t know about the BIC then.) Galaxy DM halos simply do not look like NFW halos that form in LCDM and host galaxies with a few percent of their mass in the luminous disk, even though this was the standard model for many years (Mo, Mao, & White 1998). If we drop the assumption that luminous galaxies are always a fixed fraction of their dark matter halos, then better fits can be obtained. I suspect your uniform prior fits have halo masses all over the place; they probably don’t correlate well with the baryonic mass, nor are their C200 and V200 parameters likely to correlate as they are predicted to do. If you apply the expected mass-concentration and stellar mass-halo mass relations as priors, then NFW will come off worse in your analysis because you’ve restricted the halos to where they ought to live.
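Schematically, imposing the cosmological prior described above means adding a Gaussian penalty in log(C200) about the mass-concentration relation rather than letting C200 float freely. A minimal sketch; the relation and scatter below are placeholders standing in for whatever simulation one adopts, not values from any published fit:

```python
import math

# Sketch of an LCDM-motivated prior on NFW parameters: concentration C200
# correlates with V200, so a flat prior on C200 is far too generous.
def log_c200_mean(v200):
    # hypothetical mass-concentration relation: higher V200 -> lower C200
    return 1.0 - 0.3 * math.log10(v200 / 100.0)

SCATTER_DEX = 0.11  # assumed lognormal scatter in concentration (dex)

def ln_prior(c200, v200):
    """Gaussian prior in log10(C200) at a given V200, up to a constant.
    Add this to the fit's log-likelihood instead of using a flat prior."""
    d = math.log10(c200) - log_c200_mean(v200)
    return -0.5 * (d / SCATTER_DEX) ** 2

# A concentration on the relation costs nothing; a wild one is heavily penalized:
print(ln_prior(10.0, 100.0))  # on the relation: zero penalty
print(ln_prior(2.0, 100.0))   # far off the relation: strongly disfavored
```

With a penalty like this in the likelihood, the fitter can no longer wander off to the absurd low-C, high-V200 corner of parameter space to mimic a straight line.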

So, as you say – it all comes down to the prior.

Even applying a stellar mass-halo mass relation from abundance matching isn’t really independent information, though that’s the best you can hope to do. I was saying 20+ years ago that fixed mass ratios wouldn’t work, but nobody then wanted to abandon that obvious assumption. Since then, they’ve been forced to do so. There is no good physical reason for it (feedback is the deus ex machina of all problems in the field); what happened is that the data – including kinematic data (McGaugh et al 2010) – forced us to drop the obvious assumption. So adopting a modern stellar mass-halo mass relation will give you a stronger prior than a uniform prior, but that choice has already been informed by the kinematic data that you’re trying to fit. How do we properly penalize the model for cheating about its “prior” by peeking at past data?

I think it would be important here to better constrain the priors on the DM halo fits. Li et al (2020) discuss this. Even then we’re not done, because galaxy formation modifies the form of the halo function we’re fitting. Halos shouldn’t end up as NFW even if they start out that way – see Li et al 2022a & b. Those papers consider the inevitable effects of adiabatic compression, but not of feedback. If feedback really has the effects on DM halos that are frequently advertised, then neither NFW nor Burkert is an appropriate fitting function – they’re not what LCDM+feedback predicts. Good luck extracting a legitimate prediction from simulations, though. So we’re stuck doing what you’re trying to do: adopt some functional form to represent the DM halo, and see what fits. What you’ve done here agrees with my experience: cored DM halos work best. But they don’t represent an LCDM prediction, or any other broader theory, so – so what?

Another detail to be wary of – the radial range over which the RC data constrain the DM halo fit is often rather limited compared to the size of the halo. To complicate matters further, the inner regions are often star-dominated, so there is not much of a handle on DM from where the data are best, at least beyond many galaxies preferring not to have a cusp since the stars already get the job done at small R. So, one ends up with V_DM(R) constrained from 3% to 10% of the virial radius, or something like that. V200 and C200 are defined at the notional virial radius, so there are many combinations of these parameters that might adequately fit the observed range while being quite different elsewhere. Even worse, NFW halos are pretty self-similar – there are combinations of (C200,V200) that are highly degenerate, so you can’t really tell the difference between them even with excellent data – the confidence contours look like bananas in C200-V200 space, with low C/high V often being as good as high C/low V. Even even even worse is that the observed V_DM(R) is often approximately a straight line. Any function looks like a straight line if you stretch it out enough. Consequently, the fits to LSB galaxies often tend to absurdly low C and high V200: NFW never looks like a straight line, but it does if you blow it up enough. So one ends up inferring that the halo masses of tiny galaxies are nearly as big as those of huge galaxies, or more so! My favorite example was NGC 3109, a tiny dwarf on the edge of the Local Group. A straight NFW fit suggests that the halo of this one little galaxy weighs more than the entire Local Group, M31 + MW + everything else combined. This is the sort of absurd result that comes from fitting the NFW halo form to a limited radial range of data. 

I don’t know that this helps you much, but you see a few of the concerns. 

How things go mostly right or badly wrong


People often ask me how “perfect” MOND has to be. The short answer is that it agrees with galaxy data as “perfectly” as we can perceive – i.e., the scatter in the credible data is accounted for entirely by known errors and the expected scatter in stellar mass-to-light ratios. Sometimes it nevertheless appears to go badly wrong. That’s often because we need to know both the mass distribution and the kinematics perfectly. Here I’ll use the Milky Way as an example of how easily things can look bad when they aren’t.

First, an update. I had hoped to stop talking about the Milky Way after the recent series of posts. But it is in the news, and there is always more to say. A new realization of the rotation curve from the Gaia DR3 data has appeared, so let’s look at all the DR3 data together:

Gaia DR3 realizations of the Milky Way rotation curve. The most recent version of these data from Poder et al (2023) are shown as blue squares over the range 5 < R < 13 kpc. Other Gaia DR3 realizations include Ou et al. (2023, green circles), Wang et al. (2023, magenta downward pointing triangles), and Zhou et al. (2023, purple triangles).

The new Gaia realization does not go very far out, and has larger uncertainties. That doesn’t mean it is worse; it might simply be more conservative in estimating uncertainties, and not making a claim where the data don’t substantiate it. Neither does that mean the other realizations are wrong: these differences are what happens in different analyses. Indeed, all the independent realizations of the Gaia data are pretty consistent, despite the different stellar selection criteria and analysis techniques. This is especially true for R < 17 kpc where there are lots of stars informing the measurements. Even beyond that, I would say they are consistent at the level we’d expect for astronomy.

Zooming out to compare with other results:

The Milky Way rotation curve. The model line from McGaugh (2018) is shown with data from various sources. The abscissa switches from linear to logarithmic at 10 kpc to wedge it all in. The location of the Large Magellanic Cloud at 50 kpc is noted. Gaia DR3 data (Poder et al., Ou et al., Wang et al., and Zhou et al.) are shown as in the plot above. The small black squares are the Gaia DR2 realization of Eilers et al. (2019) reanalyzed to include the effect of bumps and wiggles by McGaugh (2019). Non-Gaia data include blue horizontal branch stars (light blue squares) and red giants (red squares) in the stellar halo (Bird et al. 2022), globular clusters (Watkins et al. 2019, pink triangles), VVV stars (Portail et al. 2017, dark grey squares at R < 2.2 kpc), and terminal velocities (McClure-Griffiths & Dickey 2007, 2016, light grey points from 3 < R < 8 kpc). These terminal velocities are the only data that inform the model line; everything else follows.

Overall, I would say the data paint a pretty consistent picture. The biggest tension amongst the data illustrated here is between the outermost Gaia points around R = 25 kpc and the corresponding results from halo stars. One is consistent with the model line and the other is not. We shouldn’t allow the model to inform our interpretation; the important point is that the independent data disagree with each other. This happens all the time in astronomy. Sometimes it boils down to different assumptions; sometimes it is a real discrepancy. Either way, one has to learn* to cope.

The sharp-eyed will also notice an apparent tension between the DR2 data (black squares) and DR3 around 6 and 7 kpc. This is not real – it is an artifact of different treatments of the term in the Jeans equation for the logarithmic derivative of the density profile of the tracer particles. That’s a choice made in the analysis. The data are entirely consistent when treated consistently.

Putting on an empiricist’s hat, I will say that the kink in the slope of the Gaia data around R = 18 kpc looks unnatural. That doesn’t happen in other galaxies. Rather than belabor the point further, I’ll simply say that this is how things mostly go right but also a little wrong. This is as good as we can hope for in [extra]galactic astronomy.

In contrast, it is easy to go very wrong. To give an example, here is a model of the Milky Way that was built to approximately match the rotation curve of Sofue (2020).


Fig. 1 from Dai et al. (2022). Note the logarithmic abscissa. Their caption: The rotation curve of the Milky Way. The data (solid dark circles with error bars) for r < 100kpc come from [22], while for r > 100kpc from [23]. The solid, dashed and dotted lines describe the contribution from the bulge, stellar disk and dark matter halo respectively, within a ΛCDM model of the galaxy. The dashed-dot line is the total contribution of all three components. The parameters of each component are taken from [24]. For comparison, the Milky Way rotation curve from Gaia DR2 is shown in color. The red dots are data from [34], the blue upward-pointing triangles are from [35], while the cyan downward-pointing triangles are from [36].

This realization of the rotation curve is very different from that seen above. Note that the rotation curve (black points) is very different from that of Gaia (red points) over the same radial range. These independent data are inconsistent; at least one of them is wrong. The data extend to very large radii, encompassing not only the LMC but also Andromeda (780 kpc away). I am already concerned about the effects of the LMC at 50 kpc; Andromeda is twice the baryonic mass of the Milky Way so anything beyond 260 kpc is more Andromeda’s territory than ours – depending on which side we’re talking about. The uncertainties are so big out there they provide no constraining power anyway.

In terms of MOND-required perfection, things fall apart for the Dai model already at very small radii. Dai et al. (2022) chose to fit their bulge component to the high amplitude terminal velocities of Sofue. That’s a reasonable thing to do, if we think the terminal velocities represent circular motion. Because of the non-circular motions that sustain the Galactic bar, they almost certainly do not – that’s why I restricted use of terminal velocities to larger radii. We also know something about the light distribution:

The inner 3 kpc of the Milky Way. The circles are the terminal velocities of Sofue (2020); the squares are the equivalent circular velocity of the potential reconstructed from the kinematics of stars in the VVV survey (Portail et al. 2017). The line is the bulge-bar model of McGaugh (2008) based on the light distribution reported by Binney et al (1997).

This is essentially the same graph as I showed before, but showing only the Newtonian bulge-bar component, and on a logarithmic abscissa for comparison with the plot of Dai et al. The two bulge models are very different. That of Dai et al. is more massive and more compact, as required to match the terminal velocities. There may be galaxies out there that look like this, but the Milky Way is not one of them.

Indeed, Newton’s prediction for the rotation curve of the bulge-bar component – the line labeled bulge/bar based on what the Milky Way looks like – is in good agreement with the effective circular speed curve obtained from stellar data. It is not consistent with the terminal velocities. We could increase the amplitude of the Newtonian prediction by increasing the mass-to-light ratio of the stars (I have adopted the value I expect for stellar populations), but the shape would still be wrong. This does not come as a surprise to most Galactic astronomers, because we know there is a bar in the center of the Milky Way and we know that bars induce non-circular motions, so we do not expect the terminal velocities to be a fair tracer of the rotation curve in this region. That’s why Portail et al. had to go to great lengths in their analysis to reconstruct the equivalent circular velocity, as did I just to build the bulge-bar model.

The thing about predicting rotation curves from the observed mass, as MOND does, is that you have to get both the kinematic data and the mass distribution right. The velocity predicted at any radius depends on the mass enclosed by that radius. So if we get the bulge badly wrong, everything spirals down the drain from there.

Dai et al. (2022) compare their model to the acceleration residuals predicted by MOND for their mass model. If all is well, the data should scatter around the constant line at zero in this graph:

Fig. 4 from Dai et al. (2022). Their caption: [The radial acceleration relation] recast as a comparison between the total acceleration, a, and the MOND prediction, aM, as a function of the acceleration due to baryons aB. The solid horizontal line is a = aM. The circles and squares with error bars represent the Milky Way and M31 data, while the gray dots are from the EAGLE simulation of ΛCDM in [1]. For aB > 10^-10 m/s^2 any difference between a and aM is unclear. However, once aB drops well below 10^-11 m/s^2, the discrepancy emerges. The short-dashed line is the ΛCDM fitting curve of the MW. The dash-dot line is the ΛCDM fitting curve of M31. The mass range** of galaxies in EAGLE’s data is chosen to be between 5 × 10^10 and 5 × 10^11 solar masses. For comparison, the Milky Way rotation curve from Gaia data release II is shown in color. The red dots are data from [34], the blue triangles are from [35], while the cyan down triangles are from [36]. While the EAGLE simulation does not match the data perfectly, these plots indicate that it is much easier to accommodate a systematic downward trend with the ΛCDM model than with MOND.

Things are not well.

The interpretation that is offered (right in the figure caption) is that MOND is wrong and the LCDM-based EAGLE simulation does a better if not perfect job of explaining things. We already know that’s not right. The alternate interpretation is that this is not a valid representation of the prediction of MOND, because their mass model does not follow from the observed distribution of light. They get neither the baryonic mass distribution and its predicted acceleration aB nor the total acceleration a right in the plot above.

In terms of dark matter, the model of Dai et al. may appear viable. In terms of MOND, it is way off, not just a little off. The residuals are only zero, as they should be, for a narrow range of accelerations, 2 to 3 × 10^-10 m/s/s. That’s more Newton than MOND, and appears to correspond to the limited range in radii over which their model matches the rotation curve data in their Fig. 1 (roughly 4 to 6 kpc). It doesn’t really fit the data elsewhere, and the restrictions on a MOND fit are considerably more stringent than on the sort of dark matter model they construct: there’s no reason to expect their model to behave like MOND in the first place.

And, hoo boy, does it ever not behave like MOND. Look at how far those red points – the Gaia DR2 data – deviate from zero in their Fig. 4. Those are the exact same data that agree well with the model line I show above – the data that were correctly predicted in advance. This model is a reasonable representation of the radial force predicted by MOND, with the blue line in my plot being equivalent to the zero line in theirs.

This is how things can go badly wrong. To properly apply MOND, we need to measure both the kinematics and baryonic mass distribution correctly. If we screw either up, as is easy to do in astronomy, then the result will look very wrong, even if it shouldn’t. Combine this with the eagerness many people have to dismiss MOND outright, and you wind up with lots of articles claiming that MOND is wrong – even when that’s not really the story the data tell. Happens over and over again, so the field remains stagnant.


*This is a large part of the cultural difference between physics and astronomy. Physicists are spoiled by laboratory experiments done in controlled conditions in which one can measure to the sixth place of decimals. In contrast, astronomy is an observational rather than experimental science. We can’t put the universe in a box and control all the systematics – measuring most quantities to 1% is a tall order. Consequently, astronomers are used to being wrong. While I wouldn’t say that astronomers cope with it gracefully, they’re well aware that it happens, that it has happened a lot historically, and will continue to happen in the future. It is a risk we all take in trying to understand a universe so much vaster than ourselves. This makes astronomers rather more tolerant of surprising results – results where the first response is “that can’t be right!” but also informed by the experience that “we’ve been wrong before!” Physicists coming to the field generally lack this experience and take the error bars way too seriously. I notice this attitude is creeping into the younger generation of astronomers; people who’ve received their data from distant observatories and performed CPU-intensive MCMC error analyses, so want to believe them, but often lack the experience of dozens of nights spent at the observatory sweating a thousand ill-controlled but consequential details, like walking out to a beautiful sunrise decorated by wisps of cirrus clouds. When did those arrive?!?


**The data that define the radial acceleration relation come from galaxies spanning six decades in stellar mass, so this one decade range from the simulations is tiny – it is literally comparing a factor of ten to a factor of a million. What happens outside the illustrated mass range? Are lower masses even resolved?

A Response to Recent Developments Concerning the Gravitational Potential of the Milky Way


In the series of recent posts I’ve made about the Milky Way, I missed an important reply made in the comments by Francois Hammer, one of the eminent scientists doing the work. I had moved on to writing the next post when he wrote it, and simply didn’t see it until yesterday. Dr. Hammer has some important things to say that are illustrative both of the specific topic and of how science should work. I wanted to highlight his concerns with their own post, so, with his permission, I cut & paste his comments below, making this, in effect, a guest post by Francois Hammer.


There are two aspects we’d like to mention, as they may help to clarify part of the debate:
1- When saying “Gaia is great, but has its limits. It is really optimized for nearby stars (within a few kpc). Outside of that, the statistics… leave something to be desired. Is it safe to push out beyond 20 kpc?”, one may wonder whether the significance of the Gaia data has really been understood.
In the Eilers et al. 2019 DR2 rotation curve, you may see points with small error bars up to 21-22 kpc. Gaia DR3 provides proper motion (systematic) uncertainties that are 2 times smaller than those from Gaia DR2, so it can easily go to 25 kpc or more.
The gain in quality for parallaxes is indeed smaller (a 30% gain). However, our results cannot be affected by distance estimates, since the large number of stars with parallax estimates in Wang et al. (2023) gives the same rotation curve as that from (a lower number of) RGB stars with spectrophotometric distances (Ou et al. 2023), i.e., following Eilers et al. 2019. And both show a Keplerian decline, which was already noticeable in the DR2 results from Eilers et al. 2019. The latter authors said in their conclusions: “We do see a mild but significant deviation from the straightly declining circular velocity curve at R≈19–21 kpc of Δv≈15 km s⁻¹.” Our work using Gaia DR3 does nothing more than account for systematics a factor of 2 better, and is thereby able to resolve what looks like a Keplerian decrease of the rotation curve.
We may also mention here that one of us participated in an unprecedented study of the kinematics of the LMC (Gaia Collaboration 2021, Luri’s paper), which is at 50 kpc. Unless one proves that everything people have done about the LMC and MW is wrong, and that the data are too uncertain to conclude anything about what happens at R=17-25 kpc, the above clarifications about Gaia accuracy are truly necessary for people reading your blog.
2- The argument that the result “violates a gazillion well-established constraints” has to be taken with some caution, since otherwise no one could make any progress in the field. In fact, the problem with many probes (so-called “satellites”) in the MW halo is that one cannot guarantee whether or not their orbits are in equilibrium with the MW potential. The reverse holds for the MW disk, in which the stars are rotating; at 25 kpc, e.g., they have likely experienced 7-8 orbits since the last merger (Gaia-Sausage-Enceladus), about 9 billion years ago. In other words, the mass provided by a system mostly at equilibrium likely supersedes masses provided by systems whose equilibrium conditions are not secured. An interesting example of this is given by globular clusters (GCs). If taken as an ensemble of 156 GCs (from the Baumgardt catalog), just by removing Pyxis and Terzan 8, the MW mass inside 50 kpc drops from 5.5 to 2.1 × 10^11 Msun. This is likely because these two GCs may have arrived quite recently, meaning that their initial kinetic energy is still contributing to their total energy. A similar mass overestimate could happen if one counts the LMC or Leo I as MW satellites in equilibrium with the MW potential.
So we agree that near 25 kpc the disk of the MW may show signs of disequilibrium, or of slightly less circular orbits due to the different phenomena discussed in the blog. However, why take objects for which there is no proof of equilibrium as the true measurements?
In our work, we have focused considerably on understanding and expanding the whole contribution of systematics, which may come from the Gaia data, but also from assumptions about the stellar profile (i.e., deviations from exponential profiles), from the Sun’s distance and proper motion, and so on. You may find a description in Ou et al.’s Figure 5 and Jiao et al.’s Figure 4, both showing that systematics cannot give much more than a 10% error on circular velocity estimates. This is an area where we are considered by the Local Group community as being quite conservative, following the Gaia specialists with whom we have worked to deliver the EDR3 catalog of dwarf galaxy motions (Li, Hammer, Babusiaux et al. 2021) out to about 150 kpc. The main contribution of the Jiao et al. paper is its fair accounting of systematics, whose analysis shows error bars that are much larger than those from other sources of error, especially in the MW outskirts (see Fig. 2).

Francois Hammer, 24 September 2023

The image at top is Fig. 2 from Jiao et al. illustrating their assessment of the rotation curve and its systematic uncertainties.

Recent Developments Concerning the Gravitational Potential of the Milky Way. III. A Closer Look at the RAR Model


I am primarily an extragalactic astronomer – someone who studies galaxies outside our own. Our home Galaxy is a subject in its own right. Naturally, I became curious how the Milky Way appeared in the light of the systematic behaviors we have learned from external galaxies. I first wrote a paper about it in 2008; in the process I realized that I could use the RAR to infer the distribution of stellar mass from the terminal velocities observed in interstellar gas. That’s not necessary in external galaxies, where we can measure the light distribution, but we don’t get a view of the whole Galaxy from our location within it. Still, it wasn’t my field, so it wasn’t until 2015/16 that I did the exercise in detail. Shortly after that, the folks who study the supermassive black hole at the center of the Galaxy provided a very precise constraint on the distance there. That was the one big systematic uncertainty in my own work up to that point, but I had guessed well enough, so it didn’t make a big change. Still, I updated the model to the new distance in 2018, and provided its details on my model page so anyone could use it. Then Gaia data started to pour in, which was overwhelming, but I found I really didn’t need to do any updating: the second data release indicated a declining rotation curve at exactly the rate the model predicted: -1.7 km/s/kpc. So far so good.
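To put that slope in context: a declining rotation curve is not automatically “Keplerian.” For a point mass, V = √(GM/R), so dV/dR = −V/(2R), which is far steeper than the gentle decline the model predicted. A back-of-the-envelope sketch (the numbers here are illustrative round values near the solar circle, not the model’s fitted parameters):

```python
# Compare the predicted gentle decline (-1.7 km/s/kpc) with a Keplerian
# falloff, using illustrative round numbers near the solar circle.
V, R = 220.0, 10.0  # rotation speed in km/s, radius in kpc (illustrative)

# Keplerian: V(R) = sqrt(GM/R), hence dV/dR = -V / (2R)
keplerian_slope = -V / (2.0 * R)  # km/s per kpc

print(f"Keplerian slope: {keplerian_slope:.1f} km/s/kpc")  # -11.0
print("Predicted slope:  -1.7 km/s/kpc")
```

The two differ by a factor of several, so a modest decline of a couple km/s/kpc is consistent with a nearly flat, distinctly non-Keplerian rotation curve.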

I call it the RAR model because it only involves the radial force. All I did was assume that the Milky Way was a typical spiral galaxy that followed the RAR, and ask what the mass distribution of the stars needed to be to match the observed terminal velocities. This is a purely empirical exercise that should work regardless of the underlying cause of the RAR, be it MOND or something else. Of course, MOND is the only theory that explicitly predicted the RAR ahead of time, but we’ve gone to great lengths to establish that the RAR is present empirically whether we know about MOND or not. If we accept that the cause of the RAR is MOND, which is the natural interpretation, then MOND over-predicts the vertical motions by a bit. That may be an important clue, either into how MOND works (it doesn’t necessarily follow the most naive assumption) or how something else might cause the observed MONDian phenomenology, or it could just be another systematic uncertainty of the sort that always plagues astronomy. Here I will focus on the RAR model, highlighting specific radial ranges where the details of the RAR model provide insight that can’t be obtained in other ways.
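For concreteness, the empirical relation is simple to write down. The fitting function published by McGaugh, Lelli & Schombert (2016) is g_obs = g_bar/(1 − e^(−√(g_bar/g†))) with g† ≈ 1.2 × 10⁻¹⁰ m/s². A minimal sketch (the functional form is the published one, but this snippet is my own illustration, not the code behind the model):

```python
import numpy as np

G_DAGGER = 1.2e-10  # m/s^2: the acceleration scale fitted to external galaxies

def g_obs(g_bar):
    """Observed centripetal acceleration implied by the baryonic one,
    via the RAR fitting function of McGaugh, Lelli & Schombert (2016)."""
    g_bar = np.asarray(g_bar, dtype=float)
    return g_bar / (1.0 - np.exp(-np.sqrt(g_bar / G_DAGGER)))

# Limiting behavior: Newtonian at high accelerations...
print(g_obs(1e-8) / 1e-8)                        # ~ 1
# ...and g_obs ~ sqrt(g_bar * g_dagger) at low accelerations.
print(g_obs(1e-13) / np.sqrt(1e-13 * G_DAGGER))  # ~ 1
```

Given a baryonic mass model, one converts its Newtonian acceleration to g_obs this way and reads off V(R) = √(g_obs·R); inverting the relation is how a stellar mass distribution can be inferred from observed velocities.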

The RAR Milky Way model was fit to the terminal velocity data (in grey) over the radial range 3 < R < 8 kpc. Everything outside of that range is a prediction. It is not a prediction limited to that skinny blue line, as I have to extrapolate the mass distribution of the Milky Way to arbitrarily large radii. If there is a gradient in the mass-to-light ratio, or even if I guess a little wrong in the extrapolation, it’ll go off at some point. It shouldn’t be far off, as V(R) is mostly fixed by the enclosed mass. Mostly. If there is something else out there, it’ll be higher (like the cyan line including an estimate of the coronal gas in the plot that goes out to 130 kpc). If there is a bit less than the extrapolation, it’ll be lower.

The RAR model Milky Way (blue line) together with the terminal velocities to which it was fit (light grey points), VVV data in the inner 2.2 kpc (dark grey squares), and the Zhou et al. (2023) realization of the Gaia DR3 data. Also shown are the number of stars per bin from Gaia (right axis).

From 8 to 19 kpc, the Gaia data as realized by Zhou et al. fall bang on the model. They evince exactly the slowly declining rotation curve that was predicted. That’s pretty good for an extrapolation from R < 8 kpc. I’m not aware of any other model that did this well in advance of the observation. Indeed, I can’t think of a way to even make a prediction with a dark matter model. I’ve tried this – a lot – and it is as easy to come up with a model whose rotation curve is rising as one that is falling. There’s nothing in the dark matter paradigm that is predictive at this level of detail.

Beyond R > 19 kpc, the match of the model and Zhou et al. realization of the data is not perfect. It is still pretty damn good by astronomical standards, and better than the Keplerian dotted line. Cosmologists would be wetting themselves with excitement if they could come this close to predicting anything. Heck, they’re known to do that even when they’re obviously wrong*.

If the difference between the outermost data and the blue line is correct, then all it means is that we have to tweak the model to have a bit less mass than assumed in the extrapolation. I call it a tweak because it would be exactly that: a small change to an assumption I was obliged to make in order to do the calculation. I could have assumed something else, and almost did: there is discussion in the literature that the disk of the Milky Way is truncated at 20 kpc. I considered using a mass model with such a feature, but one can’t make it a sharp edge, as that introduces artifacts when solving the Poisson equation numerically: the procedure depends on derivatives that blow up when they encounter sharp features. Presumably the physical truncation isn’t unphysically sharp anyway, rather being a transition to a steeper exponential decline as we sometimes see in other galaxies. However, despite indications of such an effect, there wasn’t enough data to constrain it in a way useful for my model. So rather than introduce a bunch of extra, unconstrained freedom into the model, I made a straight extrapolation from what I had all the way to infinity in the full knowledge that this had to be wrong at some level. Perhaps we’ve found that level.
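The derivative trouble with a sharp edge is easy to demonstrate numerically: the gradient of a step function grows without bound as the grid is refined, so any solver that differentiates the density can never converge there. A toy illustration (the scale length and truncation radius are made-up round numbers, not the model’s actual parameters):

```python
import numpy as np

Rd, Redge = 2.5, 20.0  # disk scale length and truncation radius in kpc (illustrative)

def edge_derivative(n):
    """Peak |d(sigma)/dR| near the edge of a sharply truncated
    exponential disk, estimated on a grid of n points."""
    R = np.linspace(15.0, 25.0, n)
    sigma = np.exp(-R / Rd) * (R < Redge)  # surface density with a hard cutoff
    return np.abs(np.gradient(sigma, R)).max()

# Refining the grid makes the spike at the truncation radius grow roughly
# in proportion to the resolution: the derivative of a step never converges.
for n in (100, 1000, 10000):
    print(n, edge_derivative(n))
```

A smooth break to a steeper outer exponential has a finite gradient everywhere, which is why that is the numerically (and physically) sensible way to model a truncation.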

That said, I’m happy with the agreement of the data with the model as is. The data become very sparse where there is even a hint of disagreement. Where there are thousands of stars per bin in the well-fit portion of the rotation curve, there are only tens per bin outside 20 kpc. When the numbers get that small, one has to start to worry that there are not enough independent samples of phase space. A sizeable fraction of those tens of stars could be part of the same stellar stream, which would bias the results to that particular unrepresentative orbit. I don’t know if that’s the case, which is the point: it is just one of the many potential systematic uncertainties that are not represented in the formal error bars. Missing those last five points by two sigma is as likely to be an indication that the error bars have been underestimated as it is to be an indication that the model is inadequate. Trying to account for this sort of thing is why the error bars of Jiao et al. are so much bigger than the formal uncertainties in the three realization papers.

That’s the outer regions. The place where the RAR model disagrees the most with the Gaia data is from 5 < R < 8 kpc, which is in the range where it was fit! So what’s going on there?

Again, the data disagree with the data. The stellar data from Gaia disagree with the terminal velocity data from interstellar gas at high significance. The RAR model was fit to the latter, so it must perforce disagree with the former. It is tempting to dismiss one or the other as wrong, but do they really disagree?

Adapted from Fig. 4 of McGaugh (2019). Grey points are the first and fourth quadrant terminal velocity data to which the model (blue line) was matched. The red squares are the stellar rotation curve estimated with Gaia DR2 (DR3 is indistinguishable). The black squares are the stellar rotation curve after adjustment to be consistent with a mass profile that includes spiral arms. This adjustment for self-consistency remedies the apparent discrepancy between gas and stellar data.

In order to build the model depicted above, I chose to split the difference between the first and fourth quadrant terminal velocity data. I fit them separately in McGaugh (2016), where I made the additional point that the apparent difference between the two quadrants is what we expect from an m=2 mode – i.e., a galaxy with spiral arms. That means these velocities are not exactly circular as commonly assumed, and as I must perforce assume to build the model. So I split the difference above in the full knowledge that this is not the exact circular velocity curve of the Galaxy; it’s just the best I can do at present. This is another example of the systematic uncertainties we encounter: the difference between the first and fourth quadrant is real and is telling us that the Galaxy is not azimuthally symmetric – as anyone can tell by looking at any spiral galaxy, but a detail we’d like to ignore so we can talk about disk+dark matter halo models in the convenient limit of axisymmetry.

Though not perfect – no model is – the RAR model Milky Way is a lot better than models that ignore spiral structure entirely, which is basically all of them. The standard procedure assumes an exponential disk and some form of dark matter halo. Allowance is usually made for a central bulge component, but it is relatively rare to bother to include the interstellar gas, much less consider deviations from a pure exponential disk. Having adopted the approximation of an exponential disk, one inevitably gets a smooth rotation curve like the dashed line below:

Fig. 1 from McGaugh (2019). Red points are the binned fourth quadrant molecular hydrogen terminal velocities to which the model (blue line) has been fit. The dotted line shows the corresponding Newtonian rotation curve of the baryons. The dashed line is the model of Bovy & Rix (2013) built assuming an exponential disk. The inset shows residuals of the models from the data. The exponential model does not and cannot fit these data.

The common assumption of an exponential disk precludes the possibility of fitting the bumps and wiggles observed in the terminal velocities. These occur because of deviations from a pure exponential profile caused by features like spiral arms. By making this assumption, the variations in mass due to spiral arms are artificially smoothed over. They are absent by assumption, and there is no way to recover them in a dark matter fit that doesn’t know about the RAR.

Depending on what one is trying to accomplish, an exponential model may suffice. The Bovy & Rix model shown above is perfectly reasonable for what they were trying to do, which involved the vertical motions of stars, not the bumps and wiggles in the rotation curve. I would say that the result they obtain is in reasonable agreement with the rotation curve, given what they were doing and in full knowledge that we can’t expect to hit every error bar of every datum of every sort. But for the benefit of the chi-square enthusiasts who are concerned about missing a few data points at large radii, the reduced chi-squared of the Bovy & Rix model is 14.35 while that of the RAR model is 0.6. A good fit is around 1, so the RAR model is a good fit while the smooth exponential is terrible – as one can see by eye in the residual inset: the smooth exponential model gets the overall amplitude about right, but hits none of the data. That’s the starting point for every dark matter model that assumes an exponential disk; even if they do a marginally better job of fitting the alleged Keplerian downturn, they’re still a lot worse if we consider the terminal velocity data, the details of which are usually ignored.
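For readers who want to check such numbers themselves, reduced chi-squared is simply chi-squared divided by the number of degrees of freedom (data points minus fitted parameters). A quick sketch with synthetic data (invented for illustration, not the actual terminal velocities):

```python
import numpy as np

def reduced_chi_squared(data, model, sigma, n_params):
    """Chi-squared per degree of freedom: ~1 indicates a good fit,
    >>1 a poor one, <<1 possibly overestimated error bars."""
    chi2 = np.sum(((data - model) / sigma) ** 2)
    dof = data.size - n_params
    return chi2 / dof

# Synthetic "rotation curve": one model tracks the data within the
# errors, while an offset version misses by several sigma everywhere.
rng = np.random.default_rng(42)
truth = 220.0 + 10.0 * np.sin(np.linspace(0.0, 6.0, 50))
sigma = np.full(50, 3.0)
data = truth + rng.normal(0.0, 3.0, 50)

print(reduced_chi_squared(data, truth, sigma, n_params=2))         # ~ 1
print(reduced_chi_squared(data, truth - 10.0, sigma, n_params=2))  # >> 1
```

The same arithmetic separates a model that hits the bumps and wiggles (reduced chi-squared near 1) from one that only gets the overall amplitude right.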

If instead we pay attention to the details of the terminal velocity data, we discover that the broad features seen therein are pretty much what we expect for the kinematic signatures of photometrically known spiral arms. That is, the mass density variations inferred by fitting the RAR correspond to spiral arms that are independently known from star counts. We’ve discussed this before.

Spiral structure in the Milky Way (left) as traced by HII regions and Giant Molecular Clouds (GMCs). These correspond to bumps in the surface density profile inferred from kinematics with the RAR (right).

If we accept that the bumps and wiggles in the terminal velocities are tracers of bumps and wiggles in the stellar mass profiles, as seen in external galaxies, then we can return to examining the apparent discrepancy between them and the stellar rotation curve from Gaia. The latter follow from an application of the Jeans equation, which helps us sort out the circular motion from the mildly eccentric orbits of many stars. It includes a term that depends on the gradient of the density profile of the stars that trace the gravitational potential. If we assume an exponential disk, then that term is easily calculated. It is slowly and smoothly varying, and has little impact on the outcome. One can explore variations of the assumed scale length of the disk, and these likewise have little impact, leading us to infer that we don’t need to worry about it. The trouble with this inference is that it is predicated on the assumption of a smooth exponential disk. We are implicitly assuming that there are no bumps and wiggles.
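One common thin-disk form of that correction, from the radial Jeans equation, is V_c² = V̄_φ² + σ_R²[σ_φ²/σ_R² − 1 − ∂ln(ν σ_R²)/∂ln R]. The toy sketch below (a textbook form with invented numbers, not the actual analysis pipeline) shows how replacing a smooth exponential tracer density ν with one containing a local dip shifts the inferred circular velocity through the gradient term alone:

```python
import numpy as np

def circular_velocity(R, v_phi, sigma_R, sigma_phi, nu):
    """Asymmetric-drift correction from the radial Jeans equation
    (axisymmetric thin disk, tilt term neglected). A textbook sketch,
    not the analysis actually used for the Milky Way model."""
    # Logarithmic gradient of nu*sigma_R^2: the term that a smooth
    # exponential assumption sweeps under the rug.
    dln_nu_sigR2 = np.gradient(np.log(nu * sigma_R**2), np.log(R))
    vc2 = v_phi**2 + sigma_R**2 * (sigma_phi**2 / sigma_R**2 - 1.0 - dln_nu_sigR2)
    return np.sqrt(vc2)

# Invented but plausible numbers: flat mean rotation and dispersions,
# and two tracer densities -- a smooth exponential and one with a local
# dip, as might occur between spiral arms.
R = np.linspace(5.0, 9.0, 200)     # kpc
v_phi = np.full_like(R, 215.0)     # km/s
sigma_R = np.full_like(R, 35.0)    # km/s
sigma_phi = np.full_like(R, 25.0)  # km/s
Rd = 2.5                           # kpc
nu_smooth = np.exp(-R / Rd)
nu_bumpy = nu_smooth * (1.0 - 0.3 * np.exp(-(((R - 7.0) / 0.5) ** 2)))

vc_smooth = circular_velocity(R, v_phi, sigma_R, sigma_phi, nu_smooth)
vc_bumpy = circular_velocity(R, v_phi, sigma_R, sigma_phi, nu_bumpy)
print(np.max(np.abs(vc_bumpy - vc_smooth)))  # shifts of order 10 km/s
```

Even with identical kinematics, the two density profiles imply circular velocities differing by several km/s near the dip, which is the scale of the gas-vs-stars discrepancy at issue.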

The bumps and wiggles are explicitly part of the RAR model. Consequently, the gradient term in the Jeans equation has a modest but important impact on the result. Applying it to the Gaia data, I get the black points:

The red squares are the Gaia DR2 data. The black squares are the same data after including in the Jeans equation the effect of variations in the tracer gradient. This term dominates the uncertainties.

The velocities of the Gaia data in the range illustrated all go up. This systematic effect reconciles the apparent discrepancy between the stellar and gas rotation curves. The red points are highly discrepant from the gray points, but the black points are not. All it took was to drop the assumption of a smooth exponential profile and calculate the density gradient numerically from the data. This difference has a more pronounced impact on rotation curve fits than any of the differences between the various realizations of the Gaia DR3 data – hence my cavalier attitude towards their error bars. Those are not the important uncertainties.

Indeed, I caution that we still don’t know what the effective circular velocity of the potential is. I’ve made my best guess by splitting the difference between the first and fourth quadrant terminal velocity data, but I’ve surely not got it perfectly right. One might view the difference between the quadrants as the level at which the perfect quantity is practically unknowable. I don’t think it is quite that bad, but I hope I have at least given the reader some flavor for some of the hidden systematic uncertainties that we struggle with in astronomy.

It gets worse! At small radii, there is good reason to be wary of the extent to which terminal velocities represent circular motion. Our Galaxy hosts a strong bar, as artistically depicted here:

Artist’s rendition of the Milky Way. Image credit: NASA/JPL-Caltech.

Bars are a rich topic in their own right. They are supported by non-circular orbits that maintain their pattern. Consequently, one does not expect gas in the region where the bar is to be on circular orbits. It is not entirely clear how long the bar in our Galaxy is, but it is at least 3 kpc – which is why I have not attempted to fit data interior to that. I do, however, have to account for the mass in that region. So I built a model based on the observed light distribution. It’s a nifty bit of math to work out the equivalent circular velocity corresponding to a triaxial bar structure, so having done it once I’ve not been keen to do it again. This fixes the shape of the rotation curve in the inner region, though the amplitude may shift up and down with the mass-to-light ratio of the stars, which dominate the gravitational potential at small radii. This deserves its own close up:

Colored points are terminal velocities from Marasco et al. (2017), from both molecular (red) and atomic (green) gas. Light gray circles are from Sofue (2020). These are plotted assuming they represent circular motions, which they do not. Dark grey squares are the equivalent circular velocity inferred from stars in the VVV survey. The black line is the Newtonian mass model for the central bar and disk, and the blue line is the corresponding RAR model as seen above.

Here is another place where the terminal velocities disagree with the stellar data. This time, it is because the terminal velocities do not trace circular motion. If we assume they do, then we get what is depicted above, and for many years, that was thought to be the Galactic rotation curve, complete with a pronounced classical bulge. Many decades later, we know the center of the Galaxy is not dominated by a bulge but rather a bar, with concomitant non-circular motions – motions that have been observed in the stars and carefully used to reconstruct the equivalent circular velocity curve by Portail et al. (2017). This is exactly what we need to compare to the RAR model.

Note that 2008, when the bar model was constructed, predates 2017 (or the 2016 appearance of the preprint). While it would have been fair to tweak the model as the data improved, this did not prove necessary. The RAR model effectively predicted the inner rotation curve a priori. That’s a considerably more impressive feat than getting the outer slope right, but the model manages both sans effort.

No dark matter model can make an equivalent boast. Indeed, it is not obvious how to do this at all; usually people just make a crude assumption with some convenient approximation like the Hernquist potential and call it a day without bothering to fit the inner data. The obvious prediction for a dark matter model overshoots the inner rotation curve, as there is no room for the cusp predicted in cold dark matter halos – stars dominate the central potential. One can of course invoke feedback to fix this, but it is a post hoc kludge rather than a prediction, and one that isn’t supposed to apply in galaxies as massive as the Milky Way. Unless it needs to, of course.

So, let’s see – the RAR model Milky Way reconciles the tension between stellar and interstellar velocity data, indicates density bumps that are in the right location to correspond to actual spiral arms, matches the effective circular velocity curve determined for stars in the Galactic bar, correctly predicted the slope of the rotation curve outside the solar circle out to at least 19 kpc, and is consistent with the bulk of the data at much larger radii. That’s a pretty successful model. Some realizations of the Gaia DR3 data are a bit lower than predicted, but others are not. Hopefully our knowledge of the outer rotation curve will continue to improve. Maybe the day will come when the data have improved to the point where the model needs to be tweaked a little bit, but it is not this day.


*To give one example, the BICEP II experiment infamously claimed in March of 2014 to have detected the Inflationary signal of primordial gravitational waves in their polarization data. They held a huge press conference to announce the result in clear anticipation of earning a Nobel prize. They did this before releasing the science paper, much less hearing back from a referee. When they did release the science paper, it was immediately obvious on inspection that they had incorrectly estimated the dust foreground. Their signal was just that – excess foreground emission. I could see that in a quick glance at the relevant figure as soon as the paper was made available. Literally – I picked it up, scanned through it, saw the relevant figure, and could immediately spot where they had gone wrong. And yet this huge group of scientists all signed their name to the submitted paper and hyped it as the cosmic “discovery of the century”. Pfft.