A post in which some value judgements are made about the situation with wide binaries

I have tried very hard to remain objective and even-handed, but I find that I weary of the wide binary debate. I don’t know what the right answer will turn out to be. But I do have opinions.

For starters, it is a big Galaxy. There is just too much to know. When I wrote about the Milky Way earlier this year, the idea was to set up an expectation value for wide binaries in the solar neighborhood. That devolved into at least eight other posts on the Milky Way itself, because our Galaxy is too damn interesting, and has its own controversies. So it occurs to me that I never really got on with the regularly scheduled program.

In my assessment, the radial acceleration at the solar circle is 2.2 × 10^-10 m/s/s, which in terms of the MOND acceleration scale is 1.8 a0. We live on the Newtonian side of the transition to the MOND regime. The ideal place to test MOND with wide binaries would be the deep MOND regime, well below a0. That is in a part of the Galaxy that is far, far away, and not currently accessible to us. What is accessible are wide binaries in the solar neighborhood (within 250 pc, about 1% of the Galaxy’s radius) as mapped by Gaia. Locally, the MOND effect is modest, but nonzero. We’re close enough to the transition for there to be a small, detectable effect.

Local binaries that are widely separated enough for their internal acceleration to drop below a0 find themselves in the regime dominated by the field of the rest of the Galaxy and subject to the so-called External Field Effect (EFE). This situation is illustrated in the lower right panel below.

Mass estimators in different regimes of acceleration. The top row illustrates pure Newtonian (left) and MOND (right) regimes. The bottom row illustrates the case of small systems embedded in larger systems. A low acceleration system embedded in a Newtonian external field is Newtonian (left) while a very low acceleration system embedded in a merely low acceleration system is quasi-Newtonian (right). Wide binaries fall in the last category.

Intriguingly, orbits in the EFE regime remain Keplerian. The rotation curves of nearby binaries, if you could map them, are not expected to be flat in MOND. They should, however, experience enhanced speeds, with a boost to the effective value of Newton’s constant: G → γG. The value of γ depends on the sum of internal and external acceleration as well as the shape of the interpolation function when near a0. That’s one reason to prefer to do this experiment in the deep MOND regime, where the shape of the interpolation function doesn’t matter. But that’s not where we live. For Galactic data and viable possibilities* for the interpolation function, a reasonable expectation value is γ = 1.4 ± 0.1. This is what the wide binary papers attempt to measure. So, what do they find?
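To make that expectation concrete, here is a rough sketch, not the full EFE calculation done in any of the papers (which depends on the field geometry and the specific MOND formulation): with the commonly used “simple” interpolation function, and ignoring directional effects, the boost is approximately γ ≈ ν(y) = 1/2 + √(1/4 + 1/y), where y is the total Newtonian acceleration in units of a0.

```python
import math

def nu_simple(y):
    """'Simple' MOND interpolation function nu(y) = 1/2 + sqrt(1/4 + 1/y),
    where y = g_N / a0 is the Newtonian acceleration in units of a0."""
    return 0.5 + math.sqrt(0.25 + 1.0 / y)

# Effective boost gamma ~ nu(y), evaluated from the high-acceleration
# regime down to the local Galactic field:
for y in (100.0, 7.0, 1.8):
    print(f"y = {y:6.1f} a0  ->  gamma ~ {nu_simple(y):.2f}")
    # prints ~1.01, ~1.13, ~1.40 respectively
```

Deep in the Newtonian regime the boost vanishes (γ → 1), while at the local Galactic field of 1.8 a0 this one-line caricature lands on the quoted expectation value of γ ≈ 1.4.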

Hernandez et al. find γ = 1.0±0.1 for 466 close binaries with 2D separations less than 0.01 pc (about 2000 AU) and γ = 1.5±0.2 for 108 wide binaries with 2D separations greater than 0.01 pc. A purely Newtonian result (γ = 1) is recovered in the high acceleration regime of relatively close binaries where this is expected to be the case. For wider binaries, one finds a boost value consistent with the prediction of MOND and differing from Newton with modest significance (2.6σ).

Chae reports+ γ = 1.49(+0.21/-0.19) for 2,463 “pure” binaries in the low acceleration regime, consistent with his earlier result γ = 1.43±0.06 for 26,615 wide binaries. The larger numbers make the formal error smaller, hence a formally more significant departure from Newton. Many of these binaries are impure in the sense of being triples with one member being itself a close binary as discussed previously, an effect that has to be modeled in large samples. The point of the smaller samples is to select true binaries so that this modeling is unnecessary. For his smaller pure binary sample, Chae finds a smooth transition from γ ≈ 1 at high acceleration (10^-8 m/s/s ≈ 100 a0) through γ ≈ 1.11 around 7 a0 to γ ≈ 1.49 at local Galactic saturation (1.8 a0).

Banik et al. use a slightly different language. Translating, they find γ = 1 at high confidence (16σ)$ from 8,611 wide binaries with separations from 2,000 to 30,000 AU. Newtonian behavior persists at all scales and accelerations; they find no significant deviations from γ = 1 anywhere. Note that despite going out very far, to 30,000 AU, they do not reach especially low accelerations because the EFE of the Galaxy is effectively constant in the solar neighborhood. There is no getting away from the Galaxy’s 1.8 a0. They also do not reach particularly high, purely Newtonian accelerations: 2,000 AU is in the transition regime where MOND effects are perceptible.
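To see why neither end of that separation range escapes the transition, one can estimate the internal Newtonian acceleration GM/r² of a binary (the solar total mass below is an assumption for illustration, not a value from any of the papers):

```python
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass, kg
AU = 1.496e11      # astronomical unit, m
A0 = 1.2e-10       # MOND acceleration scale, m/s^2

def internal_accel(sep_au, mass_msun=1.0):
    """Newtonian internal acceleration GM/r^2 of a binary, in units of a0."""
    r = sep_au * AU
    return G * mass_msun * M_SUN / r**2 / A0

for sep in (2_000, 30_000):
    print(f"{sep:6d} AU: a_int ~ {internal_accel(sep):.3g} a0")
```

Even at 2,000 AU the internal acceleration is only ~12 a0, which is not deeply Newtonian; by 30,000 AU it has fallen far below a0, but the external Galactic field keeps the total acceleration pinned near 1.8 a0.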

Here is Fig. 11 from Banik et al., the key figure Dr. Banik was advocating in his comments to the previous post:

Fig. 11 of Banik et al. shows the median dimensionless characteristic velocity as a function of dimensionless binary separation for several bins of the data (solid lines). The predicted MOND effect increases with separation until saturating in the Galactic field (dashed lines). This is not seen in most of the data, with only a hint in the highest velocity bin that represents only a few percent of the data.

A flat line in this plot indicates no boost in velocity with diminishing acceleration, so one can clearly see the source of the claim that Newton works better than MOND. There is little indication that the velocity increases at wider separations. Effectively, γ ≈ 1 pretty much everywhere.

Both Chae and Hernandez have pointed out that the lack of a constraint on the high acceleration Newtonian regime is problematic. Orbits are Keplerian in the quasi-Newtonian regime, so the behavior looks Newtonian. Lacking an anchor in the high acceleration regime, it is conceivable that the analysis of Banik is detecting the predicted MOND quasi-Newtonian behavior and defining it to be purely Newtonian. It’s just a modest offset in qualitatively similar behavior. In this context, it is worth noting that Chae and Hernandez independently measure γ ≈ 1 at high acceleration as well as γ ≈ 1.5 at low acceleration: they [claim to] detect the difference between these regimes in a way Banik does not probe.

Now let’s look at the plot that gave me the heebie-jeebies with data on it, Banik et al.’s Fig. 12:

The left two panels of Fig. 12 from Banik et al. showing the probability of observing a particular dimensionless velocity in two bins of radial separation: 2,000 to 3,000 AU (top) and 5,000 to 12,000 AU (bottom). The histograms are the Gaia data. The black lines show the Newtonian prediction while the blue lines show that of MOND. These predictions depend on many things besides the underlying theory, sampling over many astrophysical complications like the distribution of stellar masses, orbital orientations to the line of sight, orbital phase, orbital eccentricity, the close binary fraction, and probably other things that I don’t instantly recall.

One can see the basis of the concern. At high acceleration, the prediction of Newton and MOND are identical. The top bin is the closest we get to that, yet there is a clear difference in the predictions. This bin is in the transition region; there is no bin at sufficiently high acceleration for the predictions to align and provide the self-calibration that both Chae and Hernandez independently exploit.

Looking at the data in the top panel, it clearly agrees better with the Newtonian prediction. I can believe that; what concerns me is the lack of grounding at still higher acceleration where the black and blue lines should coincide. I do not have a sufficiently clear understanding of all the machinations (and their inevitable foibles) that go into the predicted lines to trust that this constitutes a definitive test.

Looking at the data in the bottom panel, it clearly agrees better with the prediction of MOND. The histogram of the data follows the blue line of MOND more closely than the Newtonian black line. This is so obvious that I wondered if the colors were wrong – maybe there had been some inadvertent switcheroo in the line color in the plotting code. Apparently not, as this point is addressed in the text of Banik et al.: [in the bottom panel,] “MOND performs somewhat better in a handful of pixels around the peak region…” Yes. Yes it does. That’s… a really weird way of putting it. They go on to say “…though given the uncertainties, the Newtonian model is not that far off.” One could just as well say that about MOND in the top panel. Just looking at this figure, one might conclude that they have detected MOND at large separations.

One of the things that gives me the heebie-jeebies about this figure is that there isn’t much difference in the location of the peaks of the distributions. Some, yes, but not much: Newton and MOND apparently predict very nearly the same typical velocity. Yet that is what Fig. 11 traces: the typical (median) normalized velocity. That only tells a tiny bit of the story that is in Fig. 12, and does not appear to be a particularly sensitive indicator of the effect we’re testing for.
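This is a generic statistical point rather than a re-analysis of anyone’s data: a median can be completely blind to differences that live in the tails of a distribution. A toy sketch with invented numbers:

```python
import random, statistics

random.seed(1)
N = 100_000

# Toy baseline: projected velocity ~ |Gaussian| (arbitrary units)
base = [abs(random.gauss(0.0, 1.0)) for _ in range(N)]

# Toy alternative: identical below v = 1, but the tail is boosted by 20%.
# (Made-up numbers; the point is a tail-only difference.)
alt = [v * 1.2 if v > 1.0 else v for v in base]

print("medians: ", round(statistics.median(base), 3),
      round(statistics.median(alt), 3))   # identical
print("99th pct:", round(sorted(base)[99_000], 3),
      round(sorted(alt)[99_000], 3))      # differ by the full 20%
```

A tail-only difference leaves the median exactly unchanged while the upper percentiles shift by the full 20%, so a statistic built on the median alone can miss it entirely.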

Returning to the matter of statistics, the attentive reader might have noted that I have not said much about the number of binaries included in each analysis. These range from a few hundred to many thousands to tens of thousands. More is better, right?

In this case, I think not. There is always a tension between data quality and quantity. Quantity helps with the statistics, but only so long as there is a signal to be dug out. At some point, it becomes a matter of garbage in, garbage out. In this respect, I am inclined to agree with Ernest Rutherford:

If your experiment requires statistics, you ought to have done a better experiment.

Ernest Rutherford$

I suspect that we’re squeezing the stone of statistics too hard here. When we do this, we get the appearance of a signal when really we’re just grinding metal. I am reminded that any time we do a big experiment like this (dark matter searches are a great example), the first thing we learn about are all the false signals we didn’t anticipate. That happens no matter how well we construct the experiment, and I give Banik et al. credit for planning this out ahead of time. That doesn’t guarantee that everything comes out right on the first attempt. There is just so much junk that the universe can and does throw at us that it is easy to imagine that the samples with large numbers of binaries bring with them too much junk (e.g., false binaries). If the fraction of junk is high, then it will look like junk at all scales – there will be no trend with increasing separation even if there is a signal buried in junk.
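A toy simulation illustrates the dilution (all numbers invented; “junk” here stands in for false binaries, which carry no boost and have a much broader velocity distribution):

```python
import random, statistics

random.seed(7)
N = 20_000

def sample_bin(boost, junk_frac):
    """Toy velocity sample: real binaries carry a multiplicative 'boost';
    contaminants ('junk', e.g. false binaries) do not, and are drawn
    from a much broader distribution. All numbers invented."""
    out = []
    for _ in range(N):
        if random.random() < junk_frac:
            out.append(abs(random.gauss(0.0, 3.0)))  # junk: broad, no boost
        else:
            out.append(boost * abs(random.gauss(0.0, 1.0)))
    return out

for junk in (0.0, 0.3, 0.8):
    close = statistics.median(sample_bin(1.00, junk))  # high-acceleration bin
    wide  = statistics.median(sample_bin(1.18, junk))  # 18% boost ~ sqrt(1.4)
    print(f"junk = {junk:.0%}: median ratio wide/close = {wide/close:.2f}")
```

With no contamination, the full 18% velocity boost (√1.4) is recovered; at 80% contamination the measured trend nearly vanishes, even though the signal is still present in the clean subset.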

Consequently, I am at present inclined to trust more the super-clean sample of Hernandez – the high quality binaries where there is a chance that we’re actually measuring what we want to measure. There are only a few hundred such binaries, so the statistical confidence is modest (2.6σ). I worry that in setting the highly restrictive standards necessary to select the best binaries, we might unintentionally omit objects that could change the answer. But at least there is some confidence that these are real binaries, standing above the gratuitously enormous amount of junk the universe throws our way.

I hope the principal scientists can come to agreement about what the data show and not just wind up having the same argument over and over forever more. That’s what usually happens. I ask all parties to remember that it is important to retain the ability to change one’s mind. All of them have demonstrated the ability to do this previously, and somebody will need to do it again.


*In principle, one might also hope to distinguish between specific theories of MOND. Examples of modified gravity like AQUAL and QUMOND give slightly different predictions. To be able to do this seems… optimistic at this point.

In modified inertia theories, the interpolation function is a chimera that depends on each orbital trajectory, so its effective realization may differ between the nearly circular motions of rotation curves and eccentric wide binaries. If so, the interpolation function defined by external galaxies may not be relevant to the problem. Ultimately, we need a theory that automatically results in the MONDian phenomenology in galaxies. Whatever it is wide binaries are doing should help inform this theory development as well as test alternatives and hopefully exclude some of them.

+The work of Chae has gone through some revisions in response to a referee, but the basic findings are unchanged. Having read an earlier version, I appreciate the clarity provided by the additions: this is a case where the refereeing process was beneficial. (I was not the referee of this or any of these papers. Editors keep me busy enough as it is, thank you very much.)

$Outside of long-established observations like the value of Gauss’s constant, there is no such thing as 16σ confidence in astronomy. The tails of real probability distributions are never as tiny as a pure Gaussian. This is a remarkably naive assertion.

Full speed in reverse!

People have been asking me about comments in a recent video by Sabine Hossenfelder. I have not watched it, but the quote I’m asked about is “the higher the uncertainty of the data, the better MOND seems to work” with the implication that this might mean that MOND is a systematic artifact of data interpretation. I believe, because they consulted me about it, that this claim originates in recent work by Sabine’s student Maria Khelashvili on fitting the SPARC data.

Let me address the point about data interpretation first. Fitting the SPARC data had exactly nothing to do with attracting my attention to MOND. Detailed MOND fits to these data are not particularly important in the overall scheme of these things as I’ll discuss in excruciating detail below. Indeed, these data didn’t even exist until relatively recently.

It may, at this juncture in time, surprise some readers to learn that I was once a strong advocate for cold dark matter. I was, like many of its current advocates, rather derisive of alternatives, the most prominent at the time being baryonic dark matter. What attracted my attention to MOND was that it made a priori predictions that were corroborated, quite unexpectedly, in my data for low surface brightness galaxies. These results were surprising in terms of dark matter then and to this day remain difficult to understand. After a lot of struggle to save dark matter, I realized that the best we could hope to do with dark matter was to contrive a model that reproduced after the fact what MOND had predicted a priori. That can never be satisfactory.

So – I changed my mind. I admitted that I had been wrong to be so completely sure that the solution to the missing mass problem had to be some new form of non-baryonic dark matter. It was not easy to accept this possibility. It required lengthy and tremendous effort to admit that Milgrom had got right something that the rest of us had got wrong. But he had – his predictions came true, so what was I supposed to say? That he was wrong?

Perhaps I am wrong to take MOND seriously? I would love to be able to honestly say it is wrong so I can stop having this argument over and over. I’ve stipulated the conditions whereby I would change my mind to again believe that dark matter is indeed the better option. These conditions have not been met. Few dark matter advocates have answered the challenge to stipulate what could change their minds.

People seem to have become obsessed with making fits to data. That’s great, but it is not fundamental. Making a priori predictions is fundamental, and has nothing to do with fitting data. By construction, the prediction comes before the data. Perhaps this is one way to distinguish between incremental and revolutionary science. Fitting data is incremental science that seeks the best version of an accepted paradigm. Successful predictions are the hallmark of revolutionary science that make one take notice and say, hey, maybe something entirely different is going on.

One of the predictions of MOND is that the RAR should exist. It was not expected in dark matter. As a quick review of the history, here is the RAR as it was known in 2004 and now (as of 2016):

The radial acceleration relation constructed from data available in 2004 and that from 2016.

The big improvement provided by SPARC was a uniform estimate of the stellar mass surface density of galaxies based on Spitzer near-infrared data. These are what are used to construct the x-axis: gbar is what Newton predicts for the observed mass distribution. SPARC was a vast improvement over the optical data we had previously, to the point that the intrinsic scatter is negligibly small: the observed scatter can be attributed to the various uncertainties and the expected scatter in stellar mass-to-light ratios. The latter never goes away, but did turn out to be at the low end of the range we expected. It could easily have looked worse, as it did in 2004, even if the underlying physical relation was perfect.
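For reference, the 2016 relation is well described by a one-parameter fitting function, g_obs = g_bar / (1 − exp(−√(g_bar/g†))), with fitted acceleration scale g† ≈ 1.2 × 10^-10 m/s². That is the published functional form; the quick evaluation below just shows its limiting behavior, not a fit to any data:

```python
import math

G_DAGGER = 1.2e-10  # m/s^2, fitted acceleration scale of the RAR

def g_obs(g_bar):
    """RAR fitting function: observed centripetal acceleration as a
    function of the baryonic (Newtonian) prediction g_bar (both in m/s^2)."""
    return g_bar / (1.0 - math.exp(-math.sqrt(g_bar / G_DAGGER)))

for g_bar in (1e-9, 1.2e-10, 1e-11):
    print(f"g_bar = {g_bar:.1e} -> g_obs = {g_obs(g_bar):.2e}"
          f"  (ratio {g_obs(g_bar)/g_bar:.2f})")
```

At high acceleration the ratio approaches unity (Newtonian behavior); below g† the observed acceleration is increasingly boosted over the baryonic prediction.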

Negligibly small intrinsic scatter is the best one can hope to find. The issue now is the fit quality to individual galaxies (not just the group plot above). We already know MOND fits rotation curve data. The claim that appears in Dr. Hossenfelder’s video boils down to dark matter providing better fits. This would be important if it told us something about nature. It does not. All it teaches us about is the hazards of fitting data for which the errors are not well behaved.

While SPARC provides a robust estimate of gbar, gobs is based on a heterogeneous set of rotation curves drawn from a literature spanning decades. The error bars on these rotation curves have not been estimated in a uniform way, so we cannot blindly fit the data with our favorite software tool and expect that to teach us something about physical reality. I find myself having to say this to physicists over and over and over and over and over again: you cannot trust astronomical error bars to behave as Gaussian random variables the way one would like and expect in a controlled laboratory setting.

Astronomy is not conducted in a controlled laboratory. It is an observational science. We cannot put the entire universe in a box and control all the variables. We can hope to improve the data and approach this ideal, but right now we’re nowhere near it. These fitting analyses assume that we are.
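The consequence is easy to simulate. If quoted error bars are too large for some objects and too small for others, the distribution of reduced χ² grows a heavy head and a heavy tail even when the underlying model is exactly right. A toy sketch, with an invented misestimation factor and no real rotation-curve data:

```python
import random

random.seed(3)
N_GAL, N_PTS = 500, 20

chi2_red = []
for _ in range(N_GAL):
    # True errors are Gaussian, but the *quoted* error bar is off by a
    # random factor between 1/3 and 3 (an invented range for illustration).
    misestimate = 3.0 ** random.uniform(-1.0, 1.0)
    chi2 = sum((random.gauss(0.0, 1.0) / misestimate) ** 2
               for _ in range(N_PTS))
    chi2_red.append(chi2 / N_PTS)

low  = sum(c < 0.5 for c in chi2_red) / N_GAL
high = sum(c > 2.0 for c in chi2_red) / N_GAL
print(f"fraction with chi2/dof < 0.5: {low:.2f}")
print(f"fraction with chi2/dof > 2.0: {high:.2f}")
# For honest error bars with 20 points, both fractions would be a few
# percent at most; here each is roughly a third of the sample.
```

The model is perfect in this toy, yet the χ² distribution alone would suggest it is simultaneously over- and under-fitting.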

Screw it. I really am sick of explaining this over and over, so I’m just going to cut & paste verbatim what I told Hossenfelder & Khelashvili by email when they asked. This is not the first time I’ve written an email like this, and I’m sure it won’t be the last.


Excruciating details: what I said to Hossenfelder & Khelashvili about the perils of rotation curve fitting on 22 September 2023 in response to their request for comments on the draft of the relevant paper:

First, the work of Desmond is a good place to look for an opinion independent of mine. 

Second, in my experience, the fit quality you find is what I’ve found before: DM halos with a constant density core consistently give the best fits in terms of chi^2, then MOND, then NFW. The success of cored DM halos happens because it is an extremely flexible fitting function: the core radius and core density can be traded off to fit any dog’s leg, and is highly degenerate with the stellar M*/L. NFW works less well because it has a less flexible shape. But both work because they have more parameters [than MOND].

Third, statistics will not save us here. I once hoped that the BIC would sort this out, but having gone down that road, I believe the BIC does not penalize models sufficiently for adding free parameters. You allude to this at the end of section 3.2. When you go from MOND (with fixed a0 it has only one parameter, M*/L, to fit to account for everything) to a dark matter halo (which has at a minimum 3 parameters: M*/L plus two to describe the halo) then you gain an enormous amount of freedom – the volume of possible parameter space grows enormously. But the BIC just says if you had 20 degrees of freedom before, now you have 22. That does not remotely represent the amount of flexibility that represents: some free parameters are more equal than others. MOND fits and DM halo fits are not the same beast; we can’t compare them this way any more than we can compare apples and snails. 

Worse, to do this right requires that the uncertainties be real random errors. They are not. SPARC provides homogeneous mass models based on near-IR observations of the stellar mass distribution. Those should be OK to the extent that near-IR light == stellar mass. That is a decent mapping, but not perfect. Consequently, we expect the occasional galaxy to misbehave. UGC 128 is a case where the MOND fit was great with optical data then became terrible with near-IR data. The absolute difference in the data are not great, but in terms of the formal chi^2 it is. So is that a failure of the model, or of the data to represent what we want it to represent?

This happens all the time in astronomy. Here, we want to know the circular velocity of a test particle in the gravitational potential predicted by the baryonic mass distribution. We never measure either of those quantities. What we measure is the (i) stellar light distribution and the (ii) Doppler velocities of gas. We assume we can map stellar light to stellar mass and Doppler velocity to orbital speed, but no mass model is perfect, nor is any patch of observed gas guaranteed to be on a purely circular orbit. These are known unknowns: uncertainties that we know are real but we cannot easily quantify. These assumptions that we have to make to do the analysis dominate over the random errors in many cases. We also assume that galaxies are in dynamical equilibrium, but 20% of spirals show gross side-to-side asymmetries, and at least 50% mild ones. So what is the circular motion in those cases? (F579-1 is a good example)

While SPARC is homogeneous in its photometry, it is extremely heterogeneous in its rotation curve measurements. We’re working on fixing that, but it’ll take a while. Consequently, as you note, some galaxies have little constraining power while others appear to have lots. That’s because many of the rotation curve velocity uncertainties are either grossly over or underestimated. To see this, plot the cumulative distribution of chi^2 for any of your models (or see the CDF published by Li et al 2018 for the RAR and Li et al 2020 for dark matter halos of many flavors. So many, I can’t recall how many CDF we published.) Anyway, for a good model, chi^2 is always close to one, so the CDF should go up sharply and reach one quickly – there shouldn’t be many cases with very low chi^2 or very high chi^2. Unfortunately, rotation curve data do not do this for any type of model. There are always way too many cases with chi^2 << 1 and also too many with chi^2 >> 1. One might conclude that all models are unacceptable – or that the error bars are Messed Up. I think the second option is the case. If so, then this sort of analysis will always have the power to mislead. 

I insert Fig. 1 from Li et al. (2020) so you don’t have to go look it up. The CDF of a statistically good model would rise sharply, being an almost vertical line at chi^2 = 1. No model of any flavor does that. That’s in large part because the uncertainties on some rotation curves are too large, while those on others are too small. The greater flexibility of dark matter models makes them incrementally better than MOND for the cases with error bars that are too small – hence the corollary statement that “the higher the uncertainty of the data, the better MOND seems to work.” This happens because dark matter models are allowed to chase bogus outliers with tiny error bars in a way that MOND cannot. That doesn’t make dark matter better, it just makes it easier to fool.

A key thing to watch out for is the outsized effects of a few points with tiny error bars. Among galaxies with high chi^2, what often happens is that there is one point with a tiny error bar that does not agree with any of the rest of the data for any smoothly continuous rotation curve. Fitting programs penalize a model for missing this point by many sigma, so will do anything they can to make it better. So what happens is that if you let a0 vary with a flat prior, it will go to some very silly values in order to buy a tiny improvement in chi^2. Formally, that’s a better fit, so you say OK, a0 has to vary. But if you plot the fitted RCs with fixed and variable a0, you will be hard pressed to see the difference. Chi^2 is different, sure, but both will have chi^2 >> 1, so a lousy fit either way, and we haven’t really gained anything meaningful from allowing for the greater fitting freedom. Really it is just that one point that is Wrong even though it has a tiny error bar – which you can see relative to the other points, never mind the model. Dark matter halos have more flexibility from the beginning, so this is less obvious for them even though the same thing happens.

So that’s another big point – what is the prior for a dark matter halo? [Your] Table 1 allows V200 and C200 to be pretty much anything. So yes, you will find a fit from that range. For Burkert halos, there is no prior, since these do not emerge from any theory – they’re just a flexible French curve. For NFW halos, there is a prior from cosmology – see McGaugh et al (2007) among a zillion other possible references, including Li et al (2020). In any [L]CDM cosmology, the parameters V200 and C200 correlate – they are not independent. So a reasonable prior would be a Gaussian in log(C200) at a given V200 as specified by some simulation (Macciò et al; see Li et al 2020). Another prior is how V200 (or M200) relates to the observed baryonic mass (or stellar mass). This one is pretty dodgy. Originally, we expected a fixed ratio between baryonic and dark mass. So when I did this kind of analysis in the ’90s, I found NFW flunked hard compared to MOND. (I didn’t know about the BIC then.) Galaxy DM halos simply do not look like NFW halos that form in LCDM and host galaxies with a few percent of their mass in the luminous disk even though this was the standard model for many years (Mo, Mao, & White 1998). If we drop the assumption that luminous galaxies are always a fixed fraction of their dark matter halos, then better fits can be obtained. I suspect your uniform prior fits have halo masses all over the place; they probably don’t correlate well with the baryonic mass, nor are their C and V200 parameters likely to correlate as they are predicted to do. You could apply the expected mass-concentration and stellar mass-halo mass relations as priors, then NFW will come off worse in your analysis because you’ve restricted them to where they ought to live.

So, as you say – it all comes down to the prior.

Even applying a stellar mass-halo mass relation from abundance matching isn’t really independent information, though that’s the best you can hope to do. But I was saying 20+ years ago that fixed mass ratios wouldn’t work, but nobody then wanted to abandon that obvious assumption. Since then, they’ve been forced to do so. But there is no good physical reason for it (feedback is the deus ex machina of all problems in the field), what happened is that the data forced us to drop the obvious assumption. Data including kinematic data (McGaugh et al 2010). So adopting a modern stellar mass-halo mass relation will give you a stronger prior than a uniform prior, but that choice has already been informed by the kinematic data that you’re trying to fit. How do we properly penalize the model for cheating about its “prior” by peaking at past data?

So, as you say – it all comes down to the prior. I think it would be important here to better constrain the priors on the DM halo fits. Li et al (2020) discuss this. Even then we’re not done, because galaxy formation modifies the form of the halo function we’re fitting. They shouldn’t end up as NFW even if they start out that way – see Li et al 2022a & b. Those papers consider the inevitable effects of adiabatic compression, but not of feedback. If feedback really has the effects on DM halos that is frequently advertised, then neither NFW nor Burkert is an appropriate fitting function – they’re not what LCDM+feedback predicts. Good luck extracting a legitimate prediction from simulations, though. So we’re stuck doing what you’re trying to do: adopt some functional form to represent the DM halo, and see what fits. What you’ve done here agrees with my experience: cored DM halos work best. But they don’t represent an LCDM prediction, or any other broader theory, so – so what? 

Another detail to be wary of – the radial range over which the RC data constrain the DM halo fit is often rather limited compared to the size of the halo. To complicate matters further, the inner regions are often star-dominated, so there is not much of a handle on DM from where the data are best, at least beyond many galaxies preferring not to have a cusp since the stars already get the job done at small R. So, one ends up with V_DM(R) constrained from 3% to 10% of the virial radius, or something like that. V200 and C200 are defined at the notional virial radius, so there are many combinations of these parameters that might adequately fit the observed range while being quite different elsewhere. Even worse, NFW halos are pretty self-similar – there are combinations of (C200,V200) that are highly degenerate, so you can’t really tell the difference between them even with excellent data – the confidence contours look like bananas in C200-V200 space, with low C/high V often being as good as high C/low V. Even even even worse is that the observed V_DM(R) is often approximately a straight line. Any function looks like a straight line if you stretch it out enough. Consequently, the fits to LSB galaxies often tend to absurdly low C and high V200: NFW never looks like a straight line, but it does if you blow it up enough. So one ends up inferring that the halo masses of tiny galaxies are nearly as big as those of huge galaxies, or more so! My favorite example was NGC 3109, a tiny dwarf on the edge of the Local Group. A straight NFW fit suggests that the halo of this one little galaxy weighs more than the entire Local Group, M31 + MW + everything else combined. This is the sort of absurd result that comes from fitting the NFW halo form to a limited radial range of data. 

I don’t know that this helps you much, but you see a few of the concerns. 
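The “one point with a tiny error bar” failure mode is easy to reproduce with a toy weighted fit: a hypothetical flat rotation curve with one discrepant point (nothing here is real data):

```python
# A flat "rotation curve": ten points at 100 km/s with 5 km/s errors,
# plus one discrepant point at 80 km/s with a (spuriously) tiny 0.5 km/s error.
v     = [100.0] * 10 + [80.0]
sigma = [5.0] * 10 + [0.5]

# Inverse-variance weighted estimate of the flat velocity:
w = [1.0 / s**2 for s in sigma]
v_fit = sum(wi * vi for wi, vi in zip(w, v)) / sum(w)
print(f"weighted velocity: {v_fit:.1f} km/s")  # -> 81.8 km/s
```

The single “precise” point outvotes the other ten: the fit lands near 82 km/s even though ten of eleven points say 100. Any fitting machinery that trusts the quoted error bars inherits this behavior.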

Wide binary debate heats up again

One of the most interesting and contentious results concerning MOND this year has been the dynamics of wide binaries. When last I wrote on this topic, way back at the end of August, Chae (2023) and Hernandez (2023) both had new papers finding evidence for MONDian behavior in wide binaries. Since that time, they each have written additional papers on the subject. These independent efforts both report strong evidence for MONDian behavior in wide binaries, so for all of October it seemed like Game Over for conventional* dark matter.

I refrained from writing a post then because I was still waiting to see if there would be a contradictory paper. Now there is. And boy, is it contradictory! Where Hernandez et al. find 2.6σ evidence for non-Newtonian behavior and Chae finds ~5σ evidence for non-Newtonian behavior, both consistent with MOND, Banik et al. find purely Newtonian behavior and claim to exclude MOND at 19σ. That’s pretty high confidence!

Well, which is it, young feller? You got proof of non-Newtonian dynamics, or you want to insist that’s impossible?

After the latest results appeared, a red-hot debate [re]ignited on e-mail, largely along the lines of what was discussed at the conference in St. Andrews. Banik et al. say that they can reproduce the MOND-like signal of Chae, but that it goes away when the data quality restriction is applied to physical velocity uncertainties (arguing that this is what you want to know) rather than to raw observational uncertainties. Chae and Hernandez counter that the method Banik et al. apply is not grounded in the Newtonian regime where everyone agrees on what should happen, so they could be calibrating the signal away. I had the impression that this was one thing everyone agreed in St. Andrews to work on, but it doesn’t appear that we’re there yet.

Banik et al. do a carefully planned Bayesian analysis. In principle, this approach allows one to separate many effects simultaneously, one of which is close binaries (CB**). Looking at the impact that close binaries have on the analysis gives me the heebie-jeebies:

One panel from Fig. 10 of Banik et al.

This figure illustrates the probability of measuring a characteristic velocity in MOND for the noted range of projected sky separation. If it is just wide binaries (WB), you get the blue line. If there are some close binaries, the expected distribution changes dramatically. This change is rather larger than the signal expected from the nominal difference in gravity. You can in principle fit for everything simultaneously, but extracting the right small signal when there is a big competing signal can be tricky. Bayesian analyses can help, but they are also a double-sided sledge-hammer: a powerful tool with which to pound the data, but also a tool that can bounce back and smack you in the face. Having done such analyses, and been smacked around a few times (and having seen others get smacked around), looking at this plot really does give me the heebie-jeebies. There are lots of ways in which this can go wrong – or even just overstate the confidence of a correct result.
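To see why contamination is scary, here is a toy sketch – all numbers are hypothetical and not taken from Banik et al. Mix a modest fraction of unrecognized close binaries into a clean wide-binary velocity distribution and watch the spread blow up relative to the small signal being sought.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# "Clean" wide-binary relative velocities: a modest spread
# (hypothetical units, normalized to the orbital velocity scale).
v_wb = rng.normal(0.0, 0.2, n)

# Suppose a fraction of systems hide an unresolved close binary whose
# orbital motion adds a much larger velocity perturbation.
f_cb = 0.3
has_cb = rng.random(n) < f_cb
v_obs = v_wb + np.where(has_cb, rng.normal(0.0, 1.0, n), 0.0)

# The contaminated spread dwarfs the clean one, so a small difference
# in gravity must be extracted from underneath this broad component.
print(v_wb.std(), v_obs.std())
```

The contaminant here is several times broader than the clean distribution, which is the sense in which the close-binary population can swamp the expected difference between Newtonian and MONDian dynamics.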

Everyone uses Bayesian methods these days.***

People expect me to comment on this hot mess; some have already asked me to do so. I really don’t want to. I’ve already said more than I should.

There are very earnest, respectable people doing this work; I don’t think anyone is being intentionally misleading. Somebody must be wrong, but it isn’t my job to sort out who. Moreover, these are long and involved analyses; it will take me time to read all the papers and make sense of them. Maybe once I do, I’ll have something more cogent to say.

I make no promises.


*By conventional dark matter, I mean new particles that only communicate with baryons via gravity.

**CB: In principle, some of the wide binaries detected by Gaia will also be close binaries, in the sense that one of the two widely separated stars is itself not a single star but an unrecognized close binary. We know this happens in nature: the nearest star system, α Centauri, is an example. The main A & B components compose a close binary, with Proxima Centauri being widely separated. Modeling how often this happens in the Gaia data gives me the willies.

***To paraphrase Churchill: Many forms of statistics have been tried, and will be tried in this science of sin and woe. No one pretends+ that Bayes is perfect or all-wise. Indeed it has been said that Bayes is the worst form of statistics except for all those other forms that have been tried from time to time.

+Lots of people pretend that Bayes is perfect and all-wise.

How things go mostly right or badly wrong


People often ask me how “perfect” MOND has to be. The short answer is that it agrees with galaxy data as “perfectly” as we can perceive – i.e., the scatter in the credible data is accounted for entirely by known errors and the expected scatter in stellar mass-to-light ratios. Sometimes it nevertheless looks to go badly wrong. That’s usually because the test requires knowing both the mass distribution and the kinematics correctly; if either is off, MOND will appear to fail even when it hasn’t. Here I’ll use the Milky Way as an example of how easily things can look bad when they aren’t.

First, an update. I had hoped to stop talking about the Milky Way after the recent series of posts. But it is in the news, and there is always more to say. A new realization of the rotation curve from the Gaia DR3 data has appeared, so let’s look at all the DR3 data together:

Gaia DR3 realizations of the Milky Way rotation curve. The most recent version of these data from Poder et al (2023) are shown as blue squares over the range 5 < R < 13 kpc. Other Gaia DR3 realizations include Ou et al. (2023, green circles), Wang et al. (2023, magenta downward pointing triangles), and Zhou et al. (2023, purple triangles).

The new Gaia realization does not go very far out, and has larger uncertainties. That doesn’t mean it is worse; it might simply be more conservative in estimating uncertainties, declining to make a claim where the data don’t substantiate it. Neither does it mean the other realizations are wrong: differences like these are what happen in independent analyses. Indeed, all the independent realizations of the Gaia data are pretty consistent, despite the different stellar selection criteria and analysis techniques. This is especially true for R < 17 kpc, where there are lots of stars informing the measurements. Even beyond that, I would say they are consistent at the level we’d expect for astronomy.

Zooming out to compare with other results:

The Milky Way rotation curve. The model line from McGaugh (2018) is shown with data from various sources. The abscissa switches from linear to logarithmic at 10 kpc to wedge it all in. The location of the Large Magellanic Cloud at 50 kpc is noted. Gaia DR3 data (Poder et al., Ou et al., Wang et al., and Zhou et al.) are shown as in the plot above. The small black squares are the Gaia DR2 realization of Eilers et al. (2019) reanalyzed to include the effect of bumps and wiggles by McGaugh (2019). Non-Gaia data include blue horizontal branch stars (light blue squares) and red giants (red squares) in the stellar halo (Bird et al. 2022), globular clusters (Watkins et al. 2019, pink triangles), VVV stars (Portail et al. 2017, dark grey squares at R < 2.2 kpc), and terminal velocities (McClure-Griffiths & Dickey 2007, 2016, light grey points from 3 < R < 8 kpc). These terminal velocities are the only data that inform the model line; everything else follows.

Overall, I would say the data paint a pretty consistent picture. The biggest tension amongst the data illustrated here is between the outermost Gaia points around R = 25 kpc and the corresponding results from halo stars. One is consistent with the model line and the other is not. We shouldn’t allow the model to inform our interpretation; the important point is that the independent data disagree with each other. This happens all the time in astronomy. Sometimes it boils down to different assumptions; sometimes it is a real discrepancy. Either way, one has to learn* to cope.

The sharp-eyed will also notice an apparent tension between the DR2 data (black squares) and DR3 around 6 and 7 kpc. This is not real – it is an artifact of different treatments of the term in the Jeans equation for the logarithmic derivative of the density profile of the tracer particles. That’s a choice made in the analysis. The data are entirely consistent when treated consistently.

Putting on an empiricist’s hat, I will say that the kink in the slope of the Gaia data around R = 18 kpc looks unnatural. That doesn’t happen in other galaxies. Rather than belabor the point further, I’ll simply say that this is how things mostly go right but also a little wrong. This is as good as we can hope for in [extra]galactic astronomy.

In contrast, it is easy to go very wrong. To give an example, here is a model of the Milky Way that was built to approximately match the rotation curve of Sofue (2020).


Fig. 1 from Dai et al. (2022). Note the logarithmic abscissa. Their caption: The rotation curve of the Milky Way. The data (solid dark circles with error bars) for r < 100 kpc come from [22], while for r > 100 kpc from [23]. The solid, dashed and dotted lines describe the contribution from the bulge, stellar disk and dark matter halo respectively, within a ΛCDM model of the galaxy. The dashed-dot line is the total contribution of all three components. The parameters of each component are taken from [24]. For comparison, the Milky Way rotation curve from Gaia DR2 is shown in color. The red dots are data from [34], the blue upward-pointing triangles are from [35], while the cyan downward-pointing triangles are from [36].

This realization of the rotation curve is very different from that seen above. Note that the rotation curve (black points) is very different from that of Gaia (red points) over the same radial range. These independent data are inconsistent; at least one of them is wrong. The data extend to very large radii, encompassing not only the LMC but also Andromeda (780 kpc away). I am already concerned about the effects of the LMC at 50 kpc; Andromeda is twice the baryonic mass of the Milky Way so anything beyond 260 kpc is more Andromeda’s territory than ours – depending on which side we’re talking about. The uncertainties are so big out there they provide no constraining power anyway.

In terms of MOND-required perfection, things fall apart for the Dai model already at very small radii. Dai et al. (2022) chose to fit their bulge component to the high amplitude terminal velocities of Sofue. That’s a reasonable thing to do, if we think the terminal velocities represent circular motion. Because of the non-circular motions that sustain the Galactic bar, they almost certainly do not – that’s why I restricted use of terminal velocities to larger radii. We also know something about the light distribution:

The inner 3 kpc of the Milky Way. The circles are the terminal velocities of Sofue (2020); the squares are the equivalent circular velocity of the potential reconstructed from the kinematics of stars in the VVV survey (Portail et al. 2017). The line is the bulge-bar model of McGaugh (2008) based on the light distribution reported by Binney et al (1997).

This is essentially the same graph as I showed before, but showing only the Newtonian bulge-bar component, and on a logarithmic abscissa for comparison with the plot of Dai et al. The two bulge models are very different. That of Dai et al. is more massive and more compact, as required to match the terminal velocities. There may be galaxies out there that look like this, but the Milky Way is not one of them.

Indeed, Newton’s prediction for the rotation curve of the bulge-bar component – the line labeled bulge/bar based on what the Milky Way looks like – is in good agreement with the effective circular speed curve obtained from stellar data. It is not consistent with the terminal velocities. We could increase the amplitude of the Newtonian prediction by increasing the mass-to-light ratio of the stars (I have adopted the value I expect for stellar populations), but the shape would still be wrong. This does not come as a surprise to most Galactic astronomers, because we know there is a bar in the center of the Milky Way and we know that bars induce non-circular motions, so we do not expect the terminal velocities to be a fair tracer of the rotation curve in this region. That’s why Portail et al. had to go to great lengths in their analysis to reconstruct the equivalent circular velocity, as did I just to build the bulge-bar model.

The thing about predicting rotation curves from the observed mass, as MOND does, is that you have to get both the kinematic data and the mass distribution right. The velocity predicted at any radius depends on the mass enclosed by that radius. So if we get the bulge badly wrong, everything spirals down the drain from there.

Dai et al. (2022) compare their model to the acceleration residuals predicted by MOND for their mass model. If all is well, the data should scatter around the constant line at zero in this graph:

Fig. 4 from Dai et al. (2022). Their caption: [The radial acceleration relation] recast as a comparison between the total acceleration, a, and the MOND prediction, aM , as a function of the acceleration due to baryons aB. The solid horizontal line is a = aM. The circles and squares with error bars represent the Milky Way and M31 data, while the gray dots are from the EAGLE simulation of ΛCDM in [1]. For aB > 10−10m/s2 any difference between a and aM is unclear. However, once aB drops well below 10−11m/s2, the discrepancy emerges. The short-dashed line is the ΛCDM fitting curve of the MW. The dash-dot line is the ΛCDM fitting curve of M31. The mass range** of galaxies in EAGLE’s data is chosen to be between 5 × 1010M to 5 × 1011M. For comparison, the Milky way rotation curve from GAIA data release II is shown in color. The red dots are data from [34], the blue triangles are from [35], while the cyan down triangles are from [36]. While the EAGLE simulation does not match the data perfectly, these plots indicate that it is much easier to accommodate a systematic downward trend with the ΛCDM model than with MOND.

Things are not well.

The interpretation that is offered (right in the figure caption) is that MOND is wrong and the LCDM-based EAGLE simulation does a better if not perfect job of explaining things. We already know that’s not right. The alternate interpretation is that this is not a valid representation of the prediction of MOND, because their mass model does not follow from the observed distribution of light. They get neither the baryonic mass distribution and its predicted acceleration aB nor the total acceleration a right in the plot above.

In terms of dark matter, the model of Dai et al. may appear viable. In terms of MOND, it is way off, not just a little off. The residuals are only zero, as they should be, for a narrow range of accelerations, 2 to 3 x 10-10 m/s/s. That’s more Newton than MOND, and appears to correspond to the limited range in radii over which their model matches the rotation curve data in their Fig. 1 (roughly 4 to 6 kpc). It doesn’t really fit the data elsewhere, and the restrictions on a MOND fit are considerably more stringent than on the sort of dark matter model they construct: there’s no reason to expect their model to behave like MOND in the first place.

And, hoo boy, does it ever not behave like MOND. Look at how far those red points – the Gaia DR2 data – deviate from zero in their Fig. 4. Those are the exact same data that agree well with the model line I show above – the data that were correctly predicted in advance. My model is a reasonable representation of the radial force predicted by MOND, with the blue line in my plot being equivalent to the zero line in theirs.

This is how things can go badly wrong. To properly apply MOND, we need to measure both the kinematics and baryonic mass distribution correctly. If we screw either up, as is easy to do in astronomy, then the result will look very wrong, even if it shouldn’t. Combine this with the eagerness many people have to dismiss MOND outright, and you wind up with lots of articles claiming that MOND is wrong – even when that’s not really the story the data tell. Happens over and over again, so the field remains stagnant.


*This is a large part of the cultural difference between physics and astronomy. Physicists are spoiled by laboratory experiments done in controlled conditions in which one can measure to the sixth place of decimals. In contrast, astronomy is an observational rather than experimental science. We can’t put the universe in a box and control all the systematics – measuring most quantities to 1% is a tall order. Consequently, astronomers are used to being wrong. While I wouldn’t say that astronomers cope with it gracefully, they’re well aware that it happens, that it has happened a lot historically, and that it will continue to happen in the future. It is a risk we all take in trying to understand a universe so much vaster than ourselves. This makes astronomers rather more tolerant of surprising results – results where the first response is “that can’t be right!” but also informed by the experience that “we’ve been wrong before!” Physicists coming to the field generally lack this experience and take the error bars way too seriously. I notice this attitude creeping into the younger generation of astronomers: people who’ve received their data from distant observatories and performed CPU-intensive MCMC error analyses, and so want to believe them, but often lack the experience of dozens of nights spent at the observatory sweating a thousand ill-controlled but consequential details, like walking out to a beautiful sunrise decorated by wisps of cirrus clouds. When did those arrive?!?


**The data that define the radial acceleration relation come from galaxies spanning six decades in stellar mass, so this one decade range from the simulations is tiny – it is literally comparing a factor of ten to a factor of a million. What happens outside the illustrated mass range? Are lower masses even resolved?

A Response to Recent Developments Concerning the Gravitational Potential of the Milky Way


In the series of recent posts I’ve made about the Milky Way, I missed an important reply made in the comments by Francois Hammer, one of the eminent scientists doing the work. I was on to writing the next post when he wrote it, and simply didn’t see it until yesterday. Dr. Hammer has some important things to say that are both illustrative of the specific topic and also of how science should work. I wanted to highlight his concerns with their own post, so, with his permission, I cut & paste his comments below, making this, in effect, a guest post by Francois Hammer.


There are two aspects we’d like to mention, as they may help to clarify part of the debate:
1- When saying “Gaia is great, but has its limits. It is really optimized for nearby stars (within a few kpc). Outside of that, the statistics… leave something to be desired. Is it safe to push out beyond 20 kpc?”, one may wonder whether the significance of the Gaia data has really been understood.
In the Eilers et al. 2019 DR2 rotation curve, you may see points with small error bars up to 21-22 kpc. Gaia DR3 provides proper motion (systematic) uncertainties that are 2 times smaller than those from Gaia DR2, so it can easily go to 25 kpc or more.
The gain in quality for parallaxes is indeed smaller (a 30% gain). However, our results cannot be affected by distance estimates, since the large number of stars with parallax estimates in Wang et al. (2023) gives the same rotation curve as the (smaller number of) RGB stars with spectrophotometric distances (Ou et al. 2023), i.e., following Eilers et al. 2019. Both show a Keplerian decline, which was already noticeable in the DR2 results of Eilers et al. 2019. The latter authors said in their conclusions: “We do see a mild but significant deviation from the straightly declining circular velocity curve at R≈19–21 kpc of Δv≈15 km s−1.” Our work using Gaia DR3 is nothing more than having a factor of 2 better accounting of systematics, and then being able to resolve what looks like a Keplerian decrease of the rotation curve.
We may also mention here that one of us participated in an unprecedented study of the kinematics of the LMC (Gaia Collaboration 2021, Luri’s paper), which is at 50 kpc. Unless one proves that everything people have done about the LMC and MW is wrong, and that the data are too uncertain to conclude anything about what happens at R = 17-25 kpc, the above clarifications about Gaia accuracy are truly necessary for people reading your blog.
2- The argument that the result “violates a gazillion well-established constraints” has to be taken with some caution, since otherwise no one can make any progress in the field. In fact, the problem with many probes (so-called “satellites”) in the MW halo is that one cannot guarantee whether or not their orbits are in equilibrium with the MW potential. The reverse holds for the MW disk, in which stars are rotating, and, e.g., at 25 kpc, they have likely experienced 7-8 orbits since the last merger (Gaia-Sausage-Enceladus), about 9 billion years ago. In other words, the mass provided by a system mostly at equilibrium likely supersedes masses provided by systems whose equilibrium conditions are not secured. An interesting example of this is given by globular clusters (GCs). If taken as an ensemble of 156 GCs (from the Baumgardt catalog), just by removing Pyxis and Terzan 8, the MW mass inside 50 kpc passes from 5.5 to 2.1 x 10^11 Msun. This is likely because these two GCs may have arrived quite recently, meaning that their initial kinetic energy is still contributing to their total energy. A similar mass overestimate could happen if one counts the LMC or Leo I as MW satellites in equilibrium with the MW potential.
So we agree that near 25 kpc the disk of the MW may show signs of being less in equilibrium, or signs of slightly less circular orbits due to the different phenomena discussed in the blog. However, why take objects for which there is no proof of equilibrium as the true measurements?
In our work, we have focused considerably on understanding and expanding the whole contribution of systematics, which may come from the Gaia data, but also from assumptions about the stellar profile (i.e., deviations from exponential profiles), from the Sun’s distance and proper motion, and so on. You may find a description in Ou et al.’s Figure 5 and Jiao et al.’s Figure 4, both showing that systematics cannot give much more than a 10% error on circular velocity estimates. This is an area where we are considered by the Local Group community to be quite conservative, following the Gaia specialists with whom we have worked to deliver the EDR3 catalog of dwarf galaxy motions (Li, Hammer, Babusiaux et al. 2021) out to about 150 kpc. The main contribution of the Jiao et al. paper is the fair accounting of systematics, whose analysis shows error bars that are much larger than those from other sources of error, especially in the MW outskirts (see Fig. 2).

Francois Hammer, 24 September 2023

The image at top is Fig. 2 from Jiao et al. illustrating their assessment of the rotation curve and its systematic uncertainties.

OSIRIS-REx returns safely


Taking a break from galaxies and cosmology, I’d like to post a little praise of NASA for safely returning a piece of an asteroid to Earth.

One of the amazing things to me about astronomy & astrophysics is that we have learned how to decipher the composition of distant stars and gas clouds by observing their spectra. I worked on this early in my career and retain an interest in the cosmic abundance of the elements. Fun fact: though often overlooked because it is a boring noble gas that doesn’t bind chemically into any common molecules or minerals, neon is number 5 on the list of most common elements, which goes hydrogen, helium, oxygen, carbon, neon. Nitrogen is number 6 by number, but iron supplants it if we weight by mass – there are more nitrogen atoms by number in the sun but the iron weighs more because of the greater mass of each atom. The order of the first five remains the same by either accounting.
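The number-versus-mass bookkeeping is easy to check. Here is a sketch with representative solar abundances on the usual log scale where hydrogen is 12; the neon value is inferred indirectly and is notoriously uncertain, so treat these as illustrative numbers rather than definitive ones.

```python
# Representative solar abundances, log10(N_X/N_H) + 12; the neon entry
# is indirect and uncertain, so these values are illustrative.
log_n = {"H": 12.00, "He": 10.93, "O": 8.69, "C": 8.43,
         "Ne": 8.05, "N": 7.83, "Fe": 7.50}
weight = {"H": 1.008, "He": 4.003, "O": 16.00, "C": 12.01,
          "Ne": 20.18, "N": 14.01, "Fe": 55.85}

by_number = sorted(log_n, key=log_n.get, reverse=True)
by_mass = sorted(log_n, key=lambda e: 10 ** log_n[e] * weight[e], reverse=True)

print(by_number)  # H, He, O, C, Ne lead by number
print(by_mass)    # iron climbs past nitrogen when weighted by mass
```

The factor-of-four difference in atomic weight is what lets iron overtake nitrogen by mass despite nitrogen’s greater abundance by number.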

Amazing as it is that we can do this, it can only be accomplished by passive observation. What we’d really like to do is get samples of the remote universe to analyze in the laboratory, where precision is much higher and we can better control for systematic effects. Of course we can’t travel to stars and nebulae that are many light-years distant, let alone return from there. But we can do it within the solar system, which is amazing enough. The Apollo astronauts brought back rocks from the moon that helped determine the age of the solar system (4.568 billion years, give or take a million) and the period of “late heavy bombardment” when most big lunar craters were formed – a mere 3.9 billion years ago. This in turn calibrates crater densities; counting craters on other solar system bodies lets us gauge the age of a surface. Lots of craters means old; few craters means something interesting had to happen to cover up all the craters that formed during heavy bombardment. It’s not like all those early meteoroids were dodging the Earth while hammering the moon; it’s just that the Earth has covered its craters up since.

One of the most interesting things scientifically is a sample of pristine material – the stuff from which the solar system formed. The Earth is a remarkably active planet geologically, which means that its rocks are always being remade by erosion, subduction, and volcanism. They’re about as far from pristine as a rock can get. The closest we expect to get are the comets and asteroids orbiting safely away from the big planets, which have complex histories of their own.

Hence the idea for a mission that could return a sample from a remote asteroid. This is what OSIRIS-REx has now accomplished. It is worth pausing to reflect what an amazing feat this is.

We’ve only had the capacity to launch things beyond the atmosphere of our planet for 66 years. Though satellite launches are now relatively common, the most frequent destination is low earth orbit. That’s at most a couple thousand kilometers up – about a third of an Earth radius – so still pretty close. It is a distance that planes traverse horizontally all the time, if only at an altitude of 10 km or so. It’s just not that far on an interplanetary scale.

Deep space missions that leave Earth’s gravity well are harder and much less common. Those that go out to an asteroid, grab a piece, and return are even harder. It’s one thing to shoot something off a rocket so hard it never comes back. It’s quite another to do that and then turn around and come back at a time and place of our choosing. That’s a remarkable feat of celestial navigation and rocket engineering. Oh, and pause on the way to graze an asteroid, grab a sample, and store it for safe return.

Safe is key here. If you want a pristine sample of the early solar system, you not only need to go to deep space to collect it, you have to keep it safe through the rigors of reentry, recover it on the ground, and get it to your lab unsullied by terrestrial contaminants. Lots can go wrong. The spacecraft has to endure the heat of reentry, suffer no leaks, and land gently in a spot where the sample can be retrieved. This all went well for OSIRIS-REx. It doesn’t always work so well.

Genesis was another sample return mission. Launched in 2001, it collected particles from the solar wind – a good way to get a measure of the composition of the sun. It did this for several years before returning 19 years ago to the month, to the same landing area as OSIRIS-REx. As it happened, I had just flown to Tucson to observe at Kitt Peak, and found myself having breakfast in the La Quinta next to the airport before renting a car to drive up the mountain. The landing was on the TV there, so it was breakfast and a show.

Only the show didn’t go so well. A helicopter was supposed to snag the capsule as it drifted at the end of its parachute to ensure no contamination from the ground. Through some amazing camera work, they showed a fairly zoomed-in image of the return capsule as it hurtled from the sky. Spinning, spinning, spinning… it looked out of control. Shouldn’t the parachute have deployed by now? Maybe not – that’s often done at fairly low altitude where the air is thick enough to bite. So I watched, spinning, spinning, as seconds stretched into minutes, spinning, spinning, surely the parachute will deploy any moment now, spinning, spinning, any moment now, spinning, spinning, really, any moment now, spinning, spinning, SMACK! into the ground.

Genesis did not experience a gentle landing. Photo credit: USAF, public domain.

The parachute failed to deploy. Apparently Lockheed Martin installed it backwards, a mistake for which I’m sure they were well remunerated. This is but one of the hazards of space travel.

So it was with a little trepidation that I watched the return of OSIRIS-REx this morning. There was again some amazing camera work. First we saw the blaze of reentry, then after that faded the capsule itself emerged, becoming visible while still at high altitude. Spinning, spinning.

As I was watching on NASA TV, it was announced that the order to deploy the parachute had been issued. Spinning, spinning. Good. Spinning, spinning. No parachute. Was there a time delay on that order? Still seemed high to be deploying a chute, but it was hard to judge the altitude from watching a small spinning blob on TV. Spinning, spinning. I am old and jaded, so I didn’t feel nervous – yet. Spinning, spinning. Only a tiny bit of anxiety. Spinning, spinning. Then it was announced that the parachute was scheduled to deploy at 49 minutes past the hour – still two minutes away. Spinning, spinning. Then, at 48 minutes past the hour, the parachute deployed. I was so enthused to see it that I didn’t worry that it had come a bit early – better than too late! Apparently it deployed at an altitude of 20,000 feet when it wasn’t supposed to deploy until 5,000. So that went wrong, but only a tiny bit wrong – it came gently to rest on the ground near the edge of the target ellipse – i.e., within the error bars.

Osiris-Rex return capsule where it landed in Utah. Screen shot from NASA TV. As in, I took a picture of the TV with my phone.

This time there was no unnecessarily elaborate plan to snag the capsule out of the air with a helicopter as there had been for Genesis. But a helicopter was used to transport the capsule, dangled from the end of a long rope, to a temporary clean room that had been set up nearby. From there it will be transported to the Astromaterials facility at the Johnson Space Center in Houston, where they have an office of Astromaterials Acquisition and Curation. Sounds very Indiana Jones in space.

Science to follow.

Recent Developments Concerning the Gravitational Potential of the Milky Way. III. A Closer Look at the RAR Model


I am primarily an extragalactic astronomer – someone who studies galaxies outside our own. Our home Galaxy is a subject in its own right. Naturally, I became curious how the Milky Way appeared in the light of the systematic behaviors we have learned from external galaxies. I first wrote a paper about it in 2008; in the process I realized that I could use the RAR to infer the distribution of stellar mass from the terminal velocities observed in interstellar gas. That’s not necessary in external galaxies, where we can measure the light distribution, but we don’t get a view of the whole Galaxy from our location within it. Still, it wasn’t my field, so it wasn’t until 2015/16 that I did the exercise in detail. Shortly after that, the folks who study the supermassive black hole at the center of the Galaxy provided a very precise constraint on the distance there. That was the one big systematic uncertainty in my own work up to that point, but I had guessed well enough, so it didn’t make a big change. Still, I updated the model to the new distance in 2018, and provided its details on my model page so anyone could use it. Then Gaia data started to pour in, which was overwhelming, but I found I really didn’t need to do any updating: the second data release indicated a declining rotation curve at exactly the rate the model predicted: -1.7 km/s/kpc. So far so good.

I call it the RAR model because it only involves the radial force. All I did was assume that the Milky Way was a typical spiral galaxy that followed the RAR, and ask what the mass distribution of the stars needed to be to match the observed terminal velocities. This is a purely empirical exercise that should work regardless of the underlying cause of the RAR, be it MOND or something else. Of course, MOND is the only theory that explicitly predicted the RAR ahead of time, but we’ve gone to great lengths to establish that the RAR is present empirically whether we know about MOND or not. If we accept that the cause of the RAR is MOND, which is the natural interpretation, then MOND over-predicts the vertical motions by a bit. That may be an important clue, either into how MOND works (it doesn’t necessarily follow the most naive assumption) or how something else might cause the observed MONDian phenomenology, or it could just be another systematic uncertainty of the sort that always plagues astronomy. Here I will focus on the RAR model, highlighting specific radial ranges where the details of the RAR model provide insight that can’t be obtained in other ways.
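The exercise can be sketched numerically. The RAR has a simple fitting function (McGaugh et al. 2016), g_obs = g_bar / (1 − e^(−√(g_bar/a0))), and since it is monotonic it can be inverted: given the observed acceleration from the rotation curve, solve for the baryonic acceleration, and hence the enclosed stellar mass. The bisection routine and the round example numbers below are my own illustration, not the actual fitting machinery used for the model.

```python
import math

A0 = 1.2e-10  # MOND acceleration scale in m/s^2

def g_rar(g_bar):
    """RAR fitting function: observed acceleration given the baryonic one."""
    return g_bar / (1.0 - math.exp(-math.sqrt(g_bar / A0)))

def g_bar_from_obs(g_obs, lo=1e-15, hi=1e-7):
    """Invert the RAR by bisection in log space: given the observed
    acceleration (e.g. V^2/R from the rotation curve), find g_bar."""
    for _ in range(100):
        mid = math.sqrt(lo * hi)
        if g_rar(mid) < g_obs:
            lo = mid
        else:
            hi = mid
    return mid

# Example: V = 220 km/s at R = 8 kpc (round numbers for illustration).
kpc = 3.086e19  # meters
g_obs = (220e3) ** 2 / (8 * kpc)
print(g_obs, g_bar_from_obs(g_obs))
```

Near the solar circle the inferred g_bar is close to g_obs, as expected on the Newtonian side of the transition; only at much lower accelerations does the inversion return a baryonic acceleration far below the observed one.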

The RAR Milky Way model was fit to the terminal velocity data (in grey) over the radial range 3 < R < 8 kpc. Everything outside of that range is a prediction. It is not a prediction limited to that skinny blue line, as I have to extrapolate the mass distribution of the Milky Way to arbitrarily large radii. If there is a gradient in the mass-to-light ratio, or even if I guess a little wrong in the extrapolation, it’ll go off at some point. It shouldn’t be far off, as V(R) is mostly fixed by the enclosed mass. Mostly. If there is something else out there, it’ll be higher (like the cyan line including an estimate of the coronal gas in the plot that goes out to 130 kpc). If there is a bit less than the extrapolation, it’ll be lower.

The RAR model Milky Way (blue line) together with the terminal velocities to which it was fit (light grey points), VVV data in the inner 2.2 kpc (dark grey squares), and the Zhou et al. (2023) realization of the Gaia DR3 data. Also shown are the number of stars per bin from Gaia (right axis).

From 8 to 19 kpc, the Gaia data as realized by Zhou et al. fall bang on the model. They evince exactly the slowly declining rotation curve that was predicted. That’s pretty good for an extrapolation from R < 8 kpc. I’m not aware of any other model that did this well in advance of the observation. Indeed, I can’t think of a way to even make a prediction with a dark matter model. I’ve tried this – a lot – and it is as easy to come up with a model whose rotation curve is rising as one that is falling. There’s nothing in the dark matter paradigm that is predictive at this level of detail.

Beyond R = 19 kpc, the match of the model and the Zhou et al. realization of the data is not perfect. It is still pretty damn good by astronomical standards, and better than the Keplerian dotted line. Cosmologists would be wetting themselves with excitement if they could come this close to predicting anything. Heck, they’re known to do that even when they’re obviously wrong*.

If the difference between the outermost data and the blue line is correct, then all it means is that we have to tweak the model to have a bit less mass than assumed in the extrapolation. I call it a tweak because it would be exactly that: a small change to an assumption I was obliged to make in order to do the calculation. I could have assumed something else, and almost did: there is discussion in the literature that the disk of the Milky Way is truncated at 20 kpc. I considered using a mass model with such a feature, but one can’t make the edge perfectly sharp, as that introduces numerical artifacts when solving the Poisson equation: the procedure depends on derivatives that blow up when they encounter sharp features. Presumably the physical truncation isn’t unphysically sharp anyway, instead being a transition to a steeper exponential decline as we sometimes see in other galaxies. However, despite indications of such an effect, there wasn’t enough data to constrain it in a way useful for my model. So rather than introduce a bunch of extra, unconstrained freedom into the model, I made a straight extrapolation from what I had all the way to infinity, in the full knowledge that this had to be wrong at some level. Perhaps we’ve found that level.

That said, I’m happy with the agreement of the data with the model as is. The data become very sparse where there is even a hint of disagreement. Where there are thousands of stars per bin in the well-fit portion of the rotation curve, there are only tens per bin outside 20 kpc. When the numbers get that small, one has to start to worry that there are not enough independent samples of phase space. A sizeable fraction of those tens of stars could be part of the same stellar stream, which would bias the results to that particular unrepresentative orbit. I don’t know if that’s the case, which is the point: it is just one of the many potential systematic uncertainties that are not represented in the formal error bars. Missing those last five points by two sigma is as likely to be an indication that the error bars have been underestimated as it is to be an indication that the model is inadequate. Trying to account for this sort of thing is why the error bars of Jiao et al. are so much bigger than the formal uncertainties in the three realization papers.

That’s the outer regions. The place where the RAR model disagrees the most with the Gaia data is from 5 < R < 8 kpc, which is in the range where it was fit! So what’s going on there?

Again, the data disagree with the data. The stellar data from Gaia disagree with the terminal velocity data from interstellar gas at high significance. The RAR model was fit to the latter, so it must perforce disagree with the former. It is tempting to dismiss one or the other as wrong, but do they really disagree?

Adapted from Fig. 4 of McGaugh (2019). Grey points are the first and fourth quadrant terminal velocity data to which the model (blue line) was matched. The red squares are the stellar rotation curve estimated with Gaia DR2 (DR3 is indistinguishable). The black squares are the stellar rotation curve after adjustment to be consistent with a mass profile that includes spiral arms. This adjustment for self-consistency remedies the apparent discrepancy between gas and stellar data.

In order to build the model depicted above, I chose to split the difference between the first and fourth quadrant terminal velocity data. I fit them separately in McGaugh (2016), where I made the additional point that the apparent difference between the two quadrants is what we expect from an m=2 mode – i.e., a galaxy with spiral arms. That means these velocities are not exactly circular as commonly assumed, and as I must perforce assume to build the model. So I split the difference above in the full knowledge that this is not the exact circular velocity curve of the Galaxy; it’s just the best I can do at present. This is another example of the systematic uncertainties we encounter: the difference between the first and fourth quadrant is real, and is telling us that the Galaxy is not azimuthally symmetric – as anyone can tell by looking at any spiral galaxy, but a detail we’d like to ignore so we can talk about disk+dark matter halo models in the convenient limit of axisymmetry.

Though not perfect – no model is – the RAR model Milky Way is a lot better than models that ignore spiral structure entirely, which is basically all of them. The standard procedure assumes an exponential disk and some form of dark matter halo. Allowance is usually made for a central bulge component, but it is relatively rare to bother to include the interstellar gas, much less consider deviations from a pure exponential disk. Having adopted the approximation of an exponential disk, one inevitably gets a smooth rotation curve like the dashed line below:

Fig. 1 from McGaugh (2019). Red points are the binned fourth quadrant molecular hydrogen terminal velocities to which the model (blue line) has been fit. The dotted line shows the corresponding Newtonian rotation curve of the baryons. The dashed line is the model of Bovy & Rix (2013) built assuming an exponential disk. The inset shows residuals of the models from the data. The exponential model does not and cannot fit these data.

The common assumption of an exponential disk precludes the possibility of fitting the bumps and wiggles observed in the terminal velocities. These occur because of deviations from a pure exponential profile caused by features like spiral arms. Under this assumption, the variations in mass due to spiral arms are artificially smoothed over. They are not there by assumption, and there is no way to recover them in a dark matter fit that doesn’t know about the RAR.

Depending on what one is trying to accomplish, an exponential model may suffice. The Bovy & Rix model shown above is perfectly reasonable for what they were trying to do, which involved the vertical motions of stars, not the bumps and wiggles in the rotation curve. I would say that the result they obtain is in reasonable agreement with the rotation curve, given what they were doing and in full knowledge that we can’t expect to hit every error bar of every datum of every sort. But for the benefit of the chi-square enthusiasts who are concerned about missing a few data points at large radii, the reduced chi-squared of the Bovy & Rix model is 14.35 while that of the RAR model is 0.6. A good fit is around 1, so the RAR model is a good fit while the smooth exponential is terrible – as one can see by eye in the residual inset: the smooth exponential model gets the overall amplitude about right, but hits none of the data. That’s the starting point for every dark matter model that assumes an exponential disk; even if they do a marginally better job of fitting the alleged Keplerian downturn, they’re still a lot worse if we consider the terminal velocity data, the details of which are usually ignored.
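For readers unfamiliar with the statistic: reduced chi-squared is just the error-weighted mean squared residual per degree of freedom, and a value near 1 means the model typically passes within one error bar of the data. A minimal sketch with made-up numbers (not the actual terminal-velocity data):

```python
import numpy as np

def reduced_chi_squared(data, model, err, n_params):
    """Chi-squared per degree of freedom; values near 1 indicate a good fit,
    values much larger than 1 a bad one."""
    resid = (np.asarray(data) - np.asarray(model)) / np.asarray(err)
    dof = len(data) - n_params
    return np.sum(resid ** 2) / dof

# Toy example: a model within ~1 sigma of each datum gives chi^2_nu near 1.
data  = np.array([220.0, 218.0, 215.0, 213.0])  # invented velocities, km/s
model = np.array([219.0, 219.0, 214.0, 212.0])
err   = np.array([1.0, 1.0, 1.0, 1.0])
print(reduced_chi_squared(data, model, err, n_params=1))  # 1.333...
```

By this measure a model that "gets the overall amplitude about right" but misses every point by several sigma racks up a huge chi-squared, exactly as the exponential model does.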

If instead we pay attention to the details of the terminal velocity data, we discover that the broad features seen therein are pretty much what we expect for the kinematic signatures of photometrically known spiral arms. That is, the mass density variations inferred by fitting the RAR correspond to spiral arms that are independently known from star counts. We’ve discussed this before.

Spiral structure in the Milky Way (left) as traced by HII regions and Giant Molecular Clouds (GMCs). These correspond to bumps in the surface density profile inferred from kinematics with the RAR (right).

If we accept that the bumps and wiggles in the terminal velocities are tracers of bumps and wiggles in the stellar mass profiles, as seen in external galaxies, then we can return to examining the apparent discrepancy between them and the stellar rotation curve from Gaia. The latter follow from an application of the Jeans equation, which helps us sort out the circular motion from the mildly eccentric orbits of many stars. It includes a term that depends on the gradient of the density profile of the stars that trace the gravitational potential. If we assume an exponential disk, then that term is easily calculated. It is slowly and smoothly varying, and has little impact on the outcome. One can explore variations of the assumed scale length of the disk, and these likewise have little impact, leading us to infer that we don’t need to worry about it. The trouble with this inference is that it is predicated on the assumption of a smooth exponential disk. We are implicitly assuming that there are no bumps and wiggles.
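One can see how much the assumption matters with a toy calculation. For a pure exponential tracer density with scale length Rd, the logarithmic gradient is analytic, d ln ν / d ln R = −R/Rd, which is indeed smooth and slowly varying; add a spiral-arm-like bump and the numerically computed gradient swings sharply around that smooth baseline. The numbers below are invented for illustration, not taken from the Gaia analysis:

```python
import numpy as np

R_d = 2.5  # toy disk scale length, kpc
R = np.linspace(4.0, 12.0, 400)  # Galactocentric radius, kpc

# Smooth exponential tracer density vs. one with a Gaussian "arm" bump at 7 kpc
nu_exp = np.exp(-R / R_d)
nu_bump = nu_exp * (1.0 + 0.3 * np.exp(-((R - 7.0) / 0.5) ** 2))

def dlnnu_dlnR(nu, R):
    """Numerical logarithmic gradient of the tracer density."""
    return np.gradient(np.log(nu), np.log(R))

grad_exp = dlnnu_dlnR(nu_exp, R)    # tracks -R/R_d everywhere
grad_bump = dlnnu_dlnR(nu_bump, R)  # swings well away from -R/R_d near the bump
```

The gradient term in the Jeans equation inherits those swings, which is why dropping the smooth-exponential assumption moves the inferred circular velocities by a noticeable amount.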

The bumps and wiggles are explicitly part of the RAR model. Consequently, the gradient term in the Jeans equation has a modest but important impact on the result. Applying it to the Gaia data, I get the black points:

The red squares are the Gaia DR2 data. The black squares are the same data after including in the Jeans equation the effect of variations in the tracer gradient. This term dominates the uncertainties.

The velocities of the Gaia data in the range illustrated all go up. This systematic effect reconciles the apparent discrepancy between the stellar and gas rotation curves. The red points are highly discrepant from the gray points, but the black points are not. All it took was to drop the assumption of a smooth exponential profile and calculate the density gradient numerically from the data. This difference has a more pronounced impact on rotation curve fits than any of the differences between the various realizations of the Gaia DR3 data – hence my cavalier attitude towards their error bars. Those are not the important uncertainties.

Indeed, I caution that we still don’t know what the effective circular velocity of the potential is. I’ve made my best guess by splitting the difference between the first and fourth quadrant terminal velocity data, but I’ve surely not got it perfectly right. One might view the difference between the quadrants as the level at which the true circular velocity is practically unknowable. I don’t think it is quite that bad, but I hope I have at least given the reader some flavor for some of the hidden systematic uncertainties that we struggle with in astronomy.

It gets worse! At small radii, there is good reason to be wary of the extent to which terminal velocities represent circular motion. Our Galaxy hosts a strong bar, as artistically depicted here:

Artist’s rendition of the Milky Way. Image credit: NASA/JPL-Caltech.

Bars are a rich topic in their own right. They are supported by non-circular orbits that maintain their pattern. Consequently, one does not expect gas in the region where the bar is to be on circular orbits. It is not entirely clear how long the bar in our Galaxy is, but it is at least 3 kpc – which is why I have not attempted to fit data interior to that. I do, however, have to account for the mass in that region. So I built a model based on the observed light distribution. It’s a nifty bit of math to work out the equivalent circular velocity corresponding to a triaxial bar structure, so having done it once I’ve not been keen to do it again. This fixes the shape of the rotation curve in the inner region, though the amplitude may shift up and down with the mass-to-light ratio of the stars, which dominate the gravitational potential at small radii. This deserves its own close up:

Colored points are terminal velocities from Marasco et al. (2017), from both molecular (red) and atomic (green) gas. Light gray circles are from Sofue (2020). These are plotted assuming they represent circular motions, which they do not. Dark grey squares are the equivalent circular velocity inferred from stars in the VVV survey. The black line is the Newtonian mass model for the central bar and disk, and the blue line is the corresponding RAR model as seen above.

Here is another place where the terminal velocities disagree with the stellar data. This time, it is because the terminal velocities do not trace circular motion. If we assume they do, then we get what is depicted above, and for many years, that was thought to be the Galactic rotation curve, complete with a pronounced classical bulge. Many decades later, we know the center of the Galaxy is not dominated by a bulge but rather a bar, with concomitant non-circular motions – motions that have been observed in the stars and carefully used to reconstruct the equivalent circular velocity curve by Portail et al. (2017). This is exactly what we need to compare to the RAR model.

Note that 2008, when the bar model was constructed, predates 2017 (or the 2016 appearance of the preprint). While it would have been fair to tweak the model as the data improved, this did not prove necessary. The RAR model effectively predicted the inner rotation curve a priori. That’s a considerably more impressive feat than getting the outer slope right, but the model manages both sans effort.

No dark matter model can make an equivalent boast. Indeed, it is not obvious how to do this at all; usually people just make a crude assumption with some convenient approximation like the Hernquist potential and call it a day without bothering to fit the inner data. The obvious prediction for a dark matter model overshoots the inner rotation curve, as there is no room for the cusp predicted in cold dark matter halos – stars dominate the central potential. One can of course invoke feedback to fix this, but it is a post hoc kludge rather than a prediction, and one that isn’t supposed to apply in galaxies as massive as the Milky Way. Unless it needs to, of course.

So, let’s see – the RAR model Milky Way reconciles the tension between stellar and interstellar velocity data, indicates density bumps that are in the right location to correspond to actual spiral arms, matches the effective circular velocity curve determined for stars in the Galactic bar, correctly predicted the slope of the rotation curve outside the solar circle out to at least 19 kpc, and is consistent with the bulk of the data at much larger radii. That’s a pretty successful model. Some realizations of the Gaia DR3 data are a bit lower than predicted, but others are not. Hopefully our knowledge of the outer rotation curve will continue to improve. Maybe the day will come when the data have improved to the point where the model needs to be tweaked a little bit, but it is not this day.


*To give one example, the BICEP II experiment infamously claimed in March of 2014 to have detected the Inflationary signal of primordial gravitational waves in their polarization data. They held a huge press conference to announce the result in clear anticipation of earning a Nobel prize. They did this before releasing the science paper, much less hearing back from a referee. When they did release the science paper, it was immediately obvious on inspection that they had incorrectly estimated the dust foreground. Their signal was just that – excess foreground emission. I could see that in a quick glance at the relevant figure as soon as the paper was made available. Literally – I picked it up, scanned through it, saw the relevant figure, and could immediately spot where they had gone wrong. And yet this huge group of scientists all signed their name to the submitted paper and hyped it as the cosmic “discovery of the century”. Pfft.

Recent Developments Concerning the Gravitational Potential of the Milky Way. II. A Closer Look at the Data


Continuing from last time, let’s compare recent rotation curve determinations from Gaia DR3:

Fig. 1 from Jiao et al. comparing three different realizations of the Galactic rotation curve from Gaia DR3. The vertical lines* mark the range of the Ou et al. data considered by Chan & Chung Law (2023).

These are different analyses of the same dataset. The Gaia data release is immense, with billions of stars. There are gazillions of ways to parse these data. So it is reasonable to have multiple realizations, and we shouldn’t expect them to necessarily agree perfectly: do we look exclusively at K giants? A stars? Only stars with proper motion and/or parallax data more accurate than some limit? etc. Of course we want to understand any differences, but that’s not going to happen here.

My first observation is that the various analyses are broadly consistent. They all show a steady decline over a large range of radii. Nothing shocking there; it is fairly typical for bright, compact galaxies like the Milky Way to have somewhat declining rotation curves. The issue here, of course, is how much, and what does it mean?

Looking more closely, not all of the data agree with each other, or even with themselves. There are offsets between the three at radii around the sun (we live just outside R = 8 kpc) where you’d naively think they would agree the best. They’re very consistent from 13 < R < 17 kpc, then they start to diverge a little. The Ou data have a curious uptick right around R = 17 kpc, which I wouldn’t put much stock in; weird kinks like that sometimes happen in astronomical data. But it can’t be consistent with a continuous mass distribution, and will come up again for other reasons.

As an astronomer, I’m happy with the level of agreement I see here. It is not perfect, in the sense that there are some points from one data set whose error bars do not overlap with those of other data sets in places. That’s normal in astronomy, and one of the reasons that we can never entirely trust the stated uncertainties. Jiao et al. make a thorough and yet still incomplete assessment of the systematic uncertainties, winding up with larger error bars on the Wang et al. realization of the data.

For example, one – just one of the issues we have to contend with – is the distance to each star in the sample. Distances to individual objects are hard, and subject to systematic uncertainties. The reason to choose A stars or K giants is because you think you know their luminosity, so you can estimate their distance. That works, but the adopted calibrations aren’t necessarily consistent (let alone correct) among the different groups. That by itself could be the source of the modest difference we see between data sets.
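The logic of such a "standard candle" distance is compact: if you think you know a star's absolute magnitude M, its apparent magnitude m gives the distance through the distance modulus, m − M = 5 log10(d / 10 pc). A sketch with arbitrary toy magnitudes (not a real calibration for any stellar type):

```python
def candle_distance_pc(m_apparent, M_absolute):
    """Distance in parsecs from the distance modulus m - M = 5 log10(d / 10 pc)."""
    return 10.0 ** ((m_apparent - M_absolute + 5.0) / 5.0)

# A star of assumed absolute magnitude M = 0 seen at m = 15 lies at 10 kpc:
print(candle_distance_pc(15.0, 0.0))  # 10000.0 pc
# Mis-calibrate M by just 0.2 mag and the distance shifts by ~9%:
print(candle_distance_pc(15.0, 0.2))  # ~9120 pc
```

Because the magnitude enters as an exponent, even small calibration offsets translate into systematic distance (and hence radius) shifts for every star in a sample.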

Chan & Chung Law use the Ou et al. realization of the data to make some strong claims. One is that the gradient of the rotation curve is -5 km/s/kpc, and this excludes MOND at high confidence. Here is their plot.

You will notice that, as they say, these are the data of Ou et al., being identical to the same points in the plot from Jiao et al. above – provided you only look in the range between the lines, 17 < R < 23 kpc. This is where the kink at R = 17 kpc comes in. They appear to have truncated the data right where it needs to be truncated to ignore the point with a noticeably lower velocity, which would surely affect the determination of the slope and reduce its confidence level. They also exclude the point with a really big error bar that nominally is within their radial range. That’s OK, as it has little significance: its large error bar means it contributes little to the constraint. That is not the case for the datum just inside of R = 17 kpc, or the rest of the data at smaller radii for that matter. These have a manifestly shallower slope. Looking at the line boundaries added to Jiao’s plot, it appears that they selected the range of the data with the steepest gradient. This is called cherry-picking.

It is a strange form of cherry-picking, as there is no physical reason to expect a linear fit to be appropriate. A Keplerian downturn has velocity decline as the inverse square root of radius (see the dotted line above). These data, over this limited range, may be consistent with a Keplerian downturn, but certainly do not establish that it is required.
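The distinction is quantitative. A Keplerian curve V = sqrt(GM/R) has local slope dV/dR = −V/(2R), so at R = 20 kpc with V = 200 km/s that comes out to −5 km/s/kpc – the very gradient quoted from the cherry-picked range. But the same Keplerian curve is steeper inside and shallower outside, so a single linear gradient only mimics it over a narrow window. A quick check with toy numbers (illustrative, not a fit to any data set):

```python
import numpy as np

def keplerian_slope(V, R):
    """Local slope dV/dR (km/s/kpc) of a Keplerian curve V = sqrt(G*M/R),
    which differentiates to dV/dR = -V / (2R)."""
    return -V / (2.0 * R)

print(keplerian_slope(200.0, 20.0))  # -5.0 km/s/kpc at R = 20 kpc
# The same curve at R = 30 kpc, where V has fallen as 1/sqrt(R):
print(keplerian_slope(200.0 * np.sqrt(20.0 / 30.0), 30.0))  # ~ -2.7 km/s/kpc
```

So a measured linear gradient of −5 km/s/kpc over a short baseline is consistent with Keplerian behavior there, but it neither requires it nor pins down the curvature that distinguishes Keplerian from other declining forms.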

Contrast the statements of Chan & Chung Law with the more measured statement from the paper where the data analysis is actually performed:

… a low mass for the Galaxy is driven by the functional forms tested, given that it probes beyond our measurements. It is found to be in tension with mass measurements from globular clusters, dwarf satellites, and streams.

Ou et al. (2023)

What this means is that the data do not go far enough out to measure the total mass. The low mass that is inferred from the data is a result of fitting some specific choice of halo form to it. They note that the result disagrees with other data, as I discussed last time.

Rather than cherry pick the data, we should look at all of it. Let’s see, I’ve done that before. We looked at the Wang et al. (2023) data via Jiao et al. previously, and just discussed the Ou et al. data. That leaves the new Zhou et al. data, so let’s look at those:

Milky Way rotation curve with RAR model (blue line from 2018) and the Gaia DR3 data as realized by Zhou et al. (2023: purple triangles). The dashed line shows the number of stars (right axis) informing each datum.

These data were the last of the current crop that I looked at. They look… pretty good in comparison with the pre-existing RAR model. Not exactly the falsification I had been led to expect.

So – the three different realizations of the Gaia DR3 data are largely consistent, yet one is being portrayed as a falsification of MOND while another is in good agreement with its prediction.

This is why you have to take astronomical error bars with a grain of salt. Three different groups are using data from the same source to obtain very nearly the same result. It isn’t quite the same result, as some of the data disagree at the formal limits of their uncertainty. No big deal – that’s what happens in astronomy. The number of stars per bin helps illustrate one reason why: we go from thousands of stars per bin near the sun to tens of stars in wider bins at R > 20 kpc. That’s not necessarily problematic, but it is emblematic of what we’re dealing with: great gobs of data up close, but only scarce scratches of it far away where systematic effects are more pernicious.
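The statistical side of that is plain 1/sqrt(N) scaling: the formal error on a mean velocity from N stars is roughly sigma/sqrt(N) for velocity dispersion sigma. With invented but plausible numbers:

```python
import numpy as np

def standard_error(sigma, n_stars):
    """Formal error on a mean velocity from n independent stars with
    velocity dispersion sigma (same units as sigma)."""
    return sigma / np.sqrt(n_stars)

sigma = 30.0  # toy velocity dispersion, km/s
print(standard_error(sigma, 3000))  # ~0.55 km/s for inner bins
print(standard_error(sigma, 30))    # ~5.5 km/s for outer bins
```

And that formal error assumes the stars are independent samples of phase space; if a sizeable fraction of an outer bin belongs to one stream, the effective N is far smaller than the star count, and the quoted error bar is correspondingly optimistic.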

In the meantime, one realization of these data is being portrayed as a death knell for a theory that successfully predicts another realization of the same data. Well, which is it?


*Thanks to Moti Milgrom for pointing out the restricted range of radii considered by Chan & Chung Law and adding the vertical lines to this figure.

Recent Developments Concerning the Gravitational Potential of the Milky Way. I.


Recent results from the third data release (DR3) from Gaia have led to a flurry of papers. Some are good, some are great, some are neither of those. It is apparent from the comments last time that, while I’ve kept my pledge to never dumb it down, I have perhaps been assuming more background knowledge on the part of readers than is warranted. I can’t cram a graduate education in astronomy into one web page, but I will try to provide a little relevant context.

Galactic Astronomy is an ancient field, dating back at least to the Herschels. There is a lot that is known in the field. There have also been a lot of misleading observations, going back just as far to Herschel’s map of the Milky Way, which was severely limited by extinction from interstellar dust. That’s easy to say now, but Herschel’s map was the standard for over a century – longer than our modern map has persisted.

So a lot has changed, including a lot that seemed certain, so I try to keep an open mind. The astronomers working with the Gaia data – the ones deriving the rotation curve – are simply following where those data take them, as they should. There are others using their analyses to less credible ends. A lot of context is required to distinguish the two.

The total mass of the Milky Way

There are a lot of constraints on the mass of the Milky Way that predate Gaia; it’s not like these are the first data that address the issue. Indeed, there are lots and lots and lots of other applicable data acquired using different methods over the course of many decades. Here is a summary plot of determinations of the mass of the Milky Way compiled by Wang et al. (2019).

This is an admirable compilation, and yet no such compilation can be complete. There are just so many determinations by lots of independent authors. Still, this is nice for listing multiple results from many distinct methodologies. They all consistently give numbers around 10^12 solar masses. (Cast in these terms, my own estimate is 1.4 x 10^12, albeit with a substantial systematic uncertainty.) I’ve added a point for the total mass according to the alleged Keplerian downturn seen in the Gaia data, 2 x 10^11 solar masses. One of these things is not like the others.

The difference from the bulk of the data has nearly all of us astronomers rolling our collective eyes. Most of us straight up don’t believe it. That’s not to say the Gaia data are wrong, but the interpretation of those data as indicative of such a small, finite total mass seems unlikely in the light of all other results.

As I discussed briefly last time, it is conceivable that previous results are wrong or misleading due to some systematic effect or bad assumption. For example, mass estimates based on the satellite galaxies require the assumption that those satellites are indeed bound to the Milky Way. That seems like a really good assumption, as without it, their presence is an instantaneous coincidence particular to the most recent few percent of a Hubble time: they wouldn’t have been nearby more than a billion years ago, and won’t be around for even a few hundred million years more. That sounds like a long time to you and me, but it is not that long on a cosmic scale. Maybe they’re raining down all the time to give the appearance of a steady state? Where have I heard that before?

Even if we’re willing to dismiss satellite constraints, that doesn’t suffice. It isn’t good enough to find flaw with one set of determinations; one must question all distinct methods. I could probably do that; there’s always a systematic uncertainty that might be bigger than expected or an assumption that could go badly wrong. But it is asking a lot for all of them to conspire to be wrong at the same time by the same amount. (The assumption of Newtonian gravity is a catch-all.)

Some constraints are more difficult to dodge than others. For example, the escape velocity method merely notes that there are fast moving stars in the solar neighborhood. Those stars are many billions of years old, and wouldn’t be here if the gravitational potential couldn’t contain them. The mass implied by the Gaia quasi-Keplerian downturn doesn’t suffice.
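A back-of-the-envelope version of this constraint (a hedged sketch: the point-mass formula v_esc = sqrt(2GM/r) idealizes the potential, and the solar radius and mass values are round numbers, not fits):

```python
import numpy as np

G = 4.30091e-6  # Newton's constant in kpc * (km/s)^2 / M_sun
R_SUN = 8.2     # approximate Galactocentric radius of the sun, kpc

def v_escape_point_mass(M_total, r):
    """Escape speed (km/s) at radius r for an idealized point mass M_total (M_sun)."""
    return np.sqrt(2.0 * G * M_total / r)

print(v_escape_point_mass(2e11, R_SUN))  # ~460 km/s for the low Gaia-inferred mass
print(v_escape_point_mass(1e12, R_SUN))  # ~1020 km/s for a conventional ~10^12 mass
```

Published local escape-speed estimates from fast halo stars typically land in the 500–550 km/s range, beyond the reach of the low-mass value even in this crude idealization; a realistic extended mass distribution changes the numbers somewhat but not the conclusion.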

That said, the total mass of the Milky Way as expressed above is a rather notional quantity. M200 occurs roughly 200 kpc out for the Milky Way, give or take a lot. And the “200” in the subscript has nothing to do with that radius being 200 kpc, for reasons too technical and silly to delve into. So my biggest concern about the compilation above is not that the data are wrong so much as that they are being extrapolated to an idealized radius that we don’t directly observe. This extrapolation is usually done by assuming the potential of an NFW halo, which makes perfect sense in terms of LCDM but none whatsoever empirically, since NFW predicts the wrong density profile at small, intermediate, and large radii: where the density profile ρ ∝ r^-α is predicted to have α = (1,2,3), it is persistently observed to be more like (0,1,2). While the latter profile is empirically more realistic, it also fails to converge to a finite total mass, rendering the concept meaningless.
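The convergence trouble is easy to make explicit with the enclosed-mass integral (a standard textbook relation, not specific to any of the papers discussed). For an outer density slope α,

```latex
M(<r) = \int_0^r 4\pi r'^2\, \rho(r')\, dr' \;\propto\;
\begin{cases}
r^{\,3-\alpha}, & \alpha < 3,\\[2pt]
\ln r, & \alpha = 3,
\end{cases}
\qquad \text{for } \rho \propto r^{-\alpha} \text{ at large } r.
```

An observed outer slope α ≈ 2 thus gives M ∝ r with no finite total, and even NFW’s α = 3 diverges logarithmically – which is why one quotes the mass within a notional radius like r200 rather than a genuine total.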

Rather than indulge yet again in a discussion of the virtues and vices of different dark matter halo profiles, let’s look at an observationally more robust quantity: the enclosed mass. Wang et al. also provide a tabulation of this quantity from many sources, as depicted here:

Rotation curve constraints implied by the enclosed mass measurements tabulated by Wang et al. (2019) combined with the halo stars and globular clusters previously discussed. The location of the Large Magellanic Cloud is also indicated; data beyond this radius (and perhaps even within it) are subject to perturbation by the passage of the LMC. The RAR-based model is shown as the blue line; the light blue line includes a very uncertain estimate of the effect of the coronal gas. This is very diffuse and extended, and only becomes significant at very large radii. The dotted line is the Keplerian curve for a mass of 2 x 10^11 M☉.

Not all of the enclosed mass data are consistent with one another. The bulk of them are consistent with the RAR model Milky Way (blue line). None of them are consistent with the small mass indicated by recent Gaia analyses (dotted line). Hence the collective unwillingness of most astronomers to accept the low-mass interpretation.

An important thing to note when considering data at large radii, especially those beyond 50 kpc, is that 50 kpc is the current Galactocentric radius of the Large Magellanic Cloud. The LMC brings with it its own dark matter halo, which perturbs the outer regions of the Milky Way. This effect is surprisingly strong*, and leads to the inference that the mass ratio of the two is only 4 or 5:1 even though the luminosity ratio is more like 20:1. This makes the interpretation of the data beyond 50 kpc problematic. If we use that as a pretext to ignore it, then we infer that our low mass Milky Way is no more massive than the LMC – an apparently absurd situation.

There are many rabbit holes we could dig down here, but the basic message is that a small Milky Way mass violates a gazillion well-established constraints. That doesn’t mean the Gaia data are wrong, but it does call into question their interpretation. So next time we’ll look more closely at the data.


*This is not surprising in MOND. The LMC is in the right place at the right time to cause the Galactic warp. The LMC as a candidate perturber to excite the Galactic warp was recognized early, but the conventional mass was thought to be much too small to do the job. The small baryonic mass of the LMC in MOND is not a problem as the long range nature of the force law makes tidal effects more pronounced: it works out about right.

Is the Milky Way’s rotation curve declining?

Yes, some. That much is a step forward from a decade ago, when a common assumption was that the Milky Way’s rotation curve remained flat at the speed at which the sun orbited. This was a good guess based on empirical experience with other galaxies, but not all galaxies have rotation curves that are completely flat, nor can we be sure the sun is located where that is the case.

A bigger question is whether the Milky Way’s rotation curve is declining in a Keplerian fashion. This would indicate that the total mass has been enclosed. That would be a remarkable result. If true, it would be the first time that the total mass of an individual galaxy has been measured. There have been claims to this effect before that have not panned out when the data have been extended to larger radii, so one might be inclined to be skeptical.

There are several claims now to see a distinctly declining rotation curve based on the third data release (DR3) from Gaia. The most recent, Jiao et al., has gained some note by virtue of putting “Keplerian decline” in the title, but very similar results have also been reported by Ou et al., Wang et al. and Sylos Labini et al. They all obtain basically the same answer using the same data, with minor differences in the error assessment and other details. There are also differences in interpretation*, which is always possible even when everyone agrees about what the data say.

Jiao et al. measure a total mass for the Milky Way of about 2 x 10¹¹ M☉. Before looking at the data, let’s take a moment to think about that number. Most mass determinations – and there are lots, see Fig. 2 of Wang et al. – for the Milky Way have been in the neighborhood of 10¹² M☉. Indeed, for most of my career, it was traditionally Known to be 2 x 10¹² M☉. The new measurement is an order of magnitude smaller. That’s a lot to be off by, even in extragalactic astronomy. The difference, as we’ll see, has to do with what data we use.

The mass of stars and gas in the Milky Way is about 6 x 10¹⁰ M☉, give or take ten billion. That means that nearly a third of the total mass is normal baryonic matter that we can readily see. So the ratio of dark-to-baryonic mass is only 2.3:1, well short of the cosmic ratio of about 6:1. That’s embarrassing – especially since much of the effort in galaxy formation theory has been to explain why the baryon fraction is much less than the cosmic fraction, not much more. And here our Galaxy is an outlier, having much less dark matter for its stellar mass than everything else. It is always a bad sign when the Galaxy appears to violate the Copernican Principle.
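The arithmetic behind those ratios is trivial, but worth making explicit using the numbers just quoted:

```python
# Numbers from the text: Jiao et al. total mass and the baryonic
# (stars + gas) mass of the Milky Way, both in solar masses.
M_total = 2.0e11
M_baryon = 6.0e10

M_dark = M_total - M_baryon
print(M_baryon / M_total)   # baryon fraction: nearly a third of the total
print(M_dark / M_baryon)    # dark-to-baryonic ratio, vs. ~6 expected cosmically
```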

Nonetheless, this is what we find if we look at the Gaia DR3 data. Here is a model I’ve shown before, extrapolated to larger radii with some new data added. The orange circles are the Gaia DR3 rotation curve as given by Jiao et al. For radii greater than 18 kpc, they show a clear decline consistent with a Keplerian curve for a 1.95 x 10¹¹ M☉ point mass (dotted line), as per Fig. 9 of Jiao et al.

Milky Way model (blue line) compared with various data.

This is the first time we’ve been able to trace the rotation curve so far out with stars in the disk of the Milky Way, and the Keplerian line is a good match. If that were all we knew, then a total mass of only 2 x 10¹¹ M☉ would be a reasonable inference. That’s not all we know.
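A Keplerian decline just means the circular speed falls as r^(−1/2) outside the enclosed mass. A minimal sketch of what the dotted line implies at the radii in question, for the 1.95 x 10¹¹ M☉ point mass of Jiao et al.:

```python
import math

# G in galactic units: kpc (km/s)^2 / Msun.
G = 4.30091e-6
M = 1.95e11   # Jiao et al. point mass, in solar masses

def v_kepler(r_kpc):
    """Keplerian circular speed (km/s) at radius r_kpc around point mass M."""
    return math.sqrt(G * M / r_kpc)

# The speed should fall noticeably between 20 and 30 kpc if the
# decline really is Keplerian.
for r in (20, 25, 30):
    print(f"{r} kpc: {v_kepler(r):.0f} km/s")
```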

As I alluded above, a halo mass this small makes no sense in the context of cosmology. Not only is 2 x 10¹¹ M☉ too small, the more commonly inferred dynamical mass of 10¹² M☉ is also too small. According to abundance matching, which has become an important aspect of LCDM, the Milky Way should reside in a 3 or 4 x 10¹² M☉ halo. So the new mass makes a factor of 2 or 3 problem into a factor of ten problem. That is too large to attribute to scatter in the stellar mass-halo mass relation. Worse, there is no evidence that the Milky Way is an outlier from scaling relations like Tully-Fisher. We can’t have it one way and not the other.

The traditional mass estimates that obtain ~10¹² M☉ rely on dwarf satellite galaxies as tracers of the gravitational potential of the Milky Way. Maybe they’re not fair tracers? We have to make assumptions about their orbits to use them to infer a mass; perhaps these assumptions are wrong? It is conceivable that many of our satellites are on first infall rather than in well-established orbits. Indeed, the consensus is that our largest satellites, the Magellanic Clouds, are on first infall, and that they cause a substantial perturbation to the halo of the Milky Way. This was an absurd thought 15 years ago – the Magellanic Clouds must have been here forever, and were far too small to do damage – but now this is standard lore.

There are tracers at large radii besides dwarf satellite galaxies. The figure above shows three: globular clusters (pink triangles) and two types of stars in the halo: blue horizontal branch stars (green squares) and K giants (red squares). These are well-known parts of the Milky Way that have been with us for many billions of years, so they’ve had plenty of time to become equilibrium tracers of the gravitational potential. They clearly indicate a larger enclosed mass than predicted by the Keplerian decline traced by the Gaia rotation curve, and are consistent with traditional satellite analyses. Perhaps these data are somehow misleading, but it is hard to see how.

Gaia is great, but has its limits. It is really optimized for nearby stars (within a few kpc). Outside of that, the statistics… leave something to be desired. Is it safe to push out beyond 20 kpc? I don’t know, but I did notice this panel from Fig. 8 of Wang et al.:

Radial velocities of stars at different heights above the Galactic plane.

The radial velocity is a minor component of disk motion, where azimuthal motion dominates. However, one does need to know it to solve the Jeans equation. Having it wrong will cause a perceptible systematic error. You notice the bifurcation in the data for R > 22 kpc? That, in technical terms, is Messed Up. I don’t know what goes awry there, but I’ve done this exercise enough times for the sight of this to scare the bejeepers out of me. No way I trust any of these data at R > 22 kpc, and I hope having seen this doesn’t give me nightmares tonight.
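For concreteness, here is where the radial motions enter: the radial Jeans equation for an axisymmetric system (in the standard textbook form, with ν the tracer density) relates the circular speed to the measured velocity moments,

$$
v_c^2 = \langle v_\phi^2 \rangle
- \langle v_R^2 \rangle \left( 1 + \frac{\partial \ln\left(\nu \langle v_R^2 \rangle\right)}{\partial \ln R} \right)
- \frac{R}{\nu} \frac{\partial\left(\nu \langle v_R v_z \rangle\right)}{\partial z} \, ,
$$

so a systematic error in $\langle v_R^2 \rangle$ or its radial gradient propagates directly into the inferred $v_c$, even though the azimuthal term dominates.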

Perhaps the uncertainty caused by this is adequately reflected in the large error bars on the orange points above. Those with R > 22 kpc are nicely Keplerian, but also consistent with a lot of things, including the blue line that successfully predicts the halo stars and globular clusters. That’s not true for the data around R = 20 kpc, where the error bars are much smaller: there I take the discrepancy with the blue line seriously. But that is a much more limited affair that might indicate the presence of a ring of mass – that’s what gives the bumps and wiggles at smaller radii – and certainly isn’t enough to imply the entire mass of the Milky Way has been enclosed.

But who knows? Perhaps fifteen years hence it will be the standard lore that all galaxies reside in dark matter halos that are only twice the mass of their luminous disks. At that mass ratio, all the galactic dark matter could be baryonic. I wouldn’t bet on it, but stranger things have happened before, and will happen again.


*A difference in interpretation is largely what the debate about dark matter and MOND boils down to. There is no doubt that there are acceleration discrepancies in extragalactic objects that require something beyond “what you see is what you get” with normal gravity. Whether we should blame what we can’t see or the assumption of normal gravity is open to interpretation. I would hope this is obvious, but this elementary point seems to be lost on many.