Galaxy Stellar and Halo Masses: tension between abundance matching and kinematics

Mass is a basic quantity. How much stuff does an astronomical object contain? For a galaxy, mass can mean many different things: that of its stars, stellar remnants (e.g., white dwarfs, neutron stars), atomic gas, molecular clouds, plasma (ionized gas), dust, Bok globules, black holes, habitable planets, biomass, intelligent life, very small rocks… these are all very different numbers for the same galaxy, because galaxies contain lots of different things. Two things that many scientists have settled on as Very Important are a galaxy’s stellar mass and its dark matter halo mass.

The mass of a galaxy’s dark matter halo is not well known. Most measurements provide only lower limits, as tracers fade out before any clear end is reached. Consequently, the “total” mass is a rather notional quantity. So we’ve adopted as a convention the mass M200 contained within the radius inside which the mean density is 200 times the critical density of the universe. This is a choice motivated by an ex-theory that would take an entire post to explain unsatisfactorily, so do not question the convention: all choices are bad, so we stick with it.
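To put numbers on the convention, here is a minimal sketch of how M200 translates into a radius and a circular speed. The Hubble constant of 70 km/s/Mpc is my assumption for illustration (nothing above specifies one), and the function names are made up for this example.

```python
import math

G = 4.30091e-6   # Newton's constant in kpc (km/s)^2 / Msun
H0 = 70.0        # ASSUMED Hubble constant in km/s/Mpc; not specified in the text


def critical_density(h0=H0):
    """Critical density 3 H0^2 / (8 pi G), in Msun per cubic kpc."""
    h0_per_kpc = h0 / 1000.0          # convert km/s/Mpc to km/s/kpc
    return 3.0 * h0_per_kpc ** 2 / (8.0 * math.pi * G)


def r200_kpc(m200, h0=H0):
    """Radius (kpc) enclosing a mean density of 200 x critical for mass m200 (Msun)."""
    rho = 200.0 * critical_density(h0)
    return (3.0 * m200 / (4.0 * math.pi * rho)) ** (1.0 / 3.0)


def v200_km_s(m200, h0=H0):
    """Circular speed at r200, sqrt(G M200 / r200), in km/s."""
    return math.sqrt(G * m200 / r200_kpc(m200, h0))


if __name__ == "__main__":
    for m200 in (1e12, 2e12):
        print(f"M200 = {m200:.1e} Msun -> r200 = {r200_kpc(m200):.0f} kpc, "
              f"V200 = {v200_km_s(m200):.0f} km/s")
```

With these assumed constants, a halo of 10^12 solar masses has an r200 of roughly 200 kpc, which is well beyond where most kinematic tracers fade out.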

One of the long-standing problems the cold dark matter paradigm has is that the galaxy luminosity function should be steep but is observed to be shallow. This sketch shows the basic issue. The number density of dark matter halos as a function of mass is expected to be a power law – one that is well specified once the cosmology is known and a convention for the mass is adopted. The obvious expectation is that the galaxy luminosity function should just be a downshifted version of the halo mass function: one galaxy per halo, with the stellar mass proportional to the halo mass. This was such an obvious assumption [being provision (i) of canonical galaxy formation in LCDM] that it was not seriously questioned for over a decade. (Minor point: a turn down at the high mass end could be attributed to gas cooling times: the universe didn’t have time to cool and assemble a galaxy above some threshold mass, but smaller things had plenty of time for gas to cool and form stars.)

The number density of galaxies (blue) and dark matter halos (red) as a function of their mass. Our original expectation is on the left: the galaxy mass function should be a down-shifted version of the halo mass function, up to a gas cooling limit. Dashed grey lines illustrate the correspondence of galaxies with dark matter halos of proportional mass: M* = md M200. On the right is the current picture of abundance matching with the grey lines connecting galaxies with dark matter halos of equal cosmic density in which they are supposed to reside. In effect, we make the proportionality factor md a rolling, mass-dependent fudge factor.

The galaxy luminosity function does not look like a shifted version of the halo mass function. It has the wrong slope at the faint end. At no point is the size of the shift equal to what one would expect from the mass of available baryons. The proportionality factor md is too small; this is sometimes called the over-cooling problem, in that a lot more baryons should have cooled to form stars than apparently did so. So, aside from the shape and the normalization, it’s a great match.

We obsessed about this problem all through the ’90s. At one point, I thought I had solved it. Low surface brightness galaxies were under-represented in galaxy surveys. They weren’t missed entirely, but their masses could be systematically underestimated. This might matter a lot because the associated volume corrections are huge. A small systematic in mass would get magnified into a big one in density. Sadly, after a brief period of optimism, it became clear that this could not work to solve the entire problem, which persists.

Circa 2000, a local version of the problem became known as the missing satellites problem. This is a down-shifted version of the mismatch between the galaxy luminosity function and the halo mass function that pervades the entire universe: few small galaxies are observed where many are predicted. To give visual life to the numbers we’re talking about, here is an image of the dark matter in a simulation of a Milky Way size galaxy:

Dark Matter in the Via Lactea simulation (Diemand et al. 2008). The central region is the main dark matter halo which would contain a large galaxy like the Milky Way. All the lesser blobs are subhalos. A typical galaxy-sized dark matter halo should contain many, many subhalos. Naively, we expect each subhalo to contain a dwarf satellite galaxy. Structure is scale-free in CDM, so major galaxies should look like miniature clusters of galaxies.

In contrast, real galaxies have rather fewer satellites that meet the eye:

NGC 6946 and environs. The points are foreground stars, ignore them. The neighborhood of NGC 6946 appears to be pretty empty – there is no swarm of satellite galaxies as in the simulation above. I know of two dwarf satellite galaxies in this image, both of low surface brightness. The brighter one (KK98-250) the sharp-eyed may find between the bright stars at top right. The fainter one (KK98-251) is near KK98-250, a bit down and to the left of it; good luck seeing it on this image from the Digital Sky Survey. That’s it. There are no other satellite galaxies visible here. There can of course be more that are too low in surface brightness to detect. The obvious assumption of a one-to-one relation between stellar and halo mass cannot be sustained; there must instead be a highly non-linear relation between mass and light so that subhalos contain only dwarfs of extraordinarily low surface brightness.

By 2010, we’d thrown in the towel, and decided to just accept that this aspect of the universe was too complicated to predict. The story now is that feedback changes the shape of the luminosity function at both the faint and the bright ends. Exactly how depends on who you ask, but the predicted halo mass function is sacrosanct so there must be physical processes that make it so. (This is an example of the Frenk Principle in action.)

Lacking a predictive theory, theorists instead came up with a clever trick to relate galaxies to their dark matter halos. This has come to be known as abundance matching. We measure the number density of galaxies as a function of stellar mass. We know, from theory, what the number density of dark matter halos should be as a function of halo mass. Then we match them up: galaxies of a given density live in halos of the corresponding density, as illustrated by the horizontal gray lines in the right panel of the figure above.
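The matching step is simple enough to sketch. What follows is a toy version, not any of the published relations: the halo counts are a pure power law, the galaxy counts are a Schechter function, and every constant is a hypothetical placeholder chosen only so the code runs.

```python
import math

# Toy inputs. Every number here is an illustrative placeholder, not a fit to data.
HALO_N0 = 1e-3       # hypothetical: halos per Mpc^3 more massive than HALO_M0
HALO_M0 = 1e12       # Msun
HALO_SLOPE = 0.9     # hypothetical: cumulative halo counts fall as M^-0.9

PHI_STAR = 3e-3      # hypothetical Schechter normalization, per Mpc^3
M_KNEE = 5e10        # hypothetical knee of the stellar mass function, Msun
ALPHA = -1.2         # hypothetical faint-end slope


def halo_density_above(m_halo):
    """Number density of halos more massive than m_halo (toy power law)."""
    return HALO_N0 * (m_halo / HALO_M0) ** (-HALO_SLOPE)


def halo_mass_at_density(n):
    """Invert the power law: the halo mass above which the density is n."""
    return HALO_M0 * (n / HALO_N0) ** (-1.0 / HALO_SLOPE)


def galaxies_per_dex(m_star):
    """Schechter function expressed per decade of stellar mass."""
    x = m_star / M_KNEE
    return math.log(10.0) * PHI_STAR * x ** (ALPHA + 1.0) * math.exp(-x)


def galaxy_density_above(m_star, log_max=13.0, steps=2000):
    """Number density of galaxies above m_star (trapezoid rule in log mass).

    Only meaningful for m_star well below 10**log_max.
    """
    lo = math.log10(m_star)
    h = (log_max - lo) / steps
    total = 0.5 * (galaxies_per_dex(10.0 ** lo) + galaxies_per_dex(10.0 ** log_max))
    for i in range(1, steps):
        total += galaxies_per_dex(10.0 ** (lo + i * h))
    return total * h


def matched_halo_mass(m_star):
    """Assign the halo mass whose cumulative density equals that of the galaxy."""
    return halo_mass_at_density(galaxy_density_above(m_star))


if __name__ == "__main__":
    for m_star in (1e9, 1e10, 1e11):
        m_halo = matched_halo_mass(m_star)
        print(f"M* = {m_star:.0e} -> M200 = {m_halo:.2e}, M*/M200 = {m_star / m_halo:.4f}")
```

Even in this toy, the ratio M*/M200 it prints is not a constant. That is the rolling, mass-dependent fudge factor: the matching has no physics in it, only rank ordering by abundance.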

There have now been a number of efforts to quantify this. Four examples are given in the figure below (see this paper for references), together with kinematic mass estimates.

The ratio of stellar to halo mass as a function of dark matter halo mass. Lines represent the abundance matching relations derived by assigning galaxies to dark matter halos based on their cosmic abundance. Points are independent halo mass estimates based on kinematics (McGaugh et al. 2010). The horizontal dashed line represents the maximum stellar mass that would result if all available baryons were turned into stars. (Mathematically, this happens when md equals the cosmic baryon fraction, about 15%.)

The abundance matching relations have a peak around a halo mass of 10^12 M☉ and fall off to either side. This corresponds to the knee in the galaxy luminosity function. For whatever reason, halos of this mass seem to be most efficient at converting their available baryons into stars. The shape of these relations means that there is a non-linear relation between stellar mass and halo mass. At the low mass end, a big range in stellar mass is compressed into a small range in halo mass. The opposite happens at high mass, where the most massive galaxies are generally presumed to be the “central” galaxy of a cluster of galaxies. We assign the most massive halos to big galaxies understanding that they may be surrounded by many subhalos, each containing a cluster galaxy.

Around the same time, I made a similar plot, but using kinematic measurements to estimate halo masses. Both methods are fraught with potential systematics, but they seem to agree reasonably well – at least over the range illustrated above. It gets dodgy above and below that. The agreement is particularly good for lower mass galaxies. There seems to be a departure for the most massive individual galaxies, but why worry about that when the glass is 3/4 full?

Skip ahead a decade, and some people think we’ve solved the missing satellite problem. One key ingredient of that solution is that the Milky Way resides in a halo that is on the lower end of the mass range that has traditionally been estimated for it (1 to 2 x 10^12 M☉). This helps because the number of subhalos scales with mass: clusters are big halos with lots of galaxy-size halos; the Milky Way is a galaxy-sized halo with lots of smaller subhalos. Reality does not look like that, but having a lower mass means fewer subhalos, so that helps. It does not suffice. We must invoke feedback effects to make the relation between light and mass nonlinear. Then the lowest mass satellites may be too dim to detect: selection effects have to do a lot of work. It also helps to assume the distribution of satellites is isotropic, which looks to be true in the simulation, but not so much in reality where known dwarf satellites occupy a planar distribution. We also need to somehow fudge the too-big-to-fail problem, in which the more massive subhalos appear not to be occupied by luminous galaxies at all. Given all that, we can kinda sorta get in the right ballpark. Kinda, sorta, provided that we live in a galaxy whose halo mass is closer to 10^12 M☉ than to 2 x 10^12 M☉.

At an IAU meeting in Shanghai (in July 2019, before travel restrictions), the subject of the mass of the Milky Way was discussed at length. It being our home galaxy, there are many ways in which to constrain the mass, some of which take advantage of tracers that go out to greater distances than we can obtain elsewhere. Speaker after speaker used different methods to come to a similar conclusion, with the consensus hedging on the low side (roughly 1 – 1.5 x 10^12 M☉). A nice consequence would be that the missing satellite problem may no longer be a problem.

Galaxies in general and the Milky Way in particular are different and largely distinct subfields. Different data studied by different people with distinctive cultures. In the discussion at the end of the session, Pieter van Dokkum pointed out that from the perspective of other galaxies, the halo mass ought to follow from abundance matching, which for a galaxy like the Milky Way ought to be more like 3 x 10^12 M☉, considerably more than anyone had suggested, but hard to exclude because most of that mass could be at distances beyond the reach of the available tracers.

This was not well received.

The session was followed by a coffee break, and I happened to find myself standing in line next to Pieter. I was still processing his comment, and decided he was right – from a certain point of view. So we got to talking about it, and wound up making the plot below, which appears in a short research note. (For those who know the field, it might be assumed that Pieter and I hate each other. This is not true, but we do frequently disagree, so the fact that we do agree about this is itself worthy of note.)

The Local Group and its two most massive galaxies, the Milky Way and Andromeda (M31), in the stellar mass-halo mass plane. Lines are the abundance matching relations from above. See McGaugh & van Dokkum for further details. The remaining galaxies of the Local Group all fall off the edge of this plot, and do not add up to anything close to either the Milky Way or Andromeda alone.

The Milky Way and Andromeda are the 10^12 M☉ gorillas of the Local Group. There are many dozens of dwarf galaxies, but none of them are comparable in mass, even with the boost provided by the non-linear relation between mass and luminosity. To astronomical accuracy, in terms of mass, the Milky Way plus Andromeda are the Local Group. There are many distinct constraints, on each galaxy as an individual, and on the Local Group as a whole. Any way we slice it, all three entities lie well off the relation expected from abundance matching.

There are several ways one could take it from here. One might suppose that abundance matching is correct, and we have underestimated the mass with other measurements. This happens all the time with rotation curves, which typically do not extend far enough out into the halo to give a good constraint on the total mass. This is hard to maintain for the Local Group, where we have lots of tracers in the form of dwarf satellites, and there are constraints on the motions of galaxies on still larger scales. Moreover, a high mass would be tragic for the missing satellite problem.

One might instead imagine that there is some scatter in the abundance matching relation, and we just happen to live in a galaxy that has a somewhat low mass for its luminosity. This is almost reasonable for the Milky Way, as there is some overlap between kinematic mass estimates and the expectations of abundance matching. But the missing satellite problem bites again unless we are pretty far off the central value of the abundance matching relation. Other Milky Way-like galaxies ought to fall on the other end of the spectrum, with more mass and more satellites. A lot of work is going on to look for satellites around other spirals, which is hard work (see NGC 6946 above). There is certainly scatter in the number of satellites from system to system, but whether this is theoretically sensible or enough to explain our Milky Way is not yet apparent.

There is a tendency in the literature to invoke scatter when and where needed. Here, it is important to bear in mind that there is little scatter in the Tully-Fisher relation. This is a relation between stellar mass and rotation velocity, with the latter supposedly set by the halo mass. We can’t have it both ways. Lots of scatter in the stellar mass-halo mass relation ought to cause a corresponding amount of scatter in Tully-Fisher. This is not observed. This is a much stronger constraint than most people seem to appreciate, as even subtle effects are readily perceptible. Consequently, I think it unlikely that we can nuance the relation between halo mass and observed rotation speed to satisfy both relations without a lot of fine-tuning, which is usually a sign that something is wrong.

There are a lot of moving parts in modern galaxy formation simulations that need to be fine-tuned: the effects of halo mass, merging, dissipation, [non]adiabatic compression, angular momentum transport, gas cooling, on-going accretion of gas from the intergalactic medium, expulsion of gas in galactic winds, re-accretion of expelled gas via galactic fountains, star formation and the ensuing feedback from radiation pressure, stellar winds, supernovae, X-rays from stellar remnants, active galactic nuclei, and undoubtedly other effects I don’t recall off the top of my head. Visualization from the Dr. Seuss suite of simulations.

A lot of effort has been put into beating down the missing satellite problem around the Milky Way. Matters are worse for Andromeda. Kinematic halo mass estimates are typically in the same ballpark as the Milky Way. Some are a bit bigger, some are lower. Lower is a surprise, because the stellar mass of M31 is clearly bigger than that of the Milky Way, placing it above the turnover where the efficiency of star formation is maximized. In this regime, a little stellar mass goes a long way in terms of halo mass. Abundance matching predicts that a galaxy of Andromeda’s stellar mass should reside in a dark matter halo of at least 10^13 M☉. That’s quite a bit more than 1 or 2 x 10^12 M☉, even by astronomical standards. Put another way, according to abundance matching, the Local Group should have the Milky Way as its most massive occupant. Just the Milky Way. Not the Milky Way plus Andromeda. Despite this, the Local Group is not anomalous among similar groups.

Words matter. A lot boils down to what we consider to be “close enough” to call similar. I do not consider the Milky Way and Andromeda to be all that similar. They are both giant spirals, yes, but galaxies are all individuals. Being composed of hundreds of billions of stars, give or take, leaves a lot of room for differences. In this case, the Milky Way and Andromeda are easily distinguished in the Tully-Fisher plane. Andromeda is about twice the baryonic mass of the Milky Way. It also rotates faster. The error bars on these quantities do not come close to overlapping – that would be one criterion for considering them to be similar – a criterion they do not meet. Even then, there could be other features that might be readily distinguished, but let’s say a rough equality in the Tully-Fisher plane would indicate stellar and halo masses that are “close enough” for our present discussion. They aren’t: to me, the Milky Way and M31 are clearly different galaxies.

I spent a fair amount of time reading the recent literature on satellite searches, and I was struck by the ubiquity with which people make the opposite assumption, treating the Milky Way and Andromeda as interchangeable galaxies of similar mass. Why would they do this? If one looks at the kinematic halo mass as the defining characteristic of a galaxy, they’re both close to 10^12 M☉, with overlapping error bars on M200. By that standard, it seems fair. Is it?

Luminosity is observable. Rotation speed is observable. There are arguments to be had about how to convert luminosity into stellar mass, and what rotation speed measure is “best.” These are sometimes big arguments, but they are tiny in scale compared to estimating notional quantities like the halo mass. The mass M200 is not an observable quantity. As such, we have no business using it as a defining characteristic of a galaxy. You know a galaxy when you see it. The same cannot be said of a dark matter halo. Literally.

If, for some theoretically motivated reason, we want to use halo mass as a standard then we need to at least use a consistent method to assess its value from directly observable quantities. The methods we use for the Milky Way and M31 are not applicable beyond the Local Group. Nowhere else in the universe do we have such an intimate picture of the kinematic mass from a wide array of independent methods with tracers extending to such large radii. There are other standards we could apply, like the Tully-Fisher relation. That we can do outside the Local Group, but by that standard we would not infer that M31 and the Milky Way are the same. Other observables we can fairly apply to other galaxies are their luminosities (stellar masses) and cosmic number densities (abundance matching). From that perspective, what we know from all the other galaxies in the universe is that the factor of ~2 difference in stellar mass between Andromeda and the Milky Way should be huge in terms of halo mass. If it were anywhere else in the universe, we wouldn’t treat these two galaxies as interchangeably equal. This is the essence of Pieter’s insight: abundance matching is all about the abundance of dark matter halos, so that would seem to be the appropriate metric by which to predict the expected number of satellites, not the kinematic halo mass that we can’t measure in the same way anywhere else in the universe.

That isn’t to say we don’t have some handle on kinematic halo masses; it’s just that most of that information comes from rotation curves that don’t typically extend as far as the tracers that we have in the Local Group. Some rotation curves are more extended than others, so one has to account for that variation. Typically, we can only put a lower limit on the halo mass, but if we assume a profile like NFW – the standard thing to do in LCDM – then we can sometimes exclude halos that are too massive.

Abundance matching has become important enough to LCDM that we included it as a prior in fitting dark matter halo models to rotation curves. For example:

The stellar mass-halo mass relation from rotation curve fits (Li et al. 2020). Each point is one galaxy; the expected abundance matching relation (line) is not recovered (left) unless it is imposed as a prior (right). The data are generally OK with this because the amount of mass at radii beyond the end of the rotation curve is not strongly constrained. Still, there are some limits on how crazy this can get.

NFW halos are self-similar: low mass halos look very much like high mass halos over the range that is constrained by data. Consequently, if you have some idea what the total mass of the halo should be, as abundance matching provides, and you impose that as a prior, the fits for most galaxies say “OK.” The data covering the visible galaxy have little power to constrain what is going on with the dark matter halo at much larger radii, so the fits literally fall into line when told to do so, as seen in Pengfei’s work.

That we can impose abundance matching as a prior does not necessarily mean the result is reasonable. The highest halo masses that abundance matching wants in the plot above are crazy talk from a kinematic perspective. I didn’t put too much stock in this, as the NFW halo itself, the go-to standard of LCDM, provides the worst description of the data among all the dozen or so halo models that we considered. Still, we did notice that even with abundance matching imposed as a prior, there are a lot more points above the line than below it at the high mass end (above the bend in the figure above). The rotation curves are sometimes pushing back against the imposed prior; they often don’t want such a high halo mass. This was explored in some detail by Posti et al., who found a similar effect.

I decided to turn the question around. Can we use abundance matching to predict the halo and hence rotation curve of a massive galaxy? The largest spiral in the local universe, UGC 2885, has one of the most extended rotation curves known, meaning that it does provide some constraint on the halo mass. This galaxy has been known as an important case since Vera Rubin’s work in the ’70s. With a modern distance scale, its rotation curve extends out 80 kpc. That’s over a quarter million light-years – a damn long way, even by the standards of galaxies. It also rotates remarkably fast, just shy of 300 km/s. It is big and massive.

(As an aside, Vera once offered a prize for anyone who found a disk that rotated faster than 300 km/s. Throughout her years of looking at hundreds of galaxies, UGC 2885 remained the record holder, with 300 seeming to be a threshold that spirals did not exceed. She told me that she did pay out, but on a technicality: someone showed her a gas disk around a supermassive black hole in Keplerian rotation that went up to 500 km/s at its peak. She lamented that she had been imprecise in her language, as that was nothing like what she meant, which was the flat rotation speed of a spiral galaxy.)

That aside aside, if we take abundance matching at face value, then the stellar mass of a galaxy predicts the mass of its dark matter halo. Using the most conservative (in that it returns the lowest halo mass) of the various abundance matching relations indicates that with a stellar mass of about 2 x 10^11 M☉, UGC 2885 should have a halo mass of 3 x 10^13 M☉. Combining this with a well-known relation between halo concentration and mass for NFW halos, we then know what the rotation curve should be. Doing this for UGC 2885 yields a tragic result:
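For readers who want to see how a halo mass becomes a rotation curve, here is a minimal sketch of the NFW circular speed. It is not the calculation behind the figure: the Hubble constant (70 km/s/Mpc) and the concentrations passed in are my placeholder assumptions, since the specific concentration-mass relation is not spelled out here, and the function names are invented for the example.

```python
import math

G = 4.30091e-6   # Newton's constant in kpc (km/s)^2 / Msun
H0 = 70.0        # ASSUMED Hubble constant in km/s/Mpc


def r200_kpc(m200):
    """Radius enclosing a mean density of 200 x critical for mass m200 (Msun)."""
    rho_crit = 3.0 * (H0 / 1000.0) ** 2 / (8.0 * math.pi * G)   # Msun / kpc^3
    return (3.0 * m200 / (4.0 * math.pi * 200.0 * rho_crit)) ** (1.0 / 3.0)


def nfw_mass_shape(y):
    """Dimensionless NFW enclosed-mass function ln(1 + y) - y / (1 + y)."""
    return math.log(1.0 + y) - y / (1.0 + y)


def nfw_velocity(r_kpc, m200, concentration):
    """Circular speed (km/s) of an NFW halo alone at radius r_kpc > 0."""
    r200 = r200_kpc(m200)
    v200 = math.sqrt(G * m200 / r200)
    x = r_kpc / r200
    enclosed = nfw_mass_shape(concentration * x) / nfw_mass_shape(concentration)
    return v200 * math.sqrt(enclosed / x)


if __name__ == "__main__":
    # The concentrations 6 and 8 are placeholders, not values taken from the post.
    for label, m200, c in (("abundance matching", 3e13, 6.0),
                           ("rotation curve fit", 5e12, 8.0)):
        v = nfw_velocity(80.0, m200, c)
        print(f"{label}: M200 = {m200:.0e} Msun, c = {c} -> V(80 kpc) = {v:.0f} km/s")
```

With these assumed inputs, the 3 x 10^13 M☉ halo by itself already exceeds 400 km/s at 80 kpc, before the stars and gas are added in quadrature. The observed rotation speed is just shy of 300 km/s.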

The extended rotation curve of UGC 2885 (points). The declining dotted line is the rotation curve predicted by the observed stars and gas. The rising dashed line is the halo predicted by abundance matching. Combining this halo with the observed stars and gas should result in the solid line. This greatly exceeds the data. UGC 2885 does not reside in an NFW halo that is anywhere near as massive as predicted by abundance matching.

The data do not allow for the predicted amount of dark matter. If we fit the rotation curve, we obtain a “mere” M200 = 5 x 10^12 M☉. Note that this means that UGC 2885 is basically the Milky Way and Andromeda added together in terms of both stellar mass and halo mass – if added to the M*-M200 plot above, it would land very close to the open circle representing the more massive halo estimate for the combination of MW+M31, and be just as discrepant from the abundance matching relations. We get the same result regardless of which direction we look at it from.
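The size of the disagreement is easy to check with the numbers already quoted: a stellar mass of about 2 x 10^11 M☉, the two halo masses, and a cosmic baryon fraction of about 15%. A small sketch, with function names invented for the example:

```python
COSMIC_BARYON_FRACTION = 0.15   # "about 15%", as quoted earlier in the post


def stellar_fraction(m_star, m200):
    """The proportionality factor md = M* / M200."""
    return m_star / m200


def baryon_conversion_efficiency(m_star, m200, fb=COSMIC_BARYON_FRACTION):
    """Fraction of the halo's available baryons (fb * M200) that are in stars."""
    return stellar_fraction(m_star, m200) / fb


if __name__ == "__main__":
    m_star = 2e11   # approximate stellar mass of UGC 2885 from the text
    for label, m200 in (("abundance matching", 3e13), ("rotation curve fit", 5e12)):
        md = stellar_fraction(m_star, m200)
        eff = baryon_conversion_efficiency(m_star, m200)
        print(f"{label}: md = {md:.4f}, efficiency = {eff:.0%}")
```

The kinematic fit implies that roughly a quarter of the available baryons became stars; the abundance matching halo implies only a few percent. Those are very different stories about the same galaxy.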

Objectively, 5 x 10^12 M☉ is a huge dark matter halo for a single galaxy. It’s just not the yet-more massive halo that is predicted by abundance matching. In this context, UGC 2885 apparently has a serious missing satellites problem, as it does not appear to be swimming in a sea of satellite galaxies the way we’d expect for the central galaxy of such a high mass halo.

UGC 2885 appears to be pretty lonely in this image from the DSS. I see a few candidate satellite galaxies amidst the numerous foreground stars, but nothing like what you’d expect for dark matter subhalos from a simulation like Via Lactea. This impression does not change when imaged in more detail with HST.

It is tempting to write this off as a curious anecdote. Another outlier. Sure, that’s always possible, but this is more than a bit ridiculous. Anyone who wants to go this route I refer to Snoop Dogg.

I spent much of my early career obsessed with selection effects. These preclude us from seeing low surface brightness galaxies as readily as brighter ones. However, it isn’t binary – a galaxy has to be extraordinarily low surface brightness before it becomes effectively invisible. The selection effect is a bias – and a very strong one – but not an absolute screen that prevents us from finding low surface brightness galaxies. That makes it very hard to sustain the popular notion that there are lots of subhalos that simply contain ultradiffuse galaxies that cannot currently be seen. I’ve been down this road many times as an optimist in favor of this interpretation. It hasn’t worked out. Selection effects are huge, but still nowhere near big enough to overcome the required deficit.

Having the satellite galaxies that inhabit subhalos be low in surface brightness is a necessary but not sufficient criterion. It is also necessary to have a highly non-linear stellar mass-halo mass relation at low mass. In effect, luminosity and halo mass become decoupled: satellite galaxies spanning a vast range in luminosity must live in dark matter halos that cover only a tiny range. This means that it should not be possible to predict stellar motions in these galaxies from their luminosity. The relation between mass and light has just become too weak and messy.

And yet, we can do exactly that. Over and over again. This simply should not be possible in LCDM.

The Fat One – a test of structure formation with the most massive cluster of galaxies

A common objection to MOND is that it does not entirely reconcile the mass discrepancy in clusters of galaxies. This can be seen as an offset in the acceleration scale between individual galaxies and clusters. This is widely seen as definitive proof of dark matter, but this is just defaulting to our confirmation bias without checking if it is really any better: just because MOND does something wrong doesn’t automatically mean that LCDM does it right.

The characteristic acceleration (in units of Milgrom’s constant a0) of extragalactic objects as a function of their baryonic mass, ranging from tiny dwarf galaxies to giant clusters of galaxies. Clusters are offset from individual galaxies, implying a residual missing mass problem for MOND. From Famaey & McGaugh (2012).

I do see clusters as a problem for MOND, and there are some aspects of clusters that make good sense in LCDM. Unlike galaxies, cluster mass profiles are generally consistent with the predicted NFW halos (modulo their own core problem). That’s not a contradiction to MOND, which should do the same thing as Newton in the Newtonian regime. But rich clusters also have baryon fractions close to that expected from cosmology. From that perspective, it looks pretty reasonable. This success does not extend to lower mass clusters; in the plot above, the low mass green triangles should be higher than the higher mass gray triangles in order for all clusters to have the cosmic baryon fraction. They should not parallel the prediction of MOND. Within individual clusters, baryons are not as well mixed with dark matter as expected: they tend to have too much unseen mass at small radius, which is basically the same problem encountered by MOND.

There are other tests, one of which is the growth of clusters. Structure is predicted to form hierarchically in LCDM: small objects form first, and pile on to make bigger ones, with the largest clusters being the last to form. So there is a test in how massive a cluster can get as a function of redshift. This is something for which LCDM makes a clear prediction. In MOND, my expectation is that structure forms faster so that massive objects are in place at higher redshift than expected in LCDM. This post is mostly about clusters in LCDM, so henceforth all masses will be conventional masses, including the putative dark matter.

Like so many things, there is a long history to this. For example, in the late ’90s, Megan Donahue reported a high temperature of ~ 12 keV for the intracluster gas in the cluster MS1054-0321. This meant that it was massive for its redshift: 7.4 x 10^14 h^-1 M☉ (dark matter and all) at z = 0.829, when the universe was only about half its current age. (Little h is the Hubble constant in units of 100 km/s/Mpc. Since we’re now pretty sure h < 1, the true mass is higher, more like 10^15 M☉.) That’s a lot of solar masses to assemble in the available time. In 1997, this was another nail in the coffin of SCDM, which was already a zombie theory by then. But the loss of Ωm = 1 was still raw for some people, I guess, because she got a lot of grief for it. Can’t be true! Clusters don’t get that big that early! At least they shouldn’t. In SCDM.
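The little-h bookkeeping trips people up, so here is the conversion spelled out. The value h = 0.7 is my assumption for illustration; the text only says h < 1.

```python
def mass_without_h(mass_in_inverse_h_units, h):
    """Convert a mass quoted in h^-1 Msun into Msun for a chosen h."""
    return mass_in_inverse_h_units / h


if __name__ == "__main__":
    h = 0.7   # ASSUMED value; any h < 1 raises the mass above the quoted number
    mass = mass_without_h(7.4e14, h)
    print(f"7.4e14 h^-1 Msun with h = {h} is {mass:.2e} Msun")
```

For any h in the plausible range, the cluster lands close to 10^15 M☉.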

Structure formation in SCDM was elegant in that it continues perpetually: as the universe expands, bigger and bigger structures continue to form; statistically, later epochs look like scaled-up versions of earlier epochs. In LCDM, this symmetry is broken by the decline in density as the universe expands. Consequently, structure forms earlier in LCDM: the action has to happen when there is still some density to work with, and the accelerated expansion provides some extra time (what’s a few billion years among cosmologists?) for mass to get together. Consequently, MS1054-0321 is not problematic in LCDM.

The attitude persisted, however. In the mid-’00s, Jim Schombert and I started using the wide field near-IR camera NEWFIRM to study high redshift clusters. Jim had a clever way of identifying them, which turned out not to be particularly hard, e.g., MS 1426.9+1052 at z = 1.83. This is about 10 Gyr ago, and made the theorists squirm. That didn’t leave enough time for a cluster to form. On multiple occasions I had the following conversation with different theorists:

me: Hey, look at this cluster at z = 1.8.

theorist: That isn’t a cluster.

me: Sure it is. There’s the central galaxy, which contains a bright radio source (QSO). You can see lots of other galaxies around it. That’s what a cluster looks like.

theorist: Must be a chance projection.

me: There are spectra for many of the surrounding galaxies; they’re all at the same redshift.

theorist: …

me: So… a cluster at z = 1.8. Pretty cool, huh?

theorist: That isn’t a cluster.

This work became part of Jay Franck’s thesis. He found evidence for more structure at even higher redshift. A lot of this apparent clustering probably is not real… the statistics get worse as you push farther out: fewer galaxies, worse data. But there were still a surprising number of objects in apparent association up to and beyond z = 5. That’s pretty much all of time, leaving a mere Gyr to go from the completely homogeneous universe that we see in the CMB at z = 1090 to the first stars around z ~ 20 to the first galaxies to big galaxies to protoclusters – or whatever we want to call these associations of many galaxies in the same place on the sky at the same redshift.

Jay did a lot of work to estimate the rate of false positives. Long story short, we expect about 1/3 of the protoclusters he identified to be real structures. That’s both bad and good – lots of chaff, but some wheat too. One thing Jay did was to analyze the Millennium simulation in the same way as the data. This allows us to quantify what we would see if the universe looked like an LCDM simulation.

The plot below shows the characteristic brightness of galaxies at various redshifts. For the pros, this is the knee in the Schechter function fit to the luminosity distribution of galaxies in redshift bins. We saw the same thing in protoclusters and in the field: galaxies were brighter than anticipated in the simulation. At redshifts 3 < z < 4, the characteristic magnitude is expected to be 23. That’s pretty faint. In the data, it’s more like 21. That’s also faint, but it means the galaxies are about a factor of 6 brighter than they should be. That’s a lot of stars that have formed before they’re supposed to, in galaxies that are bigger than they should be yet, with some of them already clustering together ahead of their time.

The characteristic magnitude of galaxies in the Spitzer 4.5 micron band as a function of redshift in the Millennium simulation (black squares) and in reality (circles). This is a classic backwards astronomical plot in which larger magnitudes are fainter sources. At high redshift, simulations predict that galaxies should not yet have grown to become as bright as they are observed to be. From Franck (2017).
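The "factor of 6" follows from the standard definition of the magnitude scale, in which a difference of 5 magnitudes is a factor of 100 in flux. A minimal sketch of that conversion, using the round numbers 23 and 21 quoted above (the function name is just for illustration):

```python
# Convert a difference in astronomical magnitudes into a brightness (flux) ratio.
# Magnitudes run backwards: a smaller magnitude means a brighter source.

def flux_ratio(m_faint: float, m_bright: float) -> float:
    """Return how many times brighter the m_bright source is than the m_faint one."""
    return 10 ** (0.4 * (m_faint - m_bright))

expected_mag = 23.0  # characteristic magnitude expected from the simulation at 3 < z < 4
observed_mag = 21.0  # approximate characteristic magnitude in the data

print(f"Observed galaxies are ~{flux_ratio(expected_mag, observed_mag):.1f}x brighter")
```

Two magnitudes works out to 10^0.8, a bit over 6, so the data sit well above the simulation even on this backwards scale.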

This has been the observer’s experience. Donahue wasn’t the first, and Franck won’t be the last. Every time we look, we see more structure in place sooner than had been expected before it was seen. I don’t hear people complaining about our clusters at z = 1.8 anymore; those have been seen enough to become normalized. Perhaps they have even been explained satisfactorily. But they sure weren’t expected, much less predicted.

So, just how big can a cluster get? Mortonson et al. (2011) set out to answer this question. The graph below shows the upper limit they predict for the most massive cluster in the universe as a function of redshift. This declines as redshift increases because we’re looking back in time; high redshift clusters haven’t had time to assemble more mass than the uppermost line. They project this into what would be discovered in an all-sky survey, and more realistic surveys of finite size. Basically nothing should exist above these lines.

The predicted maximum mass of galaxy clusters as a function of redshift from Mortonson et al. (2011). Each line is the predicted upper limit for the corresponding amount of sky surveyed. The green line illustrates the area of the sky in which El Gordo was discovered. The points show independent mass estimates for El Gordo from Menanteau et al. (2012) and Jee et al. (2014). These are significantly above the predicted upper limit.

Their prediction was almost immediately put to the test by the discovery of El Gordo, a big fat cluster at z = 0.87 reported by Menanteau et al. (2012), who published the X-ray image above. It is currently the record holder for the most massive known object that is thought to be gravitationally bound, weighing in at 2 or 3 × 10¹⁵ M☉, depending on who you ask. That’s about a thousand Milky Ways, plus a few hundred Andromedas. Give or take.

El Gordo straddles the uppermost line in the graph above. A naive reading of the first mass estimate suggests that it’s roughly a 50/50 proposition whether the entire observable universe should contain exactly one El Gordo. However, El Gordo was discovered in something less than a full sky survey. The appropriate comparison is to the green line, which El Gordo clearly exceeds – by about 3 sigma. This is the case for both of the illustrated mass estimates as the high mass point has a larger error bar. They both exceed the green line by a hair less than 3 sigma. Formally, this means that the chance of finding El Gordo in our universe is only a few percent.

A few percent is not good. Neither is it terrible – I’ve often commented here on how the uncertainties are larger than they seem. This is especially true of the tails of the distribution. So maybe a few percent is pessimistic; sometimes that’s how the dice roll. On the other hand, the odds aren’t better than 10%: El Gordo is not likely to exist however we slice the uncertainties. Whether we should be worried about it is just a matter of how surprising it is. A similar situation arises with the collision velocity of the Bullet cluster, which is either absurdly unlikely (about 1 chance in 10 billion) or merely unusual (maybe 1 in 10). So I made the above plot by adding El Gordo to the predictions of Mortonson et al., and filed it away under


Recently, Elena Asencio, Indranil Banik, and Pavel Kroupa have made a more thorough study. They have their own blog post, so I won’t repeat the technical description. Basically, they sift through a really big LCDM simulation to find objects that could be (or become) like El Gordo.

The short answer is that it doesn’t happen, similar to big voids. They estimate that the odds of El Gordo existing are a bit less than one in a billion. I’m sure one can quibble with details, but we’re not going to save LCDM with factors of two in a probability that starts this low. El Gordo just shouldn’t exist.
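To put "one in a billion" on the sigma scale that such claims are usually quoted in, here is a rough sketch of the conversion. It assumes a one-sided Gaussian tail, which is a common convention but not necessarily the one behind every number quoted in this post or in the papers it cites; the function names are made up for the example.

```python
import math
from statistics import NormalDist

# ASSUMPTION: probabilities are one-sided tails of a standard normal distribution.
_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def tail_probability(n_sigma: float) -> float:
    """One-sided chance of a Gaussian fluctuation exceeding n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

def sigma_for_probability(p: float) -> float:
    """Inverse of tail_probability: the n_sigma whose one-sided tail equals p."""
    return -_STD_NORMAL.inv_cdf(p)

print(f"5 sigma      -> p = {tail_probability(5.0):.2e}")
print(f"p = 1e-9     -> {sigma_for_probability(1e-9):.2f} sigma")
```

Under that assumption, odds of one in a billion correspond to roughly 6 sigma, comfortably past the traditional 5 sigma threshold.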

The probability is lower than in the graph above because it isn’t just a matter of mass. It is also the mass ratio of the merging clumps (both huge clusters in their own right), their collision speed, impact parameter, and morphology. As they are aware, one must be careful not to demand a perfect match, since there is only one reality. But neither is it just a matter of assembling mass; that understates the severity of the problem. This is where simulations are genuinely helpful: one can ask how often does this happen? If the answer is never, one can refine the query to be more lenient. The bottom line here is that you can’t be lenient enough to get something like El Gordo.

Here is their money plot. To be like El Gordo, an object would have to be up on the red line. That’s well above 5 sigma, which is the threshold where we traditionally stop quibbling about percentiles and just say Nope. Not an accident.

Logarithmic mass as a function of expansion factor [how big the universe is. This is inversely related to redshift: a = 1/(1+z)]. The color scale gives the number density of objects of a given mass as a function of how far the universe has expanded. The solid lines show the corresponding odds (in sigma) of finding such a thing in a large LCDM simulation. Figure from Asencio et al (2020).
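Since the figure is plotted against expansion factor while the text mostly speaks in redshift, a small helper for going back and forth may be handy. This is only a sketch of the relation a = 1/(1+z) given in the caption; the function names are illustrative.

```python
# Convert between redshift z and expansion (scale) factor a, using a = 1 / (1 + z).

def expansion_factor(z: float) -> float:
    """Size of the universe at redshift z relative to today (a = 1 now)."""
    if z <= -1:
        raise ValueError("redshift must be greater than -1")
    return 1.0 / (1.0 + z)

def redshift(a: float) -> float:
    """Inverse relation: z = 1/a - 1."""
    if a <= 0:
        raise ValueError("expansion factor must be positive")
    return 1.0 / a - 1.0

# Redshifts quoted in this post
for name, z in [("El Gordo", 0.87), ("MS1054-0321", 0.829)]:
    print(f"{name}: z = {z} -> a = {expansion_factor(z):.3f}")
```

For El Gordo this gives a ≈ 0.535, which is to say the universe had expanded to a bit more than half its present size.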

In principle, this one object falsifies the LCDM structure formation paradigm. We are reluctant to put too much emphasis on a single object (unless it is the Bullet cluster and we have clickbait to sell) as it’s a big universe, so there can always be one unicorn or magnetic monopole somewhere. Asencio et al note that a similar constraint follows for the Bullet cluster itself, which also should not exist, albeit at a lower significance. That’s two unicorns: we can’t pretend that this is a one-off occurrence. The joint probability of living in a universe with both El Gordo and the Bullet cluster is even lower than either alone.

Looking at Asencio’s figure, it strikes me as odd not only that we find huge things at high redshift, but also that we don’t see still bigger objects at low redshift. There were already these huge clusters ramming into each other when the universe had only expanded to half its present size. This process should continue to build still bigger clusters, as indicated by the lines in the plot. The sweet spot for finding really massive clusters should be about z = 0.5, by which time they could have reached a mass of nearly 10¹⁶ M☉ as readily (or not!) as El Gordo could reach its mass by its observed redshift. (The lines turn down for the largest expansion factors/lowest redshifts because surveys cover a fixed area on the sky, which is a conical volume in 3D. We reside at the point of the cone, and need to see a ways out before a volume large enough to contain a giant cluster has been covered.)

I have never heard a report of a cluster anywhere near to 10¹⁶ M☉. A big cluster is 10¹⁵ M☉. While multiple examples of clusters this big are known, to the best of my knowledge, El Gordo is the record-holding Fat One at twice or thrice that. The nearest challenger I can readily find is RX J1347.5-1145 at z = 0.451 (close to the survey sweet spot) weighing in at 2 × 10¹⁵ M☉. Clusters just don’t seem to get bigger than that. This mass is OK at low redshift, but at higher z we shouldn’t see things as big as El Gordo. Given that we do see them at z = 0.87 (a = 0.535), why don’t we see still bigger ones at lower redshift? Perhaps structure formation saturates, but that’s not what LCDM predicts. If we can somehow explain El Gordo at high z, we are implicitly predicting still bigger clusters at lower redshift – objects we have yet to discover, if they exist, which they shouldn’t.

Which is the point.


The image featured at top is an X-ray image of the hot gas in the intracluster medium of El Gordo from NASA/CXC/Rutgers/J. Hughes et al.