What is empirical?

I find that my scientific colleagues have a variety of attitudes about what counts as a theory. Some of the differences amount to different standards. Others are simply misconceptions about specific theories. This comes up a lot in discussions of MOND. Before we go there, we need to establish some essentials.

What is empirical?

I consider myself to be a very empirically-minded scientist. To me, what is empirical is what the data show.

Hmm. What are data? The results of experiments or observations of the natural world. To give a relevant example, here are some long slit observations of low surface brightness galaxies (from McGaugh, Rubin, & de Blok 2001).


What you see are spectra obtained with the Kitt Peak 4m telescope. Spectra run from blue to red from left to right while the vertical axis is position along the slit. Vertical bars are emission lines in the night sky. The horizontal grey stuff is the continuum emission from stars in each galaxy (typically very weak in low surface brightness galaxies). You also see blobby dark bars running more or less up and down, but with an S-shaped bend. These are emission lines of hydrogen (the brightest one), nitrogen (on either side of hydrogen) and sulfur [towards the right edge of (a) and (d)].

I chose this example because you can see the rotation curves of these galaxies by eye directly in the raw data. Night sky lines provide an effective wavelength calibration; you can see that one side of each galaxy is redshifted by a greater amount than the other: one side is approaching, the other receding relative to the Hubble flow velocity of the center of each galaxy. With little effort, you can also see the flat part of each rotation curve (Vf) and the entire shape V(R)*sin(i) [these are raw data, not corrected for inclination]. You can even see a common hazard of real-world data in the cosmic ray that struck near the end of the Hα line in (f).

Data like these lead to rotation curves like these (from the same paper):


These rotation curves were measured by Vera Rubin. Vera loved to measure. She liked nothing better than to delve into the data to see what they said. She was very good at it.

Some of these data are good. Some are not. Some galaxies are symmetric (filled and open symbols represent approaching and receding sides), others are not. This is what the real world of galaxy data looks like. With practice, one develops a good intuition for what data are trustworthy and which are not.

To get from the data to these rotation curves, we correct for known effects: the expansion of the universe (which stretches the redshifts by 1+z), the inclination of each galaxy (estimated in this case by the axis ratios of the images), and most importantly, assuming the Doppler effect holds. That is, we make use of the well known relation between wavelength and speed to turn the measured wavelengths of the Hα and other emission lines into velocities. We use the distance to each galaxy (estimated from the Hubble Law) to convert measured position along the slit into physical radius.
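The Doppler step is simple enough to sketch in a few lines of Python. This is only an illustration, not the reduction pipeline used in the paper; the function name and the systemic-redshift handling are my own simplification.

```python
# Sketch: turn a measured emission-line wavelength into a rotation
# velocity, assuming the classical Doppler formula and a known
# systemic redshift for the galaxy.

H_ALPHA_REST = 6562.8  # rest wavelength of H-alpha, Angstroms
C_KMS = 299792.458     # speed of light, km/s

def line_velocity(lambda_obs, z_sys, lambda_rest=H_ALPHA_REST):
    """Velocity relative to the galaxy center, in km/s.

    Dividing by (1 + z_sys) removes the cosmological stretching of
    the redshift, leaving the peculiar (rotation) velocity.
    """
    z_obs = lambda_obs / lambda_rest - 1.0
    return C_KMS * (z_obs - z_sys) / (1.0 + z_sys)
```

A point on the approaching side of the galaxy yields a negative velocity, the receding side a positive one; the position along the slit supplies the radius at which each velocity applies.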

This is empirical. Empirical results are as close to the data as possible. Here we make use of two other empirical results, the Doppler effect and the Hubble Law. So there is more to it than just the raw (or even processed) data; we are also making use of previously established facts.

An example of a new empirical fact obtained from data like these is the Baryonic Tully-Fisher relation. This is a plot of the observed baryonic mass in a galaxy (the sum of stars and gas) against the amplitude of flat rotation (Vf).


Here one must utilize some other information to estimate the mass-to-light ratio of the stars. This is an additional step; how we go about it affects the details of the result but not the basic empirical fact that the relation exists.

It is important to distinguish between the empirical relation as plotted above and a function that might be fit to it. The above data are well fit by the formula

Mb = 50 Vf^4

with mass measured in solar masses and rotation speed in kilometers per second. The fit is merely a convenient representation of the data. The data themselves are the empirical result.
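The fitted formula is a one-liner to evaluate. For example, a galaxy with a flat rotation speed of 100 km/s should contain about 5×10^9 solar masses of baryons:

```python
def btf_mass(v_flat):
    """Baryonic mass in solar masses from the fitted relation
    Mb = 50 Vf^4, with v_flat in km/s. A convenient representation
    of the data, not the data themselves."""
    return 50.0 * v_flat ** 4

# A galaxy rotating at a flat 100 km/s:
print(btf_mass(100.0))  # -> 5e9 solar masses
```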

In this case, the scatter in the data is consistent with what you’d expect for the size of the error bars. The observed relation is consistent with one that has zero intrinsic scatter – a true line. The reason for that might be that it is imposed by some underlying theory (e.g., MOND). Whether MOND is the reason for the Baryonic Tully-Fisher relation is something that can be debated. That the relation exists as an empirical result that must be explained by any theory that attempts to explain the mass discrepancy problem in galaxies cannot.

I would hope it is obvious that theory should explain data. In the context of the mass discrepancy problem, the Baryonic Tully-Fisher relation is one fact that needs to be explained. There are many others. Which interpretation we are driven towards depends a great deal on how we weigh the facts. How important is this particular fact in the context of all others?

I find that many scientists confuse the empirical Baryonic Tully-Fisher relation with the theory MOND. Yes, MOND predicts a Baryonic Tully-Fisher relation, but they are not the same thing. The one we observe is consistent with MOND. It need not be (and is not for some implausible but justifiable assumptions about the stellar mass-to-light ratio). The mistake I see many people make – often reputable, intelligent scientists – is to conflate data with theory. A common line of reasoning seems to be “These data support MOND. But we know MOND is wrong for other reasons. Therefore these data are wrong.” This is a logical fallacy.

More generally, it is OK to incorporate well established results (like the Doppler effect) into new empirical results so long as we are careful to keep track of the entire line of reasoning and are willing to re-examine all the assumptions. In the example of the Baryonic Tully-Fisher relation, the critical assumption is the mass-to-light ratio of the stars. That has minor effects: mostly it just tweaks the slope of the line you fit to the data.

If instead we have reason to doubt something deeper, like the applicability of the Doppler formula to galaxies, that would open a giant can of worms: a lot more than the Tully-Fisher relation would be wrong. For this reason, scientists are usually very impatient with challenges to well established results (who hasn’t received email asserting “Einstein was wrong!”?). To many, MOND seems to be in this category.

Consequently, many scientists are quick to dismiss MOND without serious thought. I did, initially. But eventually it had enough predictions come true that I felt compelled to check. (Bekenstein pointed out a long time ago that MOND has had many more predictions come true than General Relativity had had at the time of its widespread acceptance.) When I checked, I found that the incorrect assumption I had made was that MOND could so lightly be dismissed. In my experience since then, most of the people arguing against MOND haven’t bothered to check their facts (surely it can’t be true!), or have chosen to weigh most heavily those facts that agree with their preconception of what the result should be. If the first thing someone mentions in this context is the Bullet cluster, they are probably guilty of both, as this has become the go-to excuse not to think too hard about the subject. Cognitive dissonance is rife.


Structure Formation Mythology

Do not be too proud of this technological terror you’ve constructed. The ability to simulate the formation of large scale structure is insignificant next to the power of the Force.

– Darth Vader, Lord of the Sith

The now standard cosmology, ΛCDM, has a well developed cosmogony that provides a satisfactory explanation of the formation of large scale structure in the universe. It provides a good fit to both the galaxy power spectrum at low redshift and that of the cosmic microwave background (CMB) at z = 1080. This has led to a common misconception among cosmologists that this is the only way it can be.

The problem is this: the early universe was essentially homogeneous, while the current universe is not. At the time of recombination, one patch of plasma had the same temperature and density as the next patch to 1 part in 100,000. Look around at the universe now, and you see something very different: galaxies strung along a vast web characterized chiefly by empty space and enormous voids. Trouble is, you can’t get here from there.

Gravity will form structure, making the over-dense patches grow ever denser, in a classic case of the rich getting richer. But gravity is extraordinarily weak. There simply is not enough time in the ~13 Gyr age of the universe for it to make the tiny density variation observed in the CMB into the rich amount of structure observed today.
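The back-of-envelope version of this argument makes the shortfall explicit. In standard linear perturbation theory, overdensities in a matter-dominated universe grow only in proportion to the scale factor (δ ∝ a), so a perturbation of one part in 100,000 at recombination grows by a factor of ~1100 by today:

```python
# Back-of-envelope linear growth (standard textbook argument,
# matter domination assumed throughout):
delta_rec = 1e-5     # density contrast seen in the CMB
z_rec = 1100         # approximate redshift of recombination
growth = 1 + z_rec   # delta grows proportionally to the scale factor

delta_now = delta_rec * growth
print(delta_now)     # ~0.011: far below the delta ~ 1 needed for collapse
```

A density contrast of order one is needed before a region can decouple from the expansion and collapse; baryons alone fall short by roughly a factor of a hundred.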

We need something to goose the process. This is where non-baryonic cold dark matter (CDM) comes in. It outweighs the normal matter, and does not interact with the photons of the CMB. This latter part is critical, as the baryons are strongly coupled to the photons, which don’t let them clump up enough early on. The CDM can. So it starts to form structure early which the baryons subsequently trace. Since structure formed, CDM must exist.

This is a sound line of reasoning. It convinced many of us, including myself, that there had to be some form of non-baryonic mass made of some particle outside the standard model of particle physics. The other key fact was that the gravitating mass density was inferred to outweigh the amount of baryons indicated by Big Bang Nucleosynthesis (Ωm ≫ Ωb).

Does anyone spot the problem with this line of thinking?

It took me a long time to realize what it was. Both the structure formation argument and the apparent fact that Ωm ≫ Ωb implicitly assume that gravity is normal. All we need to know to approach either problem is what Newton and Einstein taught us. Once we make that assumption, we are absolutely locked into the line of reasoning that leads us to CDM.

I worry that CDM is a modern æther. Given our present understanding of physics, it has to exist. In the nineteenth century, so too did æther. Had to. Only problem was, it didn’t.

If, for a moment, we let go of our implicit assumption, then we may realize that what structure formation needs is an extra push (or pull, to make overdensities collapse faster). That extra push may come from CDM, or it may come from an increase in the strength of the effective force law. Rather than being absolute proof of the existence of CDM, the rapid formation of structure might also be another indication that we need to tweak the force law.

I have previously outlined how structure might form in a modified force law like MOND. Early efforts do not provide as good a fit to the power spectrum as ΛCDM. But they provide a much better approximation than did the predecessor of ΛCDM, SCDM.

Indeed, there have been some striking predictive successes. As we probe to ever higher redshift, we see time and again more structure than had been anticipated by ΛCDM. Galaxies form early in MOND, so this is quite natural. So too does the cosmic web, which I predict to be more developed in MOND at redshifts of 3 and even 5. By low redshift, MOND does a much better job of emptying out the voids than does ΛCDM. Ultimately, I expect we may get a test from 21 cm reverberation mapping in the dark ages, where I predict we may find evidence of strong baryonic oscillations. (These predictions were made, and published in refereed journals, in the previous millennium.)

I would not claim that MOND provides a satisfactory description of large scale structure. The subject requires a lot more work.  Structure formation in MOND is highly non-linear. It is a tougher problem than standard perturbation theory. Yet we have lavished tens of thousands of person-years of effort on ΛCDM, and virtually no effort on the harder problem in the case of MOND. Having failed to make an effort does not suffice as evidence.

Plate of Shrimp

I should perhaps explain a little about the title of the last post. It is perfectly obvious to me. But probably not to anyone else.

Our brains work in subtly different ways. One thing that mine does, whether I like it or not, is memorize lines and make obscure links between them. It is a facility I share with a few other people. I have had entire conversations by analogy through quotes with people who share this facility, much to the annoyance of those who don’t think this way and find it disturbingly freakish.

For example, Cole Miller and I occasionally send challenges to each other: lists of quotes we have to place in context. We’re both pretty good at it. As nerds of a certain age, there is a great deal of cultural overlap between us: we know the same quotes. Still, we are less likely to miss a quote than we are to discover ones that reveal small differences in our cultural knowledge.

In the movie Repo Man, there is a scene in which the goofiest character (of many eccentrics) goes on a tear about cosmic coincidences, giving an odd example: “Suppose you’re thinkin’ about a plate o’ shrimp. Suddenly someone’ll say, like, plate, or shrimp, or plate o’ shrimp out of the blue, no explanation.” How is this relevant? you might reasonably ask. Well, it has to do with title generation.

As it happens, as I was on my way to New York, this story about the infamous Saturday Night Live performance of the punk band Fear came to my attention. In it, they perform the song New York’s Alright If You Like Saxophones. So this was in my head as I traveled to New York. Fear contributed Let’s Have a War to the Repo Man soundtrack. And I do like saxophones – or at least, I used to play one.

Plate of shrimp.


Dark Matter halo fits – today’s cut

I said I would occasionally talk about scientific papers. Today’s post is about the new paper Testing Feedback-Modified Dark Matter Haloes with Galaxy Rotation Curves: Estimation of Halo Parameters and Consistency with ΛCDM by Harley Katz et al.

I’ve spent a fair portion of my career fitting dark matter halos to rotation curves, and trying to make sense of the results.  It is a tricky business plagued on the one hand by degeneracies in the fitting (there is often room to trade off between stellar and dark mass) and on the other by a world of confirmation bias (many of us would really like to get the “right” answer – the NFW halo that emerges from numerical structure formation simulations).

No doubt these issues will come up again. For now, I’d just like to say what a great job Harley did. The MCMC has become the gold standard for parameter estimation, but it is no silver bullet to be applied naively. Harley avoided this trap and did a masterful job with the statistics.

The basic result is that primordial (NFW) halos do not fit the data as well as those modified by baryonic processes (we specifically fit the DC14 halo model). On the one hand, this is not surprising – it has been clear for many years that NFW doesn’t provide a satisfactory description of the data. On the other hand, it was not clear that feedback models would provide something better.

What is new is that fits of the DC14 halo profile to rotation curve data not only fit better than NFW (in terms of χ2), they also return the stellar mass-halo mass relation expected from abundance matching and are also consistent with the predicted concentration-halo mass relation.


The stellar mass-halo mass relation (top) and concentration-halo mass relation (bottom) for NFW (left) and DC14 (right) halos. The data are from fits to rotation curves in the SPARC database, which provides homogeneous near-IR mass models for ~150 galaxies. The grey bands are the expectation from abundance matching (top) and simulations (bottom).

The relations shown in grey in the figure have to be true in ΛCDM. Indeed, SCDM had predicted much higher concentrations – this was one of the many reasons for finally rejecting it. The non-linear relation between stellar mass and halo mass was not expected, but is imposed on us by the mismatch between the steep predicted halo mass function and the flat observed luminosity function. (This is related to the missing satellite problem – a misnomer, since it is true everywhere in the field.)

It is not at all obvious that fitting rotation curves would return the same relation found in abundance matching. With NFW halos, it does not. Many galaxies fall off the relation if we force fits with this profile. (Note also the many galaxies pegged to the lower right edge of the concentration-mass panel at lower left. This is the usual cusp-core problem.)

In contrast, the vast majority of galaxies are in agreement with the stellar mass-halo mass relation when we fit the DC14 halo. The data are also broadly consistent with the concentration-halo mass relation. This happens without imposing strong priors: it just falls out. Dark matter halos with cores have long been considered anathema to ΛCDM, but now they appear essential to it.

And then there were six

With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.

– John von Neumann

The simple and elegant cosmology encapsulated by the search for two numbers has been replaced by ΛCDM. This is neither simple nor elegant. In addition to the Hubble constant and density parameter, we now also require distinct density parameters for baryonic mass, non-baryonic cold dark matter, and dark energy. There is an implicit (seventh) parameter for the density of neutrinos.

Now we also include the power spectrum as cosmological parameters (σ8, n). These were not previously considered on the same level as the Big Two. They aren’t: they concern structure formation within the world model, not the nature of the world model itself. But I guess they seem more important once the Big Numbers are settled.

Here is a quick list of what we believed, then and now:


Parameter   SCDM     ΛCDM
H0          50       70
Ωm          1.0      0.3
Ωbh2        0.0125   0.02225
ΩΛ          0.0      0.7
σ8          0.5      0.8
n           1.0      0.96


There are a number of “lesser” parameters, like the optical depth to reionization. Plus, the index n can run, one can invoke scale-dependent non-linear biasing (a rolling fudge factor for σ8), and people talk seriously about the time evolution of the dark energy equation of state.

From the late ’80s to the early ’00s, all of these parameters (excepting only n) changed by much more than their formal uncertainty or theoretical expectation. Even big bang nucleosynthesis – by far the most robustly constrained – suffered a doubling in the mass density of baryons. This should be embarrassing, but most cosmologists assert it as a great success while quietly sweeping the lithium problem under the carpet.

The only thing that hasn’t really changed is our belief in Cold Dark Matter. That’s not because it is more robust. It is because it is much harder to detect, let alone measure.

Two Numbers

Cosmology used to be called the hunt for two numbers. It was simple and elegant. Nowadays we need at least six. It is neither simple nor elegant. So how did we get here?

The two Big Numbers are, or at least until the early ’90s were, the Hubble constant H0 and the density parameter Ω. These told us Everything. Or so we thought.

The Hubble constant is the expansion rate of the universe. Not only does it tell us how fast the universe is expanding, it sets the size scale through the Hubble distance-velocity relation. Moreover, its inverse is the Hubble time – essentially the age of the universe. A Useful and Important Number. To seek to measure it was a noble endeavor into which much toil and treasure was invested. Getting this right was what the Hubble Space Telescope was built for.

The density parameter measures the amount of stuff in the universe. Until relatively recently, it was used exclusively to refer to the mass density – the amount of gravitating stuff normalized to the critical density. The critical density is the over/under point where there is enough gravity to counteract the expansion of the universe. If Ω < 1, there isn’t enough, and the universe will expand forever. If Ω > 1, there’s more than enough, and the universe will eventually stop expanding and collapse. It controls the fate of the universe.
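The critical density itself follows from the Hubble constant alone, ρc = 3H0²/(8πG). A quick sketch of the arithmetic, using standard values for the constants:

```python
import math

G = 6.674e-11           # Newton's constant, m^3 kg^-1 s^-2
KM_PER_MPC = 3.0857e19  # kilometers in a megaparsec

def critical_density(H0_kms_mpc):
    """Critical density rho_c = 3 H0^2 / (8 pi G), in kg/m^3,
    for H0 given in the conventional km/s/Mpc."""
    H0_si = H0_kms_mpc / KM_PER_MPC  # convert to 1/s
    return 3.0 * H0_si ** 2 / (8.0 * math.pi * G)

print(critical_density(70.0))  # ~9.2e-27 kg/m^3
```

For H0 = 70 this works out to roughly 9×10⁻²⁷ kg/m³ – the equivalent of a few hydrogen atoms per cubic meter. It takes very little to close the universe.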

Just two numbers controlled the size, age, and ultimate fate of the universe. The hunt was on.

Of course, the hunt had been on for a long time, ever since Hubble discovered that the universe was expanding. For the first fifty years the measured value largely shrank, then settled into a double-valued rut between two entrenched camps. Sandage and collaborators found H0 = 50 km/s/Mpc while de Vaucouleurs found a value closer to 100 km/s/Mpc.

The exact age of the universe depends a little on Ω as well as the Hubble constant. If the universe is empty, there is no gravity to retard its expansion. The age of such a ‘coasting’ universe is just the inverse of the Hubble constant – about 10 Gyr (10 billion years) for H0 = 100 and 20 Gyr for H0 = 50. If instead the universe has the critical density Ω = 1, the age is just 2/3 of the coasting value.
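The numbers quoted here follow directly from a unit conversion on 1/H0; the 2/3 factor for Ω = 1 is the standard matter-dominated result. A sketch:

```python
KM_PER_MPC = 3.0857e19  # km in a megaparsec
SEC_PER_GYR = 3.156e16  # seconds in a gigayear

def hubble_time_gyr(H0):
    """Age of an empty ('coasting') universe: 1/H0, converted to Gyr.
    H0 is in km/s/Mpc."""
    return (KM_PER_MPC / H0) / SEC_PER_GYR

for H0 in (50, 100):
    t_empty = hubble_time_gyr(H0)
    t_crit = (2.0 / 3.0) * t_empty  # Omega = 1, matter-dominated
    print(H0, round(t_empty, 1), round(t_crit, 1))
```

For H0 = 50 the coasting age is ~19.6 Gyr and the critical age ~13 Gyr; for H0 = 100 they drop to ~9.8 and ~6.5 Gyr, which is where the trouble with stellar ages begins.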

The difference between the empty and critical ages is not huge by cosmic standards, but it nevertheless played an important role in guiding our thinking. Stellar evolution places a constraint on the ages of the oldest stars. These are all around a Hubble time old. That’s good – it looks like the first stars formed near the beginning of the universe. But we can’t have stars that are older than the universe they live in.

In the 80s, a commonly quoted age for the oldest stars was about 18 Gyr. That’s too old for de Vaucouleurs’s H0 = 100 – even if the universe is completely empty. Worse, Ω = 1 is the only natural scale in cosmology; it seemed to many like the most likely case – a case bolstered by the advent of Inflation. In that case, the universe could be at most 13 Gyr old, even adopting Sandage’s H0 = 50. It was easy to imagine that the ages of the oldest stars were off by that much (indeed, the modern number is closer to 12 Gyr) but not by a lot more: ages < 10 Gyr with H0 = 100 were right out.

Hence we fell into a double trap. First, there was confirmation bias: the ages of stars led to a clear preference for who must be right about the Hubble constant. Then Inflation made a compelling (but entirely theoretical) case that Ω had to be exactly 1 – entirely in mass. (There was no cosmological constant in those days. You were stupid to even consider that.) This put further pressure on the age problem. A paradigm emerged with Ω = 1 and H0 = 50.

There was a very strong current of opinion in the 80s that this had to be the case. Inflation demanded Ω = 1, in which case H0 = 50 was the only sensible possibility. You were stupid to think otherwise.

That was the attitude into which I was indoctrinated. I wouldn’t blame any particular person for this indoctrination; it was more of a communal group-think. But that is absolutely the attitude that reigned supreme in the physics departments of MIT and Princeton in the mid-80s.

I switched grad schools, having decided I wanted data. Actual observational data; hands on telescopes. When I arrived at the University of Michigan in 1987, I found a very different culture among the astronomers there. It was more open minded. Based on measurements that were current at the time, H0 was maybe 80 or so.

At first I rejected this heresy as obviously insane. But the approach was much more empirical. It would be wrong to say that it was uninformed by theoretical considerations. But it was also informed by a long tradition of things that must be so turning out to be just plain wrong.

Between 1987 and 1995, the value of the Big Numbers changed by amounts that were inconceivable. None of the things that must be so turned out to be correct. And yet now, two decades later, we are back to the new old status quo, where all the parameters are Known and Cannot Conceivably Change.

Feels like I’ve been here before.

Falsifiability and Persuadability in Science

There has been some debate of late over the role of falsifiability in science. Falsifiability is the philosophical notion advocated by Popper as an acid test to distinguish between ideas that are scientific and those that are not. In short, for a theory to be scientific, it has to be subject to falsification. It must make some prediction which, were it to fail, would irrevocably break it.

A good historical example is provided by the phases of Venus. In the geocentric cosmology of Ptolemy, Venus is always between the Earth and the Sun. Consequently, one should only observe a crescent Venus. In contrast, in the heliocentric cosmology of Copernicus, Venus can get to the other side of the sun, so we should see the full range of phases.


Galileo observed the full range of phases when he pointed his telescope at Venus. So: game over, right?

Well, yes and no. In the strict sense of falsifiability as advocated by Popper, yes, geocentrism was out. That didn’t preclude hybrid pseudo-solutions, like the Tychonic model. Worse, it didn’t convince everyone instantaneously – even among serious minded people not impeded by religious absolutism, this was just one piece of evidence to be weighed along with many others. One might have perfectly good reason to weigh other things more heavily. Only with the benefit of hindsight can we look back and say Nailed it!

Nevertheless, this is often taught to young scientists as an example of how it is supposed to work. And it should. Ellis & Silk make an eloquent defense of the ethic of falsifiability. I largely agree with them, even if they offer a few examples which I don’t think qualify. They were motivated to mount this defense in response to the case made against falsification by Sean Carroll.

Without commenting on the merits of either argument – both sides make good points – it occurs to me that there is also a human element. One of personality and proclivity, perhaps. It has been my experience that those most eager to throw Popper (and Occam’s razor) under the bus are the same people who fancy ornate and practically unfalsifiable theories.

The debate about standards is thus also a debate about the relative merit of ideas. Should more speculative ideas have equal standing with more traditional explanations? If you’re a conservative scientist, you say Absolutely not! If you like to engage in theoretical speculation, you say Hell yes! 

Clearly there is value to both attitudes. The more conservative attitude teaches to refrain from turning our theories into Rube Goldberg machines that look really neat but teach us nothing. (Many galaxy formation simulations are like this.) On the other hand, some speculation is absolutely necessary to progress. Indeed, sometimes the most outrageous seeming speculations lead to the most profound advances.

In short, our attitudes matter. There is no such thing as the perfectly objective scientist as portrayed by the boring character in a white lab coat. We are human, after all, and a range of attitudes has value.

In this context, it seems that there should be a value system among scientists that parallels the standard of falsifiability for theories. We shouldn’t just hold theories to this high standard. We should also hold ourselves to a comparably high standard.

I suggest that a scientist must be persuadable. Just as a theory should subject itself to testing and potential falsification, we, as scientists, should set a standard by which we would change our minds. We all have pet ideas and tend to defend them against contrary evidence. Sometimes that is the right thing to do, as the evidence is not always airtight, or can be interpreted in multiple ways.

But – at what point does the evidence become compelling enough that we are obliged to abandon our favorite ideas? It isn’t good enough that a theory is falsifiable. We have to admit when it has been falsified. In short, we should set a standard by which we could be persuaded that an idea we had previously believed was wrong.

What the standard should be depends on the topic – some matters are more settled than others, and require correspondingly more compelling evidence to overturn. The standard also depends on the individual: each of us has to judge how to weigh the various lines of evidence. But there needs to be some standard.

In my experience, there are many scientists who are not persuadable. They are not simply hostile to speculative ideas. They are hostile to empirical data that contradicts their pet ideas. Sadly, in many cases, they do not seem to be able to distinguish between data – what is a plain fact – and contrary ideas. One sees this in the “debate” on global warming all the time: solution aversion (we don’t want to stop burning oil!) leads to cognitive dissonance and the rejection of facts: we don’t want to believe that, so the data must be faulty.

Sadly, this sort of behavior is all too common among practicing scientists. I advocate that this be considered unscientific behavior. Just as a theory should be falsifiable, we must be persuadable.

It’d be nice if we could be civil about it too. Baby steps.