# Locality correct, realism incorrect: why

Luboš Motl, November 03, 2015

Many people are ambivalent about whether locality or realism is right because of Bell’s theorem:

Bell showed that locality and realism together imply inequalities that are violated in experiments (and that disagree with the predictions of quantum mechanics).

It follows that either realism is wrong; or locality is wrong.

These comments are correct but people then «deduce» that it’s about equally likely that locality is violated or that realism is violated. And most writers of popular books even pick «locality» as the principle that has to be sacrificed.

But this way of thinking, especially the second one (but even the first one), is totally stupid. It’s exactly analogous to this reasoning about the composition of the Moon:

When we look at the Moon, it looks almost spherical. It follows that the Moon is either a flying lemon or a big rock that was made nearly spherical by the self-gravity. The Moon cannot be a banana.

Fair enough. The Moon isn’t a banana because a banana isn’t spherical (the curvature radius and aspect ratio of a banana are regulated by European Union regulations, so that the Latin American «bananas» aren’t «bananas» in the EU).

But what many people completely misunderstand is that the quote above isn’t «the only fact» or «the only argument» about the Moon that we know. It is just *an* argument or *a* fact. There are many other facts and arguments and those make it spectacularly clear that the Moon isn’t a flying lemon, either. It is a big piece of a rock that was made nearly spherical by the matter’s efforts to reduce the gravitational potential energy.

The case of Bell’s theorem is analogous. Bell’s theorem is just *a* result, and not a terribly universal one. But science has found something much more important and valuable than an inequality for a particular uninteresting experiment with spins. It has found the *right predictions* for virtually *all* experiments we can make. Predictions are propositions «the observed value will be equal to a calculated value». Predictions are equalities. And science can produce trillions of such equalities. An inequality is a much weaker result than an equality – and an incomparably weaker result than trillions of equalities.
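For concreteness, here is Bell’s «particular experiment with spins» in its CHSH form, computed from the quantum *equality* rather than the inequality. This is a minimal sketch in Python with numpy; the singlet state and the measurement angles are the standard textbook choices, not anything specific to this post. Quantum mechanics predicts the sharp value \(|S| = 2\sqrt 2 \approx 2.83\), which violates the local-realist bound \(|S|\leq 2\):

```python
import numpy as np

# Pauli matrices
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def spin(theta):
    """Spin measurement along a direction at angle theta in the x-z plane."""
    return np.cos(theta) * Z + np.sin(theta) * X

# Singlet state (|01> - |10>) / sqrt(2)
psi = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

def E(a, b):
    """Quantum correlation <psi| A(a) (x) B(b) |psi>; equals -cos(a - b)."""
    return np.real(psi.conj() @ np.kron(spin(a), spin(b)) @ psi)

# Standard CHSH angle choices for maximal violation
a, a2, b, b2 = 0.0, np.pi / 2, np.pi / 4, 3 * np.pi / 4
S = E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)
print(abs(S))  # 2*sqrt(2) ≈ 2.828..., above the local-realist bound of 2
```

The point of the code is exactly the point of the text: the quantum *prediction* is an equality (a definite number for each correlation), and the inequality is just a weak corollary that local realist theories cannot match.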

**Why locality has to hold**

Locality means that when we observe the effects of an event or a choice at time \(t\) and place \((x,y,z)\) on other events at time \(t+dt\), only events at places close enough to \((x,y,z)\) may be influenced. In fact, the information can only propagate at most by the speed of light in the vacuum \(c\):

\[ | \Delta \vec r | \leq c\cdot dt \]

Why is it so? Even though it sounds intuitively plausible that we may only directly «push» or otherwise «influence» our vicinity (and this intuition was explicitly acknowledged as a valid assumption once people started to think in terms of fields), Newton’s law of gravity (and similarly Coulomb’s force etc.) was once believed to affect distant objects instantaneously.

Well, locality follows from *causality* once causality is reconciled with Einstein’s 1905 special theory of relativity. First, we must ask: What is causality?

Causality means that a choice – imagine a free choice by a human – done at time \(t_0\) may only influence events or results of measurements at time \(t_1\) if \(t_1\geq t_0\). In other words, causality is the principle that *the cause precedes its effect*. We roughly know it’s true because we can’t change what we did yesterday, among other things we would often love to change.

But can we be more explicit and localize the reason why causality has to hold?

Yes, we can. The issue is that everything the laws of physics – either classical statistical physics or quantum mechanics (statistical or otherwise) – predict comes in the form of *conditional* probabilities \(P(F|I)\). We know something, \(I\), about the state of the physical system, and the laws of physics manage to predict the probability that the physical system had, has, or will have another property \(F\).

Now, the funny thing is that \(P(F|I)\neq P(I|F)\). The probability of \(F\) assuming \(I\) isn’t the same thing as the probability of \(I\) assuming \(F\). This is a basic fact that Sean Carroll and similar morons meditating about the «mystery of the arrow of time» totally fail to understand. There are many ways to elaborate on this obvious point. For example, we may relate the two conditional probabilities by Bayes’ formula:

\[ P(I|F) = \frac{ P(I)P(F|I) }{ P(F) } \]

The two conditional probabilities are «almost» the same but they multiplicatively differ by the factor of \(P(I)/P(F)\). A problem is that \(P(I)\) is unavoidably subjective – the «prior probability» – so it *cannot* be quantified by any laws of physics.
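A trivial numerical illustration of that asymmetry (the probabilities below are made up purely for illustration): fix the dynamically predicted \(P(F|I)\), and Bayes’ formula produces a retrodiction \(P(I|F)\) that differs from \(P(F|I)\) and, worse, shifts whenever the subjective prior \(P(I)\) shifts:

```python
def retrodict(P_I, P_F_given_I, P_F_given_notI):
    """Bayes' formula: P(I|F) = P(I) * P(F|I) / P(F)."""
    # Total probability of the final property F
    P_F = P_I * P_F_given_I + (1 - P_I) * P_F_given_notI
    return P_I * P_F_given_I / P_F

# The dynamics fixes P(F|I); the prior P(I) is subjective input
P_F_given_I = 0.9
print(retrodict(0.30, P_F_given_I, 0.2))  # ≈ 0.659
print(retrodict(0.05, P_F_given_I, 0.2))  # ≈ 0.191 — same dynamics, other prior
```

The same \(P(F|I)=0.9\) yields wildly different retrodictions depending on the prior, which is exactly why only the prediction, not the retrodiction, can be an objective output of the laws of physics.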

It follows that only the numerical value of \(P(I|F)\) may be precisely predicted by the laws of physics, or only \(P(F|I)\), but not both! It is surely clear to you why I picked the letters \(I,F\) in my notation: they are properties of the initial and final states, respectively. And the laws of physics only predict \(P(F|I)\), the probability of a particular final state given some information about the initial state.

When the laws of physics do so – when they predict the future – they simply cannot unambiguously «retrodict» the past. Retrodictions are a form of reverse engineering. You might think that if you specify the timing for both \(I\) and \(F\), both \(P(F|I)\) and \(P(I|F)\) refer to the same proposition: some spacetime history that has the properties \(I,F\) at the two times. But that simply cannot be the case because the conditional probabilities are always asymmetric, as I discussed above.

Retrodictions are «hard» and «less objective» because of irreversibility. When you see an already cold soup on the table on the street, it’s hard to reconstruct what happened before. When was it cooked? The answers to questions about the past clearly depend on your assumptions about many vague things – how much time do you need to find someone hungry enough to eat a soup on the street? The answers cannot be determined by the laws of physics controlling the soup itself.

There is another microscopic way to see the source of the asymmetry. Imagine that \(I,F\) are «ensembles» of microstates \(\psi_i\) and \(\psi_f\) – points in the phase space classically; or pure states in the Hilbert space quantum mechanically – at the initial and final moments. Then, the probability of the «macro evolution» is, as I have often written on this blog, equal to:

\[ P(I\to F) = \sum_i P(\psi_i) \sum_f P(\psi_i\to \psi_f) \] We are summing both over initial and final microstates – and the micro-transition probabilities \(P(\psi_i\to \psi_f)\) may be T-symmetric (\(i\leftrightarrow f\)) or at least CPT-symmetric, indeed – but we must include the prior probabilities \(P(\psi_i)\) for the particular initial microstates. Note that nothing like this factor appears for the final microstates. The probability of \(\psi_{f1}\) or \(\psi_{f2}\) as the final state is simply the sum of the two probabilities. But the prior probabilities for \(\psi_{i1}\) and \(\psi_{i2}\) have to be included and shared because we must guarantee that the sum of probabilities of mutually exclusive assumptions is \(1\). The insertion of \(P(\psi_i)\) converts the summation over the initial microstates to a weighted average over them. These factors are the prior probabilities, the weights.

As you have heard many times from me, the «ensemble» probability is a sum over the final microstates, but the *average* over the initial ones. In the simplest scenario, we assume that all the \(N_i\) initial microstates are equally likely. In that case, \(P(\psi_i) = 1/N_i\) and the previous formula reduces to

\[ P(I\to F) = \frac{1}{N_i} \sum_{i,f} P(\psi_i\to \psi_f). \] It looks «almost» symmetric in \(I\) and \(F\) except for a detail. There is a factor of \(1/N_i\) but no factor of \(1/N_f\). The formula to compute «ensemble probabilities» – the typical form of the general conditional probabilities we need all the time – simply treats \(I\) and \(F\) differently. This asymmetry is the general reason for the second law of thermodynamics: the probability of evolving from a macrostate with a lower \(N_i\) (the exponential of the entropy) to one with a higher \(N_f\) is simply vastly larger – by the factor of \(N_f/N_i=\exp[(S_f-S_i)/k]\) – than the probability of the opposite process. Only processes with an increasing entropy may occur with probabilities that are not negligible.
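One can verify the asymmetry with a toy ensemble. This is a sketch under stated assumptions: the micro transition matrix is T-symmetric and is made approximately doubly stochastic by a symmetric Sinkhorn-style rescaling; the numbers of microstates are arbitrary illustrative choices. The macro probabilities then obey \(P(I\to F)/P(F\to I) = N_f/N_i\) exactly, because the sums of micro transition probabilities are equal by symmetry and only the \(1/N_i\) vs \(1/N_f\) prefactors differ:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 12
T = rng.random((N, N))
T = T + T.T                        # T-symmetric micro "transition rates"
for _ in range(200):               # symmetric rescaling toward doubly stochastic
    d = T.sum(axis=1)
    T = T / np.sqrt(np.outer(d, d))

Ni, Nf = 3, 9                      # macrostate I: microstates 0..2, F: 3..11
P_ItoF = T[:Ni, Ni:].sum() / Ni    # average over initial, sum over final
P_FtoI = T[Ni:, :Ni].sum() / Nf    # same micro sum (by symmetry), other prefactor
print(P_ItoF / P_FtoI)             # = Nf / Ni = 3.0
```

Even though every micro transition is perfectly reversible, the macro evolution from the smaller ensemble to the larger one is three times more likely than the reverse, which is the second law in miniature.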

The laws of physics must be told which of the properties is the initial one, i.e. the right \(I\), and which of them is the final state \(F\). Without this arrow of time, i.e. without the chronology that may order causally related events and distinguish the cause from the effect, the laws of physics simply don’t allow you to calculate general enough probabilities. Without this information, you wouldn’t know whether you should include the factor of \(1/N_i\) or \(1/N_f\). I haven’t used any specific properties of the laws of physics at all. What I have used is pure logic and/or the most general probability calculus.

**Relativity enters the scene**

So the previous paragraphs show that generic time machines are inconsistent with any laws of physics that may predict any probabilities. You can’t travel to the past. You can’t have objects that co-exist, mutually interact, and «perceive» opposite arrows of time, either. There has to exist a consistent arrow of time in each piece of the spacetime.

What happens when you combine this «need to sort the cause and the effect» with Einstein’s special theory of relativity? The requirement of causality strengthens. The reason is the *relativity of simultaneity of events*. Pairs of events \(I,F\) may sometimes obey \(t_I\leq t_F\) while \(t'_F \leq t'_I\) holds in another inertial system. The ordering of \(I,F\) may sometimes depend on the observer.

It’s easy to see when this ambiguity arises. It arises if the spacetime points \(I,F\) are spacelike-separated! Indeed, when the separation is time-like (or null), the chronology is the same for all boosted observers. But when \(I,F\) are spacelike separated, there exist hyperplanes that contain both \(I\) and \(F\) and there exists an observer \(O\) for whom such a hyperplane may be interpreted as the moment \(t={\rm const}\). Because of the relativity of simultaneity, nearby observers «slightly» boosted relatively to \(O\) will either conclude that \(I\) took place before \(F\) or vice versa.
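The ordering flip is easy to exhibit numerically. A sketch in units with \(c=1\) (the event coordinates and boost velocities are arbitrary illustrative choices): for a spacelike-separated pair, oppositely boosted observers disagree about which event is earlier, while a timelike-separated pair keeps its order in every frame. Event \(I\) sits at the origin, so its boosted time is always zero:

```python
import numpy as np

def t_prime(t, x, v):
    """Time coordinate of event (t, x) after a boost with velocity v (c = 1)."""
    return (t - v * x) / np.sqrt(1 - v**2)

# Spacelike pair relative to I at the origin: dt = 1 but dx = 2 > c*dt
print(t_prime(1.0, 2.0, +0.8))  # -1.0  : F happens *before* I in this frame
print(t_prime(1.0, 2.0, -0.8))  # +4.33 : F happens after I in this frame

# Timelike pair: dt = 2, dx = 1 < c*dt — the order is frame-independent
print(t_prime(2.0, 1.0, +0.8))  # +2.0  : still positive
print(t_prime(2.0, 1.0, -0.8))  # +4.67 : still positive
```

The sign of the boosted time flips only for the spacelike pair, exactly as the hyperplane argument in the text says.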

Relativity demands that all inertial observers be able to use the same laws of physics to describe their observations. In the previous section, I have proven that any theory predicting probabilities must be able to distinguish the cause from the effect. Because of transitivity, this chronology has to be equivalent to the chronology associated with the attachment of some timing labels \(t\).

But now, we saw that the chronology has to be independent of the choice of the observer, otherwise the probabilities would have to abruptly change if you made those infinitesimal boosts that place \(I\) in front of \(F\) or vice versa. It follows that whenever \(I\) and \(F\) are the cause and its effect, their spacetime points must have a separation admitting the same chronological ordering from all inertial frames’ perspectives. In other words, \(I\) and \(F\) must never be spacelike-separated when \(I\) and \(F\) are the cause and its effect.

**In other words, relativity combined with the basic logic at the top implies that the cause and its effect have to be timelike-separated. The maximum speed by which the influence or information of any kind may propagate is \(c\), the speed of light in the vacuum.**

Assuming the principle of relativity (or the Lorentz symmetry), we have proven locality from causality! And the causality principle was proven at the top from the asymmetry of the conditional probabilities.

One aspect of scientific irrationality is the temptation to ignore rigorously proven results a few seconds later. People sometimes invent silly excuses that have nothing to do with the arguments in order to believe that a proven result may be immediately ignored. So I must emphasize that the conclusion about locality – a conclusion that Einstein understood in 1905 – doesn’t depend on any detailed properties of the laws of physics. Einstein’s principle is indeed a meta-law of physics that constrains not just some particular phenomena but *all phenomena* and *all conceivable laws of physics* meant to describe these phenomena. In particular, it applies to theories obeying quantum mechanics, too. There is absolutely no reason why it shouldn’t. These constraints – relativity and quantum mechanics – are logically independent.

Just recall that Einstein «deduced» the principle of relativity from the fact that in the train whose velocity is constant in time, you can’t figure out whether it’s moving or it’s the train next to you that is moving. This observation is a general fact – a symmetry – that holds or exists regardless of the possibly complicated processes that take place inside the train. For example, the train is burning coal and coal is composed of atoms that behave according to the laws of quantum mechanics. But it’s still true that you can’t figure out whether the train is moving or it is at rest.

So for any theory, the principle of relativity implies – thanks to the proofs above – that the information can’t propagate faster than light. This constraint may be formulated by certain words and equations in classical physics; and somewhat different words and equations in quantum mechanics. But the principle of relativity constrains the laws of physics whether they are classical or quantum. And if someone found a third possible framework aside from the classical and quantum one, e.g. the extraterrestrially dialectic laws of physics, the principles of relativity and locality would still have to hold and ban a large subset of a priori possible extraterrestrially dialectic theories.

**Local laws are actually known and have passed all the tests**

I want to emphasize that the relativistic laws of physics that we know – in classical field theory and quantum field theory (and string theory that extends the latter as well) – obey locality. The information never propagates faster than light. Why?

Start with the wave equation:

\[ \left( \frac{1}{c^2} \frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2} - \frac{\partial^2}{\partial z^2} \right) \Phi = 0. \] You should be able to check that the equation is Lorentz-covariant. In other words, it obeys Einstein’s principle of relativity. The box operator is nicely constructed out of the Minkowski metric tensor. The equation has the same form in other reference frames. But does it really mean that a bump on the field \(\Phi\) will stay inside the light cone if you evolve the field according to this equation?

Yes, it does. The initial variations of the field will never transgress the light cone because the free propagator is confined to the light cone. (Well, I should call it Green’s function at this moment, not a propagator.) The evolution of any initial field \(\Phi(x,y,z)\) at \(t=0\) may be obtained as a combination of the free propagators. And the free propagator looks like the \(3D\) delta-function at \(t=0\) and happens to vanish outside the light cone. Note that depending on whether the spacetime dimension is even or odd, the propagator is either nonzero only strictly on the light cone or nonzero in the whole interior of the light cone. But it never sends any information faster than light.

It works for the free scalar field. It also works for fields of other spins because those also obey the wave equation at short distances. Note that the Dirac operator squared is the wave operator, too. And Maxwell’s equations may be written as wave equations for the electromagnetic potential, at least in a certain gauge. And the ban on the faster-than-light transfer of information will remain valid even if you add some interactions and other terms such as the Klein-Gordon mass term. Why?

Because you may still study the behavior of the field in extremely small regions of the spacetime. And at very short distances, the mass term may be neglected relative to the differential operators. In effect (imagine that you are numerically evolving the partial differential equations on a lattice), the wave operator evolves the field just like in the free case and respects the speed limit – the strict vanishing of the field outside the light cone. And the mass terms and similar (interaction…) terms only modify the value of the field at the places where it’s already nonzero. The mass terms etc. never make the field nonzero if it were zero to start with.
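The lattice picture in the previous paragraph can be simulated directly. A minimal sketch in \(1+1\) dimensions (leapfrog scheme with Courant number \(c\,\Delta t/\Delta x = 1\); the lattice size, bump location, and step count are arbitrary illustrative choices): a bump localized at one site can spread by at most one site per step, so it never leaks outside the numerical light cone:

```python
import numpy as np

N, steps = 201, 60
j0 = N // 2                       # initial bump location
u = np.zeros(N)
u[j0] = 1.0                       # localized bump ...
u_prev = u.copy()                 # ... with zero initial velocity

# Leapfrog update for the wave equation at Courant number c*dt/dx = 1:
# u_next[i] = u[i-1] + u[i+1] - u_prev[i]
for _ in range(steps):
    u_next = np.roll(u, 1) + np.roll(u, -1) - u_prev
    u_next[0] = u_next[-1] = 0.0  # fixed boundaries (the bump never reaches them)
    u_prev, u = u, u_next

# After `steps` steps, the signal may have moved at most `steps` lattice sites
outside = np.concatenate([u[:j0 - steps], u[j0 + steps + 1:]])
print(np.abs(outside).max())      # 0.0 — nothing leaked outside the "light cone"
```

Adding a mass term would only modify `u_next` at sites where the field is already nonzero, so the strict vanishing outside the cone would survive, which is the whole argument of the paragraph above.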

(If one added terms with more than two derivatives, rather than fewer, the argument above could break down. Higher-derivative terms are a potential threat to locality, indeed. This is also manifested by the fact that the addition of higher-derivative terms requires new initial conditions for the higher time derivatives of the fields. Those are basically equivalent to new «ordinary» fields and the equivalent new fields usually turn out to be tachyons, fields with negative squared masses. Well, my argument that locality isn’t ruined by the addition of the mass terms stays valid for tachyonic masses as well, but only if you keep the initial conditions for the tachyons confined in the region, too. But the new «equivalent tachyons» that describe a system with higher-derivative terms may be nonzero in a bigger region, which is why the addition of higher-derivative terms to field equations sometimes cripples locality.)

**Quantum field theory**

A funny thing about quantum field theory is that all these arguments may be directly imported and exploited in quantum field theory, too. In the Heisenberg picture, the field operators are actually obeying the «same» equations as the classical fields. And it’s the picture that is much more useful to understand the locality in quantum field theory.

In the Heisenberg picture, the pure state \(\ket\psi\) or the density matrix \(\rho\) is constant, independent of time, and it’s the operators that are evolving according to the Heisenberg equations which are basically the classical equations with extra hats.

This will continue to preserve the locality because the field operators \(\hat\Phi(x,y,z)\) at the final moment may still be written as a linear combination of the operators at the initial time, with Green’s functions used as the coefficients, just like we could have found the solutions of the classical equations. And because Green’s functions, the coefficients, strictly vanish outside the light cone, the Heisenberg field operators are functions of the operators inside their past light cone only. They don’t depend on any spacelike-separated operators. The information never propagates faster than light.

I basically say that the same argument that we had in classical field theory may be applied in quantum field theory, too. Isn’t it cheating? Isn’t quantum mechanics «completely different» so that nothing from classical physics works in quantum mechanics? Well, quantum mechanics is different in some respects but the same in other respects. Quantum mechanics changes the character of predictions – they have to be probabilistic – and the mathematical operations needed to make the predictions. They look nothing like the procedures in classical physics.

However, quantum mechanics does *not* change the fact that all the observables whose values may be extracted from the experiments are functions or functionals of the fields. It’s the quantum fields that are the «basic observables» in quantum field theory. And if nothing else than the observables (field operators) may be measured and if the field operators may be written as functions or functionals of some field operators in the past light cone, it follows that everything you can measure at some point – probabilities of one outcome or another – is encoded in facts about the measurements in the past light cone of this point.

People are trained not to use the original picture of quantum mechanics – the Heisenberg picture – at all, so all these things may look difficult to them. But if you throw away all irrational prejudices, the picture is crisp and clear. You may check e.g. the predictions of the double slit experiment in the Heisenberg picture. In that case as well as in the present discussion about locality, the operators \(\Phi(x,y,z,t_F)\) at the final moment \(t_F\) may be written as functionals of those at the initial time (at all points of the initial slice):

\[ \Phi(x,y,z,t_F) = F[ \Phi(x',y',z',t_I) ] \] But once you understand that a functional like that exists – that this equation is literally true (the functional is a *solution* to the Heisenberg differential equations) – it’s not hard to understand how the probabilistic predictions are made. Note that \(\ket\psi\) or \(\rho\) are constant in the Heisenberg picture. All the probabilities or expectation values etc. are calculated by sandwiching some operators inside \(\bra\psi\dots \ket\psi\) or by taking the traces with the density matrix \({\rm Tr}(\dots\rho )\). But the dots are replaced by some functions or functionals of operators at the final moment (the final measurements) and those are functions or functionals of those at the initial slice.
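A two-level toy model makes this concrete. A sketch made up purely for illustration (a qubit with \(H=\sigma_z/2\), i.e. \(\omega=1\), and the state \(\ket{+}\)): the Heisenberg-picture expectation value, computed with a constant state and an evolved operator \(X(t)=U^\dagger X U\), coincides with the Schrödinger-picture value computed with an evolved state and a constant operator:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)          # observable: sigma_x
H = np.array([[0.5, 0], [0, -0.5]], dtype=complex)     # H = sigma_z / 2

def U(t):
    """Evolution operator exp(-iHt) for a diagonal H."""
    return np.diag(np.exp(-1j * np.diag(H) * t))

psi = np.array([1, 1], dtype=complex) / np.sqrt(2)     # fixed Heisenberg state
t = 0.7

# Heisenberg picture: the state is constant, the operator evolves
X_t = U(t).conj().T @ X @ U(t)
heis = np.real(psi.conj() @ X_t @ psi)

# Schrödinger picture: the state evolves, the operator is constant
psi_t = U(t) @ psi
schr = np.real(psi_t.conj() @ X @ psi_t)

print(heis, schr)  # identical expectation values, cos(0.7) ≈ 0.7648
```

The locality argument in the text lives entirely on the Heisenberg side of this equality: the evolved operators, not the state, carry the dynamics, and they are built only from operators in the past light cone.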

However, we have seen that the functionals translating the «final slice» to the «initial slice» only use the field operators at the initial slice that are timelike-separated (or null-separated) from the final ones. So all the probabilities and expectation values in the point or region \(R\) of the final slice may be ultimately calculated from expectation values of some operators in the region \(R'\) of the initial slice, a region that only contains points that are timelike-separated (or at most null-separated) from the points inside \(R\).

**We have proven that all the predicted probabilities or expectation values performed in the region \(R\) at time \(t_F\) are independent of all the results of measurements done outside the past light cone of \(R\).**

All the predictable probabilities are independent of the spacelike-separated measurements. But in quantum mechanics, all properties of physical systems are determined by measurements, so we may say that all the predictable probabilities (and expectation values…) are independent of the properties of the spacelike-separated fields, too.

When you take Quantum Electrodynamics or the Standard Model (our current TONE, a Theory Of Nearly Everything) with their fields and the Heisenberg equations, you will get predictions that agree with all the observations that people or animals have ever made. Except for the 600 GeV superpartners and the 2, 3, 5 TeV new gauge bosons that are going to be discovered by the LHC soon, of course.

On the other hand, there doesn’t exist even a glimpse of a theory that would be comparably compatible with the experiments but that would deny locality and manage to violate Bell’s inequalities while preserving realism. Those theories just won’t work because they are non-quantum, non-local theories. Both adjectives are wrong because the right theory must be quantum and relativistic! If you constructed a classical non-local theory, it would have to be non-relativistic, too. And if a theory were fundamentally non-relativistic – because you (incorrectly) thought that nonlocality is needed for the vigorous violations of Bell’s inequalities – then it would be pretty much guaranteed that the violation of relativity in almost every experiment would be comparable to 100 percent. But no violation of relativity has ever been observed, which is an infinitely unlikely outcome for a theory that fundamentally violates relativity everywhere.

To believe that the confirmations of relativity are illusions and that one should still work on «realist» i.e. classical theories is far more ludicrous than to believe that all observations of a spherical Earth or of evolution are illusions and that the right theory has a flat Earth and creationism. Needless to say, such anti-quantum zealots love to imagine that they’re intelligent but they are not. They are worse than the flat Earthers because their pet belief contradicts a greater part of the empirical evidence – really all of it – and the probability that they could accidentally bypass all these disagreements is lower than the probability that the flat Earth or creationism will be proven right, after all.

If you have mastered enough physics and mathematics to understand similar ideas but you are still uncertain about the claim that locality is right while realism (i.e. the basic framework of classical physics) is not, you should try to read this blog post many times and think at the same time, because the blog post *does* contain the simple proofs that causality (plus the arrow of time), locality, and non-realism are absolutely needed in any laws of physics that are at least remotely plausible.