Celebrating the Heisenberg picture
Luboš Motl, December 19, 2011
As I suggested in the article about the solution of the Hydrogen atom using the \(SO(4)\) symmetry, one of the reasons why almost no people properly learn the «foundations of quantum mechanics» is the textbooks’ and teachers’ excessive focus on the Schrödinger picture (and the wave functions) and their disrespect for the Heisenberg picture (and the operators) even though operators – the observables – are what quantum mechanics is all about.
This unfortunate bias has its human dimension, too. Werner Heisenberg was a much more brilliant physicist than e.g. Erwin Schrödinger. Heisenberg, unlike his wavy colleague, really understood what quantum mechanics is all about.
To discuss this point, I want to start with Heisenberg’s July 1925 paper about quantum mechanics, an article that has been described as the work of a magician. Many things were unclear in the paper and most readers fail to understand the reasoning behind the formulae but Heisenberg made some incredible correct calculations and, what is even more important, he remarkably predicted what physics would begin to revolve around.
Let me look at the paper that arguably sparked the most important conceptual revolution of modern science. It ultimately appeared in Zeitschrift für Physik in September 1925. The title and the relevant links are:
Über quantentheoretische Umdeutung kinematischer und mechanischer Beziehungen
► Full text in German (free PDF)
► Guide to read the magic paper (arXiv 2004)
► Wikipedia (on the 1925 article)
► The book with an English translation (page 260)
► Full English translation (free PDF)
Quantum theoretical re-interpretation of kinematic and mechanical relations
Within two months after this paper, Heisenberg together with Pascual Jordan and Max Born managed to write another paper with a systematic treatment of (matrix) quantum mechanics where pretty much everything took the correct form.
The early paper above is different because there’s still some magic and layers of confusion in it but there can’t be any doubt that Heisenberg understood a lot of new things. The spectrum of the harmonic oscillator including the zero-point energy; formulae for transition amplitudes; quantization rules for phase space cells whose area is \(2\pi\hbar\); Heisenberg equations for various systems; many other things. Everything’s there. Recall that he was just 23 when he went to the upper-class German (formerly Danish and British) island of Heligoland which is northwest of Hamburg.
Aside from igniting the most profound revolution of the 20th century science, he wanted to recover from hay fever. 😉 The island was also the place where he met Niels Bohr. His physics task was much more complex than the task that Einstein had to solve at the age of 26.
Because of the colleges’ focus on the wave function – which often leads people to invalid conclusions – even the spectrum and behavior of the quantum harmonic oscillator fail to be among the «very early topics». However, Werner Heisenberg chose a much more complex topic to calculate in the first paper, before any other real quantum mechanics was available on the market: he calculated transitions in an anharmonic oscillator with a \(\lambda x^2\) force (a cubic potential, but he could talk about forces because Newton-like equations requiring no Hamiltonian are natural in the Heisenberg picture). He also discussed rigid rotators.
When I began to learn quantum mechanics in high school, this paper was unreadable to me because I was really inexperienced with quantum physics. It would be much better today. Heisenberg was obviously able to sort out much of the wisdom of the emerging quantum mechanics in his head; he was undoubtedly thinking about many very real physical systems; the pedagogical treatment wasn’t really accessible to most readers, however.
What I find important for Heisenberg’s understanding of the newborn quantum mechanics – and its systemic differences from any model in classical physics – is that Heisenberg often compares formulae as understood classically and quantum-theoretically (he actually uses these very two adjectives). Heisenberg’s main strategy is to exclusively talk about objects that are observable. It’s also important to keep in mind that quantum mechanics, despite its totally new features, still replaces and generalizes classical physics. So there have to exist – and there exist – objects that are calculable in classical physics as well as quantum physics.
Quantum physics just dares to give you totally new formulae and algorithms to calculate them, formulae and algorithms that would look blasphemous from any classical perspective. Schrödinger and others were not careful about the «actual role» of the newly discovered quantum concepts. Schrödinger himself focused on the equation – he was really trying to figure out what a helpful equation generalizing de Broglie’s wave to a nonzero potential could look like – but he was sloppy and wrong about its physical interpretation. Nope, the wave function doesn’t describe an electron that gets spread in the same way as butter.
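For concreteness, the anharmonic oscillator can be written as an operator equation of motion. The following is only a sketch in units where the mass is set to one; the symbol \(\omega_0\) for the frequency of the harmonic part and the overall sign convention are assumptions that do not appear in the text above:
\[ \ddot{\hat x} + \omega_0^2\, \hat x + \lambda\, \hat x^2 = 0 \]
The term \(\lambda \hat x^2\) is the extra force; it corresponds to the cubic term \(\lambda \hat x^3/3\) in the potential.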
Heisenberg, on the other hand, always started with the physics. His goal was to get the right physical predictions which guaranteed that he knew, from the very beginning, what’s the interpretation of the objects he was using, and he was using whatever maths was needed to obtain these new predictions.
Let me mention that the Heisenberg-vs-Schrödinger tension became personal, too. Heisenberg rightfully said the following about his «competitor in pictures»:
The more I think about the physical portion of Schrödinger’s theory, the more repulsive I find it. […] What Schrödinger writes about the visualisability of his theory ‘is probably not quite right,’ in other words it’s crap.
Just to be sure, the relationship was mutual. Schrödinger said the following unjustifiable sentence:
I knew of [Heisenberg’s] theory, of course, but I felt discouraged, not to say repelled, by the methods of transcendental algebra, which appeared difficult to me, and by the lack of visualisability.
OK, how does the world look in the Heisenberg picture?
In classical physics, observable quantities such as the positions and velocities are ordinary number-valued functions of time such as \(x(t)\), \(v(t)\), or fields \(E_z(x,y,z,t)\). They obey some equations, such as Newton’s equations or Maxwell’s equations. A well-defined value at the initial state may be used to deterministically predict the value in the final state.
In quantum mechanics, all these observables become operators or, when a particular discrete basis is used, matrices. Another remarkable fact is that Heisenberg, the father of matrix quantum mechanics, had clearly no official background in matrices; he didn’t really know what a matrix was. So in some sense, he had to rediscover matrix calculus much like Newton had to co-discover differential calculus. It shouldn’t shock us that the initial presentation was unavoidably far from perfectly comprehensible.
But these operators obey very similar equations. We just add hats. In principle, that’s everything we need because the expectation value of any function of operators at the final moment may be reduced to expectation values of operators in the initial moment. (Note that I don’t need to discuss whether the state is pure or mixed.)
And expectation values of (arbitrary) functions of operators are everything you need to reconstruct the whole probability distributions and to calculate probabilities of anything, including correlated outcomes of experiments.
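The «just add hats» rule can be checked numerically for the harmonic oscillator. The following is a minimal sketch, not anything taken from Heisenberg’s paper: it assumes units with \(\hbar=m=\omega=1\), a truncation to the lowest `n` number states, and the hypothetical helper names `ladder` and `heisenberg_x`. It verifies that the matrix \(\hat x(t)\) equals \(\hat x\cos t+\hat p\sin t\), the same formula that the classical \(x(t)\) obeys.

```python
import numpy as np

def ladder(n):
    """Annihilation operator truncated to the lowest n number states."""
    return np.diag(np.sqrt(np.arange(1, n)), k=1).astype(complex)

def heisenberg_x(t, n=12):
    """x(t) = U(t)^dagger x U(t) with U(t) = exp(-i H t), H = diag(k + 1/2)."""
    a = ladder(n)
    x = (a + a.conj().T) / np.sqrt(2)
    energies = np.arange(n) + 0.5
    u = np.diag(np.exp(-1j * energies * t))
    return u.conj().T @ x @ u

n, t = 12, 0.7
a = ladder(n)
x0 = (a + a.conj().T) / np.sqrt(2)          # position matrix at t = 0
p0 = (a - a.conj().T) / (1j * np.sqrt(2))   # momentum matrix at t = 0

# The classical solution with hats added: x(t) = x cos t + p sin t
classical_form = x0 * np.cos(t) + p0 * np.sin(t)
deviation = np.max(np.abs(heisenberg_x(t, n) - classical_form))
print(deviation)
```

The deviation is at the level of rounding errors, so the expectation value of \(\hat x(t)\) in any state, pure or mixed, reduces to the initial expectation values of \(\hat x\) and \(\hat p\).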
Lorentz symmetry and locality
Another advantage of the Heisenberg picture is that it makes the Lorentz symmetry – and locality that follows from it – manifest whenever it’s true. In the Schrödinger picture, we may get distracted by a wave function associated with a particular «one instant» slice of the spacetime. It doesn’t look terribly Lorentz-invariant, does it?
However, the physical predictions of the Schrödinger picture can be shown to coincide with those of the Heisenberg picture. And the latter makes the Lorentz symmetry manifest.
Think about a particular quantum field theory, e.g. quantum electrodynamics. It has coupled Maxwell’s and Dirac’s equations for the electromagnetic and electron fields. They’re manifestly Lorentz-covariant. They nicely transform under the Lorentz transformation. The proof of this proposition is pretty much identical in quantum mechanics – in the Heisenberg picture – as it is in classical physics. What I mean are equations like
\[ \partial_\mu \hat F^{\mu\nu} = \hat J^\nu \]
If the equation holds in one frame, it will hold in another frame because all objects transform as tensors (or spinors or spintensors) and if they vanish (or are equal to one another) in one coordinate system, they vanish (or are equal) in all coordinate systems.
This would actually not be enough to guarantee the Lorentz symmetry of the quantum theory. One must also guarantee that the commutators behave in a Lorentz-covariant way. But they do. At an initial slice, the only nonzero commutator – taking the Klein-Gordon field as a representative – is
\[ [ \phi(x,y,z;t), \partial_0 \phi(x',y',z';t) ] = i\delta^{(3)}(\vec x - \vec x') \]
It’s important that operators associated with points that are spacelike-separated (strictly) commute with each other. This is also necessary for locality: it’s always possible to perform independent measurements in spatially separated regions of the spacetime. The adjective «independent» doesn’t mean that the predicted probabilities (and, consequently, the experimental outcomes) are uncorrelated. They’re usually correlated. Instead, what the adjective «independent» means is that one measurement doesn’t interfere with the other (as it would if you tried to measure both the position and the momentum of a particle). Consequently, these two measurements may be performed in either order and the results won’t be affected.
At the quantum level, loop corrections introduce terms in the effective action that may involve higher derivatives of the fields and that may combine into non-local functions of the fields. So the simple treatment above – naively assuming that the equations have to be identical to the simplest classical theories – may be insufficient. But believe me, it’s still true that one may construct a quantum theory so that the Lorentz symmetry holds to all orders (and beyond).
The commutator above may look slightly Lorentz-violating: the whole commutator picks a particular slice \(t=t'\) and the left hand side contains a particular normal derivative \(\partial_0\). However, there’s another aspect of the commutator that also picks a frame, namely the 3-dimensional «spatial» delta-function on the right hand side. When you analyze the commutator, you will find out that these two properties of the commutator match and guarantee that the content of the rule is independent of the reference frame.
Understanding locality
Popular presentations of quantum physics – and even some of the unpopular, technical manuscripts – are full of Bohmian delusions about «nonlocality». However, the Heisenberg picture makes it very clear that there’s no nonlocality in relativistic models of quantum physics, namely in quantum field theories and string theory. Consider the Klein-Gordon example.
It satisfies something like the following:
\[ \partial^\mu\partial_\mu \hat \phi = -V'(\hat \phi) \]
Neglect the hats for a moment. The box operator defines a wave equation. It is easy to show that if you modify the initial value of the field \(\phi\) in some region, the future values of \(\phi\) may only be modified in the future light cones of the points at the initial slice where you modified \(\phi\). This can be easily shown for the free wave equation: a \(\delta\)-function in the initial state evolves to something that is only nonzero inside the future light cone. And the equation is linear, so it holds for more complicated modifications of arbitrary initial states, too.
However, this «speed limit equal to \(c\)» also applies when the potential \(V(\phi)\) and more generally any non-derivative interactions are nonzero. How can you see that? You may imagine that you numerically solve the equations of motion. Each step \({\rm d}t\), you evolve your field a little bit according to the wave equation without any potential. Then you follow it with a step where you modify \(\phi\) according to the potential term. And then again, small evolution by the wave equation, small evolution by the potential or interaction terms, and so on. In both of these steps of each cycle, the speed by which the information is spreading is bounded by \(c\): in the potential (non-derivative) part of the process, the information isn’t spreading in space at all. So even with a nonzero potential, it’s guaranteed that such equations respect the speed limit equal to \(c\).
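The numerical argument can be tried on a toy lattice. The following is a sketch under assumptions that are not in the text: one spatial dimension with periodic boundaries, \(c=1\), a time step equal to the lattice spacing, the sample potential \(V(\phi)=\lambda\phi^4/4\), and a leapfrog update that applies the free wave step and the potential kick together in each cycle. The function name `evolve` and all parameter values are hypothetical. Two runs differ only by a bump at one site in the initial state, and the difference between them is then compared inside and outside the light cone of that site.

```python
import numpy as np

def evolve(phi0, steps, dx=0.1, lam=1.0):
    """Leapfrog evolution of phi_tt = phi_xx - lam * phi**3 with dt = dx (c = 1)."""
    dt = dx
    phi_prev = phi0.copy()   # field starts at rest: phi(-dt) = phi(0)
    phi = phi0.copy()
    for _ in range(steps):
        # Free wave part: nearest-neighbor coupling only (periodic lattice)
        lap = np.roll(phi, 1) - 2.0 * phi + np.roll(phi, -1)
        # Potential part: purely local, no spreading in space at all
        kick = -dt**2 * lam * phi**3
        phi_next = 2.0 * phi - phi_prev + (dt / dx) ** 2 * lap + kick
        phi_prev, phi = phi, phi_next
    return phi

n_sites, center, steps = 201, 100, 40
sites = np.arange(n_sites)
base = 0.3 * np.cos(2.0 * np.pi * sites / n_sites)
bumped = base.copy()
bumped[center] += 0.1        # local change of the initial state

diff = evolve(bumped, steps) - evolve(base, steps)
outside = np.abs(sites - center) > steps   # sites outside the light cone
max_outside = np.max(np.abs(diff[outside]))
max_inside = np.max(np.abs(diff[~outside]))
print(max_outside, max_inside)
```

Outside the cone the two runs agree exactly, with or without the nonlinear term, because every update only reads a site and its nearest neighbors.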
When we add the hats, nothing really changes about these arguments at all! We’re just evolving objects \(\hat \phi\) that are \((\infty\times\infty)\) matrices. But all their matrix entries obey the same wave equation in the case of \(V=0\). The only new thing is that a nonzero potential \(V\) induces a non-derivative interaction between many – all – matrix entries of these matrices. But it’s still true that the algebraic modifications needed to numerically account for the nonzero potential don’t change the maximum speed with which the operator-valued fields may evolve.
If you want to answer a question associated with a region \(R\) somewhere in space and sometime in the future, it simply can’t be affected by changes of the initial state in another region \(S\) which is spatially separated from \(R\). Why? Because by definition, all predictions about the region \(R\) are encoded in operators (and their expectation values) that are functions or functionals of the «basic» operators associated with the points \(P_i\in R\). But all these operators, as we have just argued, are only influenced by changes of the initial state in their past light cones. So the Heisenberg picture makes it very clear that the information can never spread faster than light. The probabilities of anything associated with a given region \(R\) are completely unaffected by any change that you would do in a spatially separated region \(S\), i.e. with a region from which you would need superluminal signals to influence \(R\).
In this discussion, I didn’t even mention any «states» because physics isn’t really about «states». Physics is about observables i.e. operators. If textbooks managed to completely avoid «states», much like Heisenberg did in his paper, people could perhaps fully avoid the temptation to imagine that some information has to be sent in between a pair of EPR entangled particles. The Heisenberg picture makes it very clear that no such information is being sent. The fields in \(R\) and \(S\) are as isolated from one another as they are in classical physics. They can’t influence each other. The measurements end up with outcomes that exhibit correlations but that’s not due to some influence of the operators in \(S\) on operators in \(R\). It’s because the matrix-valued operators in both regions where we measure the EPR particles depend on the properties of the initial state, so they’re correlated. The matrix entries are correlated in a similar way as Bertlmann’s socks. Just the fact that the probabilities of outcomes are encoded in large matrices or operators, and not in objectively given \(c\)-numbers, guarantees that it’s not classical physics. It’s quantum physics, stupid, and Bell’s inequalities and other things that would hold classically are routinely violated.
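A two-spin toy model shows all three statements at once. The following is a sketch under assumptions that are not in the text: the regions \(R\) and \(S\) are each reduced to a single spin-1/2, the initial state is the singlet, and spins are measured along axes in the \(xz\)-plane. The names `spin`, `correlation` and the chosen angles are hypothetical. The code checks that an operator in \(R\) commutes with an operator in \(S\), that a local unitary in \(S\) leaves the Heisenberg-picture operator in \(R\) untouched, and that the CHSH combination of correlations nevertheless exceeds the classical bound of 2.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def spin(theta):
    """Spin component along an axis at angle theta in the xz-plane."""
    return np.cos(theta) * SZ + np.sin(theta) * SX

# Singlet state (|01> - |10>)/sqrt(2); first factor lives in S, second in R
SINGLET = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

def correlation(a, b):
    """Expectation value of spin(a) in S times spin(b) in R."""
    op = np.kron(spin(a), spin(b))
    return float(np.real(SINGLET.conj() @ op @ SINGLET))

# 1. Operators in different regions commute
op_s = np.kron(spin(0.3), I2)
op_r = np.kron(I2, spin(1.1))
comm_norm = np.linalg.norm(op_s @ op_r - op_r @ op_s)

# 2. A local unitary in S does not change the operator in R
alpha = 0.9
u_local = np.cos(alpha / 2) * I2 - 1j * np.sin(alpha / 2) * SX
u_s = np.kron(u_local, I2)
r_shift = np.linalg.norm(u_s.conj().T @ op_r @ u_s - op_r)

# 3. The correlations still violate the CHSH form of Bell's inequality
a1, a2, b1, b2 = 0.0, np.pi / 2, np.pi / 4, 3 * np.pi / 4
chsh = (correlation(a1, b1) - correlation(a1, b2)
        + correlation(a2, b1) + correlation(a2, b2))
print(comm_norm, r_shift, abs(chsh))
```

The first two numbers vanish up to rounding errors while the magnitude of the CHSH combination comes out as \(2\sqrt 2\): no influence between the regions, yet correlations that no classical model with the bound 2 can reproduce.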
Schools of QM: summary
What I wanted to stress was that Heisenberg appreciated the right interpretation of quantum mechanics nearly from the beginning. One may use the symbols such as \(\hat x\) and \(\hat p\) because they are represented by totally different mathematical objects than the classical variables: they’re noncommuting operators rather than \(c\)-numbers. But one should never forget that \(\hat x\) is the same thing as \(x\) physically. Because we need a more accurate theory that is valid in the microscopic world, \(x\) has to be replaced by \(\hat x\) in all equations of mechanics. However, it plays exactly the same role. We’re just replacing the motherboard calculating the predictions by a newer and more accurate one. But the old equations – Newton’s or Maxwell’s ones – must still be true at some approximation. And they are. The commutators must be small because they used to be zero. And indeed, they are small.
Classical physics allowed us to calculate probabilities of transitions etc. when we assumed some mixed state – given by a probability distribution. Quantum physics allows us to calculate the corresponding probabilities of transitions as well. Again, they should reduce to the classical ones in a certain classical limit whenever it exists. The quantitative, calculable things in classical physics have their counterparts. The only thing that quantum physics doesn’t allow us to do is to assume that classical physics is right, i.e. that there is a fundamentally «certain» (or even «deterministically evolving») objective reality beneath everything. In complex and generic enough situations, this philosophical assumption of classical physics was never useful for any calculations, anyway. And in quantum mechanics, it’s not only useless, it’s actually wrong.
Unfortunately, the schools often lead people to think that the wave function \(\psi(x,y,z,t)\) is what quantum mechanics is all about and they silently encourage people to think that \(\psi(x,y,z,t)\) is kind of analogous to classical fields, e.g. to \(E_z(x,y,z,t)\). But this analogy is, despite the mathematical similarity of the information stored in the mathematical objects, physically invalid. If one is an actual physicist, and not a mindless person who just keeps on masturbating with a mathematical formalism he doesn’t understand, it’s much more important to respect the physical analogies between objects. And the right analogy is between \(x(t)\) and \(E_z(x,y,z,t)\): field theory is just a kind of mechanics with infinitely many degrees of freedom (several degrees of freedom per point in space). And when it comes to the classical-quantum boundary, the right analogy is between \(x(t)\) and \(\hat x(t)\). These objects are subject to different mathematical machineries but they are supposed to play the very same role physically. And one can’t understand modern physics unless he accepts the established fact that at the fundamental level, the quantum rules are correct and the classical rules have been irreversibly proven to be wrong.