Why quantum mechanics has to be complex and linear…
Luboš Motl, August 14, 2012
…and why (near) energy eigenstates are the «most real» among equally allowed states in the Hilbert space
Many physics beginners, physics fans, armchair physicists, young undergraduate students, as well as assorted physics Nobel prize winners who used to be the best quantum mechanical practitioners in the world but who have already forgotten all the basic physics have a problem with some fundamental, rudimentary, and universal features of the laws of Nature, namely with the postulates of quantum mechanics (QM).
They are ready to tell you that they want to construct – or, in the more hopeless cases, that they have already constructed – a theory that is able to do everything that quantum mechanics can do but only allows some preferred states to be «truly realized»; the superpositions are supposed to be less real. Or they tell you that they may reproduce quantum mechanics even though they only deal with real, i.e. non-complex, superpositions.
Every single comment of this kind is totally childishly wrong, of course. The superposition principle – which says that every linear combination of two allowed states (e.g. initial conditions) is equally allowed – is a totally rudimentary principle of quantum mechanics. That’s also why Paul Dirac dedicated Section I/1 in his Principles of Quantum Mechanics to this insight.
Why real numbers aren’t good enough in QM
In the past, I have discussed several times why complex numbers are fundamental in physics. The readers were reminded of important properties of complex numbers such as the fundamental theorem of algebra, i.e. the existence of \(n\) roots (counted with multiplicity) of any \(n\)-th order polynomial with complex coefficients; it wouldn’t work if we demanded real solutions. Complex numbers are important even if one wants to find real solutions of a real polynomial (e.g. a cubic one), so we’re getting more than what we insert. Holomorphic (natural) functions of a complex variable have many important mathematical properties that turn complex numbers into useful if not essential tools, e.g. in the case of two-dimensional conformal field theories. In many of these applications, the complex numbers may be viewed as non-essential but very useful technical tricks.
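To see the remark about real cubics at work, here is a minimal Python sketch of Cardano’s formula for a depressed cubic \(x^3+px+q=0\). It assumes \(p\neq 0\), and the function name is illustrative. The test polynomial is Bombelli’s classic \(x^3-15x-4=0\): all three roots, \(4\) and \(-2\pm\sqrt 3\), are real, yet the formula has to pass through \(\sqrt{-121}=11i\) and the complex cube roots \(2\pm i\).

```python
import cmath

# Illustrative sketch: Cardano's formula with complex intermediate steps (assumes p != 0).


def depressed_cubic_roots(p, q):
    """Roots of x**3 + p*x + q = 0 via Cardano's formula."""
    disc = (q / 2) ** 2 + (p / 3) ** 3          # negative => three distinct real roots
    u = (-q / 2 + cmath.sqrt(disc)) ** (1 / 3)  # principal complex cube root
    omega = cmath.exp(2j * cmath.pi / 3)        # primitive cube root of unity
    roots = []
    for k in range(3):
        u_k = u * omega**k
        v_k = -p / (3 * u_k)  # pair the cube roots so that u_k * v_k = -p/3
        roots.append(u_k + v_k)
    return roots


# Bombelli's cubic x**3 - 15x - 4 = 0: disc = 4 - 125 = -121 < 0
for r in depressed_cubic_roots(-15.0, -4.0):
    print(f"{r.real:+.6f}  (imaginary part {r.imag:.1e})")
```

The printed real parts are \(4\) and approximately \(-0.268\) and \(-3.732\); the imaginary parts are only rounding noise, even though every intermediate step was complex.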
But in this article, I want to focus on quantum mechanics where complex numbers are more than just useful tools; they are really crucial for the theory to work. So why must the wave functions (and therefore the matrix elements of all observables) be allowed to be complex? Why do we have to allow complex superpositions? Let me begin by enumerating three fundamental laws in which we see the imaginary unit: Schrödinger’s equation, Feynman’s path integral, and Heisenberg’s «uncertainty principle» commutator.
Schrödinger’s equation
Fine. The time-dependent Schrödinger equation tells us that
$$ i\hbar \frac{d|\psi\rangle}{dt} = \hat H | \psi \rangle $$
The time derivative of the state vector \(|\psi\rangle\) is proportional to the action of the Hamiltonian on the same state vector. The coefficient \(i\hbar\) on the left-hand side contains the tiny reduced Planck constant, so the state vector changes at the large rate set by \(\hat H/\hbar\); that has to be the case because the wave function must oscillate very quickly for its oscillations to be undetectable in macroscopic situations.
But the coefficient also includes the factor of \(i\): it is pure imaginary. Why does it have to be pure imaginary? Well, it’s necessary to preserve the norm of the state vector. Let us calculate the time derivative of the norm, using the Leibniz rule for the derivative of a product, \((uv)'=uv'+u'v\):
$$ \frac{d\langle\psi|\psi\rangle}{dt} = \frac{1}{i\hbar}\bra\psi \cdot \hat H \ket \psi+\frac{1}{(i\hbar)^*}\bra\psi \hat H \cdot \ket \psi = \dots $$
The first term, \(uv'\), was obtained by simply multiplying the original Schrödinger equation by \(\bra\psi\) from the left. The second term, \(u'v\), was obtained by multiplying the Hermitian conjugate of the original Schrödinger equation by \(\ket\psi\) from the right; this step uses the Hermiticity of the Hamiltonian, \(\hat H^\dagger=\hat H\). A funny thing is that the time derivative of the norm vanishes because the result is
$$\dots = 0$$
The two terms cancelled because the \(\cdot\) multiplication played no role – it’s still an ordinary multiplication of matrices which is associative – and because
$$i^* = -i$$
That’s great because the total probability wouldn’t be conserved if the imaginary unit were omitted: the wave function would exponentially increase (or exponentially decrease). Pure imaginary numbers are the only ones whose complex conjugates are equal to minus the original numbers and that’s exactly the virtue that we needed here. Also, if \(\ket\psi\) is an energy eigenstate whose eigenvalue is \(E\),
$$\hat H \ket \psi = E\ket\psi,$$
the time-dependent Schrödinger equation reduces to the time-independent one, and its solution is
$$\ket{\psi(t)} = \exp\left(\frac{Et}{i\hbar}\right) \ket{\psi(0)}$$
That’s nice because the only thing that is oscillating is the phase. Note that the complex exponential \(\exp(i\omega t)\) allows us to distinguish positive frequencies (energies) from negative frequencies (energies), something that \(\cos(\omega t)\) or \(\sin(\omega t)\) wouldn’t be able to do. The same comment applies to many other dependencies of the wave function. For example, the plane wave \(\exp(+ipx/\hbar)\) differs from \(\exp(-ipx/\hbar)\), which is why plane waves are able to distinguish a particle moving to the left with \(p\lt 0\) from a particle moving to the right with \(p\gt 0\).
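The left/right distinction can be made quantitative with the probability current \(j=(\hbar/m)\,{\rm Im}(\psi^*\partial_x\psi)\). A minimal sketch, assuming units with \(\hbar=m=1\) and a central finite-difference derivative (both are choices made for this illustration, as are the function names): the plane waves \(\exp(\pm ikx)\) carry currents \(\pm\hbar k/m\), while any real wave function such as \(\cos(kx)\) carries none, because \(\psi^*\partial_x\psi\) is then real.

```python
import cmath
import math

# Illustrative sketch: units with hbar = m = 1 (assumption).
HBAR = 1.0
MASS = 1.0
K = 2.0  # wave number, p = hbar * K


def probability_current(psi, x, h=1e-5):
    """j(x) = (hbar/m) * Im(conj(psi) * dpsi/dx), derivative by central difference."""
    dpsi = (psi(x + h) - psi(x - h)) / (2 * h)
    return (HBAR / MASS) * (psi(x).conjugate() * dpsi).imag


def moving_right(x):
    return cmath.exp(1j * K * x)   # exp(+ipx/hbar)


def moving_left(x):
    return cmath.exp(-1j * K * x)  # exp(-ipx/hbar)


def standing(x):
    return complex(math.cos(K * x))  # real wave function (zero imaginary part)


for name, psi in [("right", moving_right), ("left", moving_left), ("cos", standing)]:
    print(name, probability_current(psi, 0.3))  # about +2, about -2, zero
```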
So the imaginary unit is totally essential for Schrödinger’s equation to work. If you omitted it, you would get a totally different equation with a totally different behavior – it would be utterly ludicrous to claim that you are successfully imitating Schrödinger’s equation – and you would easily find out that none of the equations of this kind could accurately describe processes in Nature that depend on quantum mechanics.
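The norm argument can be checked numerically. The sketch below works in units with \(\hbar=1\) and uses an arbitrarily chosen Hermitian \(2\times 2\) matrix and normalized state (illustrative values, not taken from any physical system; the helper names are hypothetical). It evaluates \(d\langle\psi|\psi\rangle/dt\) with the Leibniz rule for the equation \(d\ket\psi/dt=c\,\hat H\ket\psi\). With \(c=1/(i\hbar)\) the rate vanishes; with \(c=1/\hbar\), i.e. with the \(i\) dropped, it equals \(2\langle\hat H\rangle/\hbar\), which is \(0.4\) for these numbers.

```python
# Illustrative sketch: hbar = 1, and H, psi are arbitrary example values.
HBAR = 1.0


def apply(H, psi):
    """Matrix-vector product H|psi> on plain Python lists."""
    return [sum(H[i][j] * psi[j] for j in range(len(psi))) for i in range(len(H))]


def norm_rate(H, psi, c):
    """d<psi|psi>/dt for d|psi>/dt = c * H|psi>, via the Leibniz rule."""
    dpsi = [c * z for z in apply(H, psi)]
    # <psi|dpsi> (the uv' term) + <dpsi|psi> (the u'v term)
    return sum(complex(a).conjugate() * b + b.conjugate() * a for a, b in zip(psi, dpsi))


H = [[1.0, 0.5 - 0.3j], [0.5 + 0.3j, -0.7]]  # Hermitian: H[1][0] = conj(H[0][1])
psi = [0.6, 0.8j]                            # normalized: 0.36 + 0.64 = 1

print(norm_rate(H, psi, 1 / (1j * HBAR)))  # with the i: zero up to rounding
print(norm_rate(H, psi, 1 / HBAR))         # without the i: 2<H>/hbar = 0.4 up to rounding
```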
Needless to say, the usual proof of the equivalence of the Schrödinger picture and the Heisenberg picture may be employed to show that for the same reason, the imaginary unit also has to be present in the Heisenberg equation of motion
$$i\hbar \frac{d\hat L}{dt} = [\hat L,\hat H]$$
I could also formulate arguments directly in the Heisenberg picture that wouldn’t depend on the Schrödinger picture.
The path integral
Analogous comments apply to Feynman’s path integral,
$${\mathcal A}_{i\to f} = \int{\mathcal D}\phi\,\exp(iS[\phi]/\hbar)$$
Again, you see an imaginary unit in the complex exponent. This imaginary unit is totally essential: thanks to it, the weight of every history has the same absolute value and only its phase differs, which is a good thing. In fact, this \(i\) is totally equivalent to the \(i\) in Schrödinger’s equation. When you are proving the path-integral formula for the evolution amplitude from the Hamiltonian evolution, you will ultimately use the relationship between the Hamiltonian and the Lagrangian
$$H + L = \sum_j \dot q_j p_j$$
Because there is an \(i\) in front of the Hamiltonian in Schrödinger’s equation, there has to be an \(i\) in front of the action, which is \(S=\int L\,dt\). Needless to say, the complex exponential in Schrödinger’s equation or Feynman’s path integral is also needed for the interference to exist; think about the double-slit experiment. There are other «detailed situations» in which the key role of the imaginary unit may be seen. Also, one may easily argue that once the wave function for a subsystem is allowed to be complex, the wave function for the whole Universe has to be allowed to be complex, too – because in the «clustered» situations, the wave function of the whole system factorizes into the product of the wave functions of the subsystems, and the product of a complex number and a number from any other «field» is a complex number.
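Because every history enters with a weight of absolute value one, interference is decided entirely by phases. A toy sketch with just two histories of actions \(S_1\) and \(S_2\) (a drastic simplification of the path integral, with \(\hbar=1\) and the function name assumed for illustration):

```python
import cmath
import math

HBAR = 1.0  # units with hbar = 1 (assumption)


def two_path_probability(s1, s2):
    """|exp(i S1/hbar) + exp(i S2/hbar)|^2: two histories, equal unit-modulus weights."""
    amplitude = cmath.exp(1j * s1 / HBAR) + cmath.exp(1j * s2 / HBAR)
    return abs(amplitude) ** 2


for delta in (0.0, math.pi / 2, math.pi):
    print(f"(S2 - S1)/hbar = {delta:.4f}: P = {two_path_probability(0.0, delta):.4f}")
```

The probability runs from 4 through 2 down to 0 as \((S_2-S_1)/\hbar\) goes from \(0\) through \(\pi/2\) to \(\pi\). With positive real weights, the two contributions could only add, so the dark fringes of the double-slit experiment could never appear.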
The commutators
Finally, let me discuss the commutator
$$\hat x \hat p - \hat p \hat x = i\hbar$$
There is a simple reason why the \(c\)-number on the right hand side has to be pure imaginary. The reason is that the left hand side is anti-Hermitian i.e. it obeys
$$(\hat x \hat p - \hat p \hat x)^\dagger = \hat p^\dagger \hat x^\dagger - \hat x^\dagger \hat p^\dagger = \hat p \hat x - \hat x \hat p = -(\hat x \hat p - \hat p \hat x)$$
The anti-Hermiticity means that the Hermitian conjugate of the left hand side is equal to minus the left hand side. We used the Hermiticity of \(\hat x,\hat p\) in the proof (it’s needed because they have real eigenvalues, the measured positions and momenta), aside from the identity \((AB)^\dagger = B^\dagger A^\dagger\). So the imaginary unit inevitably has to appear in this commutator and most other commutators.
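The anti-Hermiticity argument holds for any pair of Hermitian matrices, not just \(\hat x\) and \(\hat p\). A minimal sketch with two arbitrarily chosen \(3\times 3\) Hermitian matrices (illustrative values; the helper names are hypothetical) confirms that their commutator \(C\) obeys \(C^\dagger=-C\), so \(C\) is \(i\) times a Hermitian matrix and its diagonal entries are pure imaginary.

```python
# Illustrative sketch: A and B are arbitrary Hermitian matrices (assumed example values).


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def dagger(A):
    """Hermitian conjugate: transpose and complex-conjugate every entry."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def commutator(A, B):
    """[A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(len(AB[0]))] for i in range(len(AB))]


def is_anti_hermitian(C, tol=1e-12):
    """True if C^dagger = -C within tol."""
    Cd = dagger(C)
    return all(abs(Cd[i][j] + C[i][j]) < tol for i in range(len(C)) for j in range(len(C)))


A = [[2.0, 1 - 1j, 0.5], [1 + 1j, -1.0, 3j], [0.5, -3j, 0.0]]
B = [[0.0, 2.0, -1j], [2.0, 1.0, 0.0], [1j, 0.0, -2.0]]
C = commutator(A, B)
print(is_anti_hermitian(C))         # True
print([C[i][i] for i in range(3)])  # pure imaginary diagonal entries
```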
As I mentioned, the commutator involving \(i\) is also needed for the \(\hat p\) eigenstates to have the form of the plane waves
$$\psi_p(x) \sim \exp(ipx / \hbar )$$
which don’t pick a specific place in space and which aren’t exponentially increasing or decreasing. Again, the sines and cosines wouldn’t be enough because they wouldn’t remember the sign of the momentum. I could add comments about dozens of other catastrophic failures that would follow from the omission of the imaginary unit.
Because the commutator \([\hat x,\hat p]\) is the operator \(i\hbar\) (times the identity), whose diagonal matrix entries are obviously complex, I mean non-real, and because products and differences of real matrices are real, it can’t be the case that all the matrix entries of both \(\hat x\) and \(\hat p\) are real. At least one of them has to have complex entries. In the position representation and in the momentum representation, one of the operators is given by a real matrix and the other one is given by a pure imaginary matrix. But in more general bases, both operators are given by complex matrices: neither of them is real and neither of them is pure imaginary.
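The statement about the position representation can be illustrated on a grid. In this sketch (the grid size, box length, \(\hbar=1\) and the helper names are arbitrary assumptions), \(\hat x\) is a real diagonal matrix and \(\hat p=-i\hbar\,d/dx\) is discretized by a central difference, which makes its matrix purely imaginary yet Hermitian. Acting on a smooth, localized Gaussian, \([\hat x,\hat p]\) reproduces \(i\hbar\psi\) up to a discretization error of order \(dx^2\). Finite matrices can only satisfy the commutation relation approximately, since the trace of any commutator of finite matrices vanishes while the trace of \(i\hbar\) times the identity does not.

```python
import math

# Illustrative sketch: grid size, box length and hbar = 1 are arbitrary assumptions.
HBAR = 1.0
N, BOX = 401, 20.0
DX = BOX / (N - 1)
XS = [-BOX / 2 + n * DX for n in range(N)]


def p_entry(i, j):
    """Matrix element of p = -i hbar d/dx (central difference): purely imaginary."""
    if j == i + 1:
        return -1j * HBAR / (2 * DX)
    if j == i - 1:
        return 1j * HBAR / (2 * DX)
    return 0j


def apply_x(psi):
    """x is diagonal and real in the position representation."""
    return [x * z for x, z in zip(XS, psi)]


def apply_p(psi):
    """Apply the tridiagonal p matrix built from p_entry."""
    return [sum(p_entry(n, m) * psi[m] for m in (n - 1, n + 1) if 0 <= m < N)
            for n in range(N)]


psi = [math.exp(-x * x / 2) for x in XS]  # smooth, localized test state
comm = [a - b for a, b in zip(apply_x(apply_p(psi)), apply_p(apply_x(psi)))]
err = max(abs(comm[n] - 1j * HBAR * psi[n]) for n in range(1, N - 1))
print(err)  # small, of order DX**2
```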
Spin-1/2 particles and complex spinors
The electron is the most famous particle whose spin is \(j=1/2\). The three components of the intrinsic angular momentum \(\hat S_x,\hat S_y,\hat S_z\) have eigenvalues \(\pm \hbar/2\) – just like all other observables, they are Hermitian and have real eigenvalues – and their commutators are
$$[\hat S_a,\hat S_b] = \sum_c i\hbar \epsilon_{abc} \hat S_c$$
The explanation of the imaginary unit \(i\) in the equation above is totally analogous to the explanation of the imaginary unit in the commutator \([\hat x,\hat p]\). Some previous blog entries tried to explain why there are spinors and what their basic properties are.
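The spin commutators can be verified directly from the Pauli matrices, with \(\hat S_a=(\hbar/2)\sigma_a\). The sketch below (with \(\hbar=1\) assumed and hypothetical helper names) checks all nine relations \([\hat S_a,\hat S_b]=\sum_c i\hbar\,\epsilon_{abc}\hat S_c\). Note that \(\sigma_y\) is purely imaginary; if all three matrices were real, every commutator would be real and could not equal \(i\hbar\) times a nonzero real matrix.

```python
import itertools

# Illustrative sketch: hbar = 1, standard Pauli matrices.
HBAR = 1.0
SIGMA = {
    "x": [[0, 1], [1, 0]],
    "y": [[0, -1j], [1j, 0]],
    "z": [[1, 0], [0, -1]],
}
S = {a: [[HBAR / 2 * e for e in row] for row in m] for a, m in SIGMA.items()}


def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def levi_civita(a, b, c):
    """epsilon_abc for a, b, c in 'xyz'."""
    perm = "xyz".index(a), "xyz".index(b), "xyz".index(c)
    if len(set(perm)) < 3:
        return 0
    return 1 if perm in {(0, 1, 2), (1, 2, 0), (2, 0, 1)} else -1


def check_spin_algebra(tol=1e-12):
    """Compare [S_a, S_b] with sum_c i hbar eps_abc S_c entry by entry."""
    for a, b in itertools.product("xyz", repeat=2):
        ab, ba = matmul(S[a], S[b]), matmul(S[b], S[a])
        for i, j in itertools.product(range(2), repeat=2):
            lhs = ab[i][j] - ba[i][j]
            rhs = sum(1j * HBAR * levi_civita(a, b, c) * S[c][i][j] for c in "xyz")
            if abs(lhs - rhs) > tol:
                return False
    return True


print(check_spin_algebra())  # True
```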
A funny thing about all two-component spinors – the normalized (to unity) column wave functions \(\ket\psi\) with two complex components – is that we may calculate a direction in the three-dimensional space
$$ \vec V = \bra\psi \vec\sigma \ket \psi$$
where the triplet of matrices sandwiched between the bra vector and the ket vector is made out of the Pauli matrices, \(\vec\sigma=(\sigma_x,\sigma_y,\sigma_z)\). The electron whose spin wave function is \(\ket\psi\) is guaranteed to be spinning «up» with respect to the axis \(\vec V\). Note that \(\vec V\cdot \vec V = 1\) is guaranteed if \(\braket\psi\psi=1\). Up to the overall phase, the (normalized) state vector \(\ket\psi\) is uniquely determined for every (normalized) vector \(\vec V\).
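Here is a minimal sketch of the map from a spinor to its axis. It parameterizes the spinor as \((\cos(\theta/2),\,e^{i\varphi}\sin(\theta/2))\) with arbitrarily chosen angles, computes \(\vec V=\bra\psi\vec\sigma\ket\psi\), and checks that \(|\vec V|=1\) and that \(\ket\psi\) is an eigenvector of \(\vec V\cdot\vec\sigma\) with eigenvalue \(+1\), i.e. spin «up» along \(\vec V\). For this parameterization, \(\vec V=(\sin\theta\cos\varphi,\sin\theta\sin\varphi,\cos\theta)\). The helper names are illustrative.

```python
import cmath
import math

# Pauli matrices sigma_x, sigma_y, sigma_z (standard basis)
SIGMA = (
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
    ((1, 0), (0, -1)),
)


def apply(m, psi):
    """2x2 matrix times a two-component spinor."""
    return tuple(m[i][0] * psi[0] + m[i][1] * psi[1] for i in range(2))


def spin_axis(psi):
    """V_k = <psi| sigma_k |psi>; real because each sigma_k is Hermitian."""
    return tuple(
        sum(complex(p).conjugate() * q for p, q in zip(psi, apply(s, psi))).real
        for s in SIGMA
    )


def v_dot_sigma(v):
    """The 2x2 matrix V . sigma."""
    return tuple(
        tuple(sum(v[k] * SIGMA[k][i][j] for k in range(3)) for j in range(2))
        for i in range(2)
    )


theta, phi = 1.0, 0.7  # arbitrary angles for the example
psi = (math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2))

V = spin_axis(psi)
print(V, sum(c * c for c in V))       # |V|^2 = 1 up to rounding
print(apply(v_dot_sigma(V), psi), psi)  # (V . sigma) psi = +psi
```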
Once again, the important properties above couldn’t be obeyed if you required the wave function to be real. For example, if you rotate \(\vec V\) by the angle \(\gamma\) around the \(z\)-axis, the components of the two-component wave function \(\ket\psi\) are multiplied by \(\exp(+i\gamma/2)\) and \(\exp(-i\gamma/2)\), respectively. If you banned complex components, you would ban rotations around the \(z\)-axis and, in fact, generic rotations around every axis other than the \(y\)-axis: with the usual Pauli matrices, a real spinor always gives \(V_y=0\), so real spinors can only describe spins pointing within the \(xz\)-plane. That would be too bad.
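A short sketch of the rotation claim, using the phase assignment quoted in the paragraph above and the standard Pauli basis (the function names are illustrative): multiplying the components by \(\exp(\pm i\gamma/2)\) leaves \(V_z\) alone and turns \((V_x,V_y)\) through the angle \(\gamma\). With this particular assignment the turn is clockwise as seen from the \(+z\) direction; swapping the two phases reverses the sense, which is a matter of convention. The rotated spinor is no longer real, while every real spinor \((a,b)\) has \(V_y=2\,{\rm Im}(a^*b)=0\).

```python
import cmath
import math
import random


def spin_axis(a, b):
    """V = <psi|sigma|psi> for psi = (a, b), written out in components."""
    ab = complex(a).conjugate() * b
    return (2 * ab.real, 2 * ab.imag, abs(a) ** 2 - abs(b) ** 2)


def rotate_z(a, b, gamma):
    """Multiply the components by exp(+i gamma/2) and exp(-i gamma/2), respectively."""
    return a * cmath.exp(0.5j * gamma), b * cmath.exp(-0.5j * gamma)


a, b = math.cos(0.5), math.sin(0.5)     # a real spinor: V = (sin 1, 0, cos 1)
print(spin_axis(a, b))
print(spin_axis(*rotate_z(a, b, 0.9)))  # V_y is now -sin(1) sin(0.9), nonzero

# Any real spinor has V_y = 0: it can only point within the xz-plane.
rng = random.Random(0)
print(all(spin_axis(math.cos(t), math.sin(t))[1] == 0.0
          for t in (rng.uniform(0, 2 * math.pi) for _ in range(1000))))  # True
```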
As we have already seen in several examples, complex numbers and phases are not optional luxuries in QM. Very basic and key transformations of any physical system – evolution in time, translation in space, rotation around an axis – are inevitably expressed by the change of the phase of the probability amplitudes (and, in more general bases, by unitary transformations). Moreover, the superposition principle holds; we may always mix two allowed wave functions into their sum. So physically vital operations force us to add and multiply the probability amplitudes according to the rules for complex numbers; up to a change of terminology or notation, the probability amplitudes are and have to be complex numbers! It’s the complex numbers \(z\in\mathbb{C}\) that are the natural values for the wave functions. Requiring them to be real i.e. any condition of the sort \(z=z^*\) would be unnatural (i.e. non-holomorphic) and would prevent the system from doing elementary operations such as rotations, translations, and evolution in time (systems wouldn’t be allowed to wait!).
We may design procedures to prepare the electron in any spin state, combine two components of its wave function into general complex superpositions, and verify that all the statements above hold. So all the complex combinations of any two allowed states must be allowed for the laws of physics to be rotationally invariant. So «something» that describes the electron’s spin and behaves as a two-component complex spinor has to exist; a set of simple direct experiments is enough to establish that this «something» has a probabilistic interpretation (the probabilities are given by the squared absolute values of the complex amplitudes). One may also see that this framework – the general framework of QM – is actually the only framework among a priori similar candidates that makes any sense.
If someone is telling you that he or she may reproduce all of quantum mechanics while demanding that all the wave functions are always real, you may be sure that he or she has always been or has become a confused amateur regardless of the number of the Nobel prizes he or she has received in the past.
Why (near) energy eigenstates are «somewhat more real» than other bases
Such people often tell you that the physical system is really allowed to be found in some specific basis vectors but the general complex superpositions are not allowed or less real. We have discussed two major examples showing that such an assumption is completely nonsensical.
If you picked a position-related «preferred basis» and banned the complex superpositions of its elements, it’s clear that particles could never move to the right (or to the left). In particular, the plane wave describing a particle that moves to the right has the form \(\exp(i p x / \hbar)\). As emphasized above, this function is complex (non-real) and has to be complex (non-real) for it to remember the direction of the motion.
The case of the spin is even clearer. You could pick a «preferred basis» for a spin-1/2 particle, e.g. the \(\ket{{\rm up}}\) and \(\ket{{\rm down}}\) states. However, those states pick a preferred axis in space, the \(z\)-axis. The observations show that the laws of Nature are rotationally symmetric, so you must be able to prepare corresponding states with respect to any other axis in space. As discussed above, the relevant wave functions (spinors) that are the eigenstates of the projection of spin with respect to a general axis are general complex linear superpositions of the \(\ket{{\rm up}}\) and \(\ket{{\rm down}}\) states. Real combinations only – or the ban on any combinations, if you want to make things really bad – would produce a sick theory that would brutally disagree with the rotational invariance of the laws of Nature.
As I have discussed in the text about the diversity of observables in quantum mechanics, there’s nothing special about the position eigenstates, of course. The eigenstates of other but equally good observables such as the velocity are complex linear superpositions of the position eigenstates. The wave function in the momentum representation is the Fourier transform of the wave function in the position representation and vice versa. It’s clearly wrong to say that one of them is «more real» than the other. All of the complex linear combinations of such state vectors must be allowed. They must be and they are as allowed as the state vectors you started with. That’s what the superposition principle of quantum mechanics means.
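The claim that the two representations are equally good can be illustrated with a discrete stand-in for the Fourier transform. In this sketch, a grid of 64 points replaces the real line, and a unitary discrete Fourier transform maps position amplitudes to momentum amplitudes; the sign and \(1/\sqrt N\) normalization conventions, like the helper names, are choices made for the illustration. The change of basis preserves the norm, a grid «position eigenstate» is spread evenly over all momenta, and a discrete plane wave is concentrated on a single momentum.

```python
import cmath
import math


def unitary_dft(psi):
    """Momentum-space amplitudes: (1/sqrt N) * sum_n exp(-2 pi i k n / N) psi[n]."""
    n_pts = len(psi)
    return [
        sum(psi[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        / math.sqrt(n_pts)
        for k in range(n_pts)
    ]


def norm2(v):
    """Squared norm sum |v_n|^2."""
    return sum(abs(z) ** 2 for z in v)


N = 64
delta = [1.0 if n == 10 else 0.0 for n in range(N)]  # grid stand-in for a position eigenstate
plane = [cmath.exp(2j * math.pi * 5 * n / N) / math.sqrt(N) for n in range(N)]  # momentum index 5

delta_tilde = unitary_dft(delta)
plane_tilde = unitary_dft(plane)
print(norm2(delta), norm2(delta_tilde))                        # both 1 up to rounding
print(min(map(abs, delta_tilde)), max(map(abs, delta_tilde)))  # both about 1/sqrt(64) = 0.125
print(max(range(N), key=lambda k: abs(plane_tilde[k])))        # 5
```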
If you were looking for a basis that is «a little bit preferred», after all, you could find one. But it wouldn’t be composed of position eigenstates. Not even momentum eigenstates. It would be made out of energy eigenstates, i.e. the eigenstates of the Hamiltonian operator (or states that are close to them),
$$\hat H \ket \psi = E\ket\psi$$
Why? It’s because Nature tries to minimize the energy; and it’s also because these states are stationary (or close to being stationary), so they tend to conserve the «identity of the physical system». We may also say that the basis (or bases) of energy eigenstates are the only ones whose elements evolve into multiples of themselves. For all other bases you could think of, a general initial «basis vector» always evolves (via Schrödinger’s equation) into a linear superposition of several basis vectors. So even if you decided to ban all states except for basis vectors at \(t=0\), general complex linear combinations would inevitably occur for almost any later value of \(t\); it is not really possible to «ban» general complex superpositions because time is working against you. The energy eigenstates are the only counterexamples.
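A two-level toy model makes the last point concrete. The sketch assumes \(\hat H=\Delta\,\sigma_x\) with \(\hbar=1\) (an arbitrary illustrative choice, as is the name `evolve`), for which \(\exp(-i\hat Ht)=\cos(\Delta t)-i\sin(\Delta t)\,\sigma_x\) exactly. The «kinematical» basis vector \(\ket{\rm up}\) immediately turns into the complex superposition \(\cos(\Delta t)\ket{\rm up}-i\sin(\Delta t)\ket{\rm down}\), whereas the energy eigenstate \((\ket{\rm up}+\ket{\rm down})/\sqrt2\) only acquires the overall phase \(\exp(-i\Delta t)\).

```python
import cmath
import math

# Toy model (assumption): H = DELTA * sigma_x with hbar = 1.
DELTA = 1.0


def evolve(psi, t):
    """exp(-iHt)|psi> = cos(DELTA t)|psi> - i sin(DELTA t) sigma_x|psi>."""
    c, s = math.cos(DELTA * t), math.sin(DELTA * t)
    a, b = psi
    return (c * a - 1j * s * b, c * b - 1j * s * a)


up = (1.0, 0.0)                              # a "kinematical" basis vector
plus = (1 / math.sqrt(2), 1 / math.sqrt(2))  # energy eigenstate with E = +DELTA

t = 0.7
print(evolve(up, t))  # (cos(DELTA t), -i sin(DELTA t)): a complex superposition
a, b = evolve(plus, t)
print(a / plus[0], b / plus[1], cmath.exp(-1j * DELTA * t))  # the same phase three times
```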
Consider a very slow proton and a very slow electron in a box. After some time, they approach each other and form a bound state, the Hydrogen atom. Chances are substantial that they immediately form the Hydrogen atom in the ground state and emit a photon of about 13.6 eV. If they produce an excited state of the Hydrogen atom, it will eventually emit a photon and fall to the ground state anyway.
So the most relevant wave function of the relative position between the proton and the electron is the wave function for the ground state of the Hydrogen atom. I want to emphasize that this wave function is rather complicated if you express it in the position representation. For example, it exponentially decreases as you increase the distance between the proton and the electron (the \(e\)-folding distance is the Bohr radius). In other words, it is not one of the wave functions that someone could include in a «preferred basis» at the very beginning. It is very far from any element of the position eigenstate basis. The ground state depends on the Hamiltonian – on the dynamics – and requires you to make a non-trivial calculation to find out its «geometric shape».
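The dependence on the Hamiltonian can be checked in the other direction: given the ground state \(\psi\propto\exp(-r/a_0)\), the local energy \((\hat H\psi)/\psi\) should be the same constant at every radius. A sketch in atomic units (\(\hbar=m_e=e^2/4\pi\epsilon_0=1\), so \(a_0=1\)), with an infinitely heavy proton and finite-difference derivatives assumed (the function names are illustrative), gives \(-1/2\) hartree, i.e. about \(-13.6\) eV, everywhere.

```python
import math

# Sketch in atomic units (assumption): hbar = m_e = e^2/(4 pi eps0) = 1, so a0 = 1;
# the proton is treated as infinitely heavy.
HARTREE_EV = 27.211386  # 1 hartree in eV (rounded)


def psi_1s(r):
    """Unnormalized ground state exp(-r/a0); normalization cancels in H psi / psi."""
    return math.exp(-r)


def local_energy(r, h=1e-4):
    """(H psi)(r)/psi(r) for H = -(1/2) Laplacian - 1/r acting on an l = 0 state."""
    d1 = (psi_1s(r + h) - psi_1s(r - h)) / (2 * h)
    d2 = (psi_1s(r + h) - 2 * psi_1s(r) + psi_1s(r - h)) / h**2
    laplacian = d2 + 2 * d1 / r  # radial Laplacian of a spherically symmetric function
    return (-0.5 * laplacian - psi_1s(r) / r) / psi_1s(r)


for r in (0.5, 1.0, 2.0, 4.0):
    print(f"r = {r} a0: E = {local_energy(r) * HARTREE_EV:.4f} eV")
```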
But it’s the energy eigenstates, especially those with a low energy eigenvalue, that are most relevant in realistic situations. This is especially true for «very fast» degrees of freedom that have the potential to dramatically increase the energy (there are infinitely many such degrees of freedom in a quantum field theory and an even higher number of them in string theory, e.g. particles, i.e. excitations of a quantum field with a high momentum, or excitations of an individual string that turn it into a much heavier particle): Nature always tries to avoid such excessive energy increases. So most of these potentially high-energy-adding degrees of freedom are always described by the ground-state wave function, an energy eigenstate. Such an eigenstate is totally different from any position-like «preferred basis» that someone may want to prescribe bureaucratically.
The total wave function at least morally follows the Ansatz of the Born-Oppenheimer approximation,
$$\ket{\psi_{\rm total}} \sim \ket{\psi_\text{slow, general}} \otimes \ket{\psi^0_\text{fast, ground state}}$$
and when we’re discussing the evolution of such a system, we simply ignore the last factor (the high-energy, short-distance inner structure of particles, for example) and deal with the first factor only. In chemistry, we ignore the possible internal excitation of the nuclei. Even if we work on nuclear physics or the quark-gluon plasma at RHIC, we ignore the possible additional internal excitations on the strings «inside» each quark and other degrees of freedom.
(The ground state wave function therefore depends on the Hamiltonian. The Hamiltonian is essential for many other «classical properties» of a physical system, too. For example, the Hamiltonian governs the process of decoherence and because decoherence may be viewed as a process picking a «somewhat preferred basis» for macroscopic objects, this «somewhat preferred basis» depends on the Hamiltonian, too. None of these «somewhat preferred state vectors» can be determined bureaucratically or kinematically, i.e. before we study the Hamiltonian and the evolution! The decohered states usually have sharp positions of macroscopic objects but that’s not because position is preferred from scratch; instead, it’s because the Hamiltonian is approximately or exactly local, i.e. an integral of a Hamiltonian density.)
Only when it comes to the degrees of freedom whose «most intrinsic» Hamiltonian term adds a low enough energy to the total Hamiltonian, much more general wave functions that differ from the energy eigenstates become «typical» states of the given physical system. And of course, some deviation of the state vector from any energy eigenstate is needed, otherwise Schrödinger’s equation would imply that the world is stationary and nothing ever changes about it. That would be bad, too.
But physical systems in the real world don’t want to be in position eigenstates (strict position eigenstates are not even normalizable and, by the uncertainty principle, they carry a hugely undetermined momentum and therefore a divergent average kinetic energy). Instead, they want to minimize their energy, so they tend to organize themselves into energy eigenstates corresponding to low eigenvalues of the energy. This point is really rudimentary. If someone thinks that it’s OK to assume that physical systems are «actually» (in the classical or «realist» sense) found in one of the position-like, predetermined, bureaucratic «preferred basis vectors», it means that they don’t understand that every physical system in Nature is actually doing something completely different and prefers to sit in one of the «complicated» energy eigenstates, probably the ground state, a state that no one could guess from the beginning (before she calculates anything that depends on the Hamiltonian).
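The divergence of the kinetic energy can be seen on Gaussians that approach a position eigenstate. For \(|\psi|^2\) of width \(\sigma\), \(\langle\hat T\rangle=\hbar^2/(8m\sigma^2)\). The sketch below (units \(\hbar=m=1\), a simple Riemann-sum quadrature, and the function name are assumptions) evaluates \((\hbar^2/2m)\int|\psi'|^2\,dx\) numerically and compares it with that formula as \(\sigma\) shrinks.

```python
import math

# Illustrative sketch: units with hbar = m = 1 (assumption).
HBAR = 1.0
MASS = 1.0


def kinetic_energy(sigma, n=20001):
    """<T> = (hbar^2/2m) * integral |psi'(x)|^2 dx for a normalized Gaussian of width sigma."""
    norm = (2 * math.pi * sigma**2) ** -0.25
    half = 10 * sigma  # the tails beyond 10 sigma are negligible
    dx = 2 * half / (n - 1)
    total = 0.0
    for i in range(n):
        x = -half + i * dx
        psi = norm * math.exp(-x * x / (4 * sigma**2))
        dpsi = -x / (2 * sigma**2) * psi  # analytic derivative of the Gaussian
        total += dpsi * dpsi * dx         # Riemann sum
    return HBAR**2 / (2 * MASS) * total


for sigma in (1.0, 0.1, 0.01):
    print(sigma, kinetic_energy(sigma), HBAR**2 / (8 * MASS * sigma**2))
```

Each tenfold narrowing of the wave packet costs a hundredfold increase of the kinetic energy, so the strict \(\sigma\to 0\) limit is infinitely expensive.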
The people who tell you that they may do or emulate quantum mechanics by banning complex linear combinations (or even all linear combinations), as well as the people who tell you that the physical systems may be assumed to «objectively be» in one of the basis vectors of an easy-to-see, «kinematical» basis, have completely lost it. They don’t understand basic physics; either they have never understood it or they don’t understand it anymore. We may feel compassion for these people but because ignorance of basic modern physics doesn’t really «hurt», we shouldn’t exaggerate the compassion. So instead of excessive compassion, we had better protect our journals and websites against excessive flooding by these delusional folks.
And that’s the memo.