Double slit experiment in the Heisenberg picture
Luboš Motl, April 18, 2015
What we observe is not the Nature itself but the Nature exposed to our method of questioning.
Natural science does not simply describe and explain Nature; it is part of the interplay between Nature and ourselves.
The first gulp from the glass of natural sciences will turn you into an atheist. But at the bottom of the glass, God is waiting for you.
Werner Heisenberg
Even though wave mechanics was in no way the first or deepest formulation of quantum mechanics, it quickly became popular because the «wave function» looks like a classical wave, which makes it easier for people to «visualize» what’s going on. This «advantage» is actually a disadvantage. The visualization leads people to the totally incorrect idea that the state vector is a classical wave or object of some sort, which it’s not, and that it should be distinguished from (i.e. considered mutually exclusive with) nearby «similar» state vectors, which it shouldn’t. The popularity of the Schrödinger picture thus «helps» people preserve their anti-quantum misconceptions.
The oldest picture of quantum mechanics, the one behind the «matrix mechanics» formulation, is the Heisenberg picture. There is no evolving wave function. Instead, it’s the operators such as \(\hat L\) that evolve according to the Heisenberg equations
\[ i\hbar \frac{d\hat L}{dt} = [\hat L, \hat H] \]
which are effectively «just» the classical equations with extra hats on top of all physical quantities. And in the classical limit, the Heisenberg equations (and their solutions) reduce to their classical counterparts. All the expectation values etc. are the same as in Schrödinger’s picture; the two pictures differ by a time-dependent unitary transformation. The Heisenberg picture doesn’t even force you to distinguish pure states and mixed states. As we will see in the double-slit example, predictions are formulated via the expectation values \(\langle \cdots \rangle\) of the observables at the final moment, and they may be written as function(al)s of similar expectation values from the initial moment – which may be interpreted as coming from either a pure state or a mixed state.
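The equivalence of the two pictures is easy to check numerically on a toy system. The following is a minimal sketch under stated assumptions, not part of the double-slit calculation: it uses a randomly generated 4-dimensional Hermitian Hamiltonian and observable (hypothetical, for illustration only), units with \(\hbar=1\), and helper names such as `evolution_operator` that are introduced here. It compares the two expectation values and tests the Heisenberg equation with a finite difference.

```python
import numpy as np

def evolution_operator(H, t, hbar=1.0):
    """Return U(t) = exp(-i H t / hbar) for a Hermitian matrix H."""
    energies, vecs = np.linalg.eigh(H)
    # Scale column j of vecs by exp(-i E_j t / hbar), then rotate back.
    return (vecs * np.exp(-1j * energies * t / hbar)) @ vecs.conj().T

def random_hermitian(rng, n):
    """Hypothetical test matrix: a random n x n Hermitian matrix."""
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

def compare_pictures(t=0.7, n=4, seed=0, eps=1e-5):
    rng = np.random.default_rng(seed)
    H = random_hermitian(rng, n)           # Hamiltonian
    L0 = random_hermitian(rng, n)          # observable at t = 0
    psi = rng.normal(size=n) + 1j * rng.normal(size=n)
    psi /= np.linalg.norm(psi)             # normalized initial state

    U = evolution_operator(H, t)
    # Schrödinger picture: evolve the state, keep the operator fixed.
    psi_t = U @ psi
    schrodinger = np.vdot(psi_t, L0 @ psi_t).real
    # Heisenberg picture: evolve the operator, keep the state fixed.
    L_t = U.conj().T @ L0 @ U
    heisenberg = np.vdot(psi, L_t @ psi).real

    # Central finite difference for dL/dt, to test i dL/dt = [L, H].
    def L_at(s):
        V = evolution_operator(H, s)
        return V.conj().T @ L0 @ V
    dL_dt = (L_at(t + eps) - L_at(t - eps)) / (2 * eps)
    residual = np.abs(1j * dL_dt - (L_t @ H - H @ L_t)).max()
    return schrodinger, heisenberg, residual

if __name__ == "__main__":
    s, h, r = compare_pictures()
    print(f"Schrödinger: {s:.12f}  Heisenberg: {h:.12f}  residual: {r:.2e}")
```

The two expectation values should agree to rounding error, and the residual of the Heisenberg equation should be limited only by the finite-difference step.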
The double slit experiment is the ultimate toy model of quantum mechanics. If you carefully think about this experiment, you may reveal all the secrets and logic of quantum mechanics – which allows you to understand everything else, too. At least Feynman liked to say so. It is rather clear how the results of this experiment are computed (and in particular, where the interference comes from) in the Feynman path integral approach to quantum mechanics; it is also easy to understand how we solve the problem in Schrödinger’s picture where a wave function is evolving in between the slits and the photographic plate.
Some time ago, I promised to write a blog post extracting the predictions for the double slit experiment from the Heisenberg picture of quantum mechanics. Here is my first attempt at this task.
We consider an experiment in 2 spatial dimensions and one time; there is no \(z\) direction here. The plate is separated from the slits by the distance \(L\) in the \(y\) direction. The thin slits are separated by a small distance \(R\) in the \(x\) direction, so their coordinates are
\[ x = \pm \frac{R}{2}, \quad y = 0. \]
We want to calculate the probability density that the particle lands at
\[ x=S, \quad y=L \]
at the photographic plate. The particle makes it through the slits at \(t=0\). We measure the number of particles absorbed by the material around the slits at \(t=0\). We assume that no particle absorption was detected at \(t=0\) so we know that one particle is flying through the experiment.
At time \(t\gt 0\), the particle lands somewhere at the photographic plate. First of all, it is easy to calculate \(t\). The Heisenberg equations of motion are easily solved and tell us that
\[ \hat y = \frac{\hat p_y}{m} t \]
which is just like the classical equation but with the hats. Because we assume that the slits only affect the \(x\) components of the particle’s position and momentum, we conclude that the momentum component \(p_y\) was already known before the particle entered the slits, at \(t\lt 0\). The particle reaches the photographic plate when \(y=L\), which implies that the time is
\[ t = \frac{Lm}{p_y}. \]
There’s some uncertainty about the initial momentum \(\hat p_y\); this will get translated to some uncertainty about \(t\). Unlike \(\hat p_y\), the quantity \(t\) is not an operator; even in the Heisenberg picture, \(t\) is primarily an independent variable that other things depend upon. But we may ask what is the moment of the absorption (by the photographic plate) that the apparatus will detect and record, and this is an operator because it’s a result of a measurement. Before a person observes the apparatus, the apparatus itself will be in a linear superposition of states with different values of the recorded \(\hat t\), and that distribution will simply mirror the distribution of \(\hat p_y\).
The previous paragraph or two had a simple purpose: to eliminate the \(y\)-\(p_y\) part of the double slit problem because it has nothing to do with its essence, namely with the interference pattern we want to study. What we really want to solve is the \(x\)-\(p_x\) part of the problem, in the direction of the separation of the slits. We want to trace how the operators evolve between \({\rm time}=0\), the moment when the particle makes it through the slits, and \({\rm time}=t\) (I hope that this seemingly tautological notation isn’t confusing) when it lands on the photographic plate. And we want to say lots of clarifying words about the right interpretation of all these things.
Great. You may imagine that we always work with the same and constant wave function
\[ \psi(x) = \frac{1}{\sqrt{2}} \left( \sqrt{\delta(x+R/2)} + \sqrt{\delta(x-R/2)} \right) \]
The particle has a 50% probability to be at \(x=-R/2\) and 50% to be at \(x=+R/2\) at \(t=0\). I included the square root of two and the square root of the delta-function to make the state normalized to one. (The square root of the delta-function is always a very unnatural beast, however. That’s why all the normalization issues below will be problematic and you’re invited to fix this bug and use a more kosher initial state: some extra «width of each slit» may have to be introduced.) To make it more comprehensible, all copies of \(x\) in the equation above should be replaced by \(x_{t=0}\). Note that the state vector doesn’t depend on \(t\); we are working in the Heisenberg picture where the state vector is constant.
Instead, what is not constant are the operators. Between the slits and the photographic plate, we deal with a free particle. It means that
\[ \hat p_x = {\rm const}, \quad \hat x = \hat x_{t=0} + \frac{\hat p_x t}{m}. \]
The momentum in the \(x\) direction is conserved when the particle is freely moving in between the slits and the plate (but not at the moment when the particle travels through the slits). The second equation was easily obtained by integrating the Heisenberg differential equation for \(d\hat x / dt\). The form of all these equations is the same as in classical physics (with extra hats). Our resulting expression for \(\hat x\) looks just like the equation for a «straight trajectory» in the spacetime. But we must remember that there is no particular slope of the trajectory because \(\hat p_x\) is an operator – it is not a classical number with a known value.
Because of the hats, the equation that looks like a «straight line» doesn’t contradict the existence of the interference pattern, as we are going to see.
Fine. We want to calculate the probability that at \(t=t\), the particle lands at \(x(t)=S\). Well, more precisely, the probability that it lands in the interval \((S,S+dS)\) is \(dS\) times the probability density \(\rho(S)\) that it lands around \(x(t)=S\). This density \(\rho(S)\) is the expectation value of the «not normalized projector» onto the state with \(x(t)=S\):
\[ \rho(S) = \bra\psi \hat P_S \ket\psi, \quad \hat P_S = \ket{x(t)=S}\bra{x(t)=S} \]
I wrote that the projection operator is not normalized because the product of two such operators, \(\hat P_S\cdot \hat P_{S'}\), is equal to \(\hat P_S\) times the delta-function \(\delta(S-S')\) rather than the Kronecker symbol, OK? This not normalized projection operator may also be written using a delta-function,
\[ \hat P_S = \delta (\hat x - S). \]
The argument of the delta-function is an operator but it’s no problem, all these functions of operators are well-defined and may be evaluated in any basis (most simply, in the basis of eigenvectors). Note that if we integrate the operator \(\hat P_S\) over some interval of values of \(S\), we get a genuine projection operator whose eigenvalues are zero and one (meaning «particle didn’t land in the interval» and «particle did land in the interval», respectively). That’s why the expectation value of \(\int_A^B dS\,\hat P_S\) is the probability that the particle landed in \((A,B)\).
Great. The rest is a calculation. There are many ways to proceed and I’ve tried several of them, to be sure that I can still calculate. The most straightforward strategy seems to be to employ the formula for the delta-function
\[ \delta(T) = \int_{-\infty}^{+\infty} dQ\,\exp(2\pi i QT). \]
With our argument, we have
\[ \hat P_S = \delta(\hat x - S) = \int_{-\infty}^{+\infty} dQ\,\exp[2\pi i Q(\hat x - S)] \]
and \(Q\) has units of the inverse distance. Don’t forget that the probability density that should reveal the \(\cos(S)\)-style interference pattern is \(\langle \hat P_S\rangle\). Now, let us use our direct solution for \(\hat x\):
\[ \hat P_S = \int_{-\infty}^{+\infty} dQ\,\exp\left[2\pi i Q\left(\hat x_0 + \frac{\hat p t}{m} - S\right)\right] \]
I decided to simplify the notation a bit, \(\hat p\equiv \hat p_x\) – note that this particular operator is not changing with time (free particle). It’s not necessary but I chose to exploit the BCH-style formula
\[ \exp(A+B) = \exp(A) \exp(B) \exp(-[A,B]/2) \]
which holds whenever \([A,B]\) is a \(c\)-number (which commutes with everything else). This is somewhat helpful because the exponentials of \(\hat x_0\) and the exponentials of \(\hat p\) which don’t commute with each other are separated as different factors. With this formula (and I drop the limits on the \(Q\)-integral, to make things easier), we get
\[ \begin{aligned} \hat P_S &= \int dQ \exp(-2\pi i QS+2\pi^2 i\hbar Q^2 t/m) \\ &\quad \times \exp(2\pi i Q\hat x_0) \exp(2\pi i Q \hat p t / m) \end{aligned} \]
On the first line, the \(Q^2t\)-based term comes from the BCH formula; it encodes the same physics as the spreading of the initial delta-function in the Schrödinger picture (like in the solution to the diffusion equation, but with the extra \(i\)). Our overall normalization of the state is problematic.
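For readers who want the intermediate step, the \(Q^2t\) phase can be checked directly. Take \(A=2\pi i Q\hat x_0\) and \(B=2\pi i Q\hat p t/m\) in the BCH-style formula (this assignment of \(A\) and \(B\) is spelled out here only for the check) and use \([\hat x_0,\hat p]=i\hbar\):

\[ [A,B] = (2\pi i Q)^2\,\frac{t}{m}\,[\hat x_0,\hat p] = -\frac{4\pi^2 i\hbar Q^2 t}{m}, \qquad \exp\left(-\frac{[A,B]}{2}\right) = \exp\left(\frac{2\pi^2 i\hbar Q^2 t}{m}\right). \]

The commutator is a \(c\)-number, so the formula applies without any further correction terms.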
Only the exponentials on the second line of this operator identity contain operators as arguments. It follows that if we calculate the probability density \(\langle \hat P_S\rangle\), we only need to surround the second line of the right hand side by the brackets.
Before we complete the computation of \(\langle \hat P_S\rangle\), let me emphasize a few general words. So far, the outcome – and the general intermediate result of a Heisenberg-picture calculation – was to express the observables at the final moment as function(al)s of observables at the initial moment. The same thing was true in classical physics: the quantities at the final moment are function(al)s of the quantities at the initial moment (and functions of \(t\), too). The only new aspect of quantum mechanics (in the Heisenberg picture) is that the observables are non-commuting which makes the function(al)s different (from some viewpoint, they’re more complex; from another viewpoint, they are equally complicated, just different).
If the final moment observable (which we actually measure) were a function of the observables at the initial moment that we exactly knew (because we measured them at the initial moment), we could produce a «sure» prediction that this observable has a predictable value. This situation – the state vector is an eigenstate of the final moment observable (this description is OK in both pictures) – almost never occurs. We usually deal with a more general, non-eigenstate situation in which the final moment observables also depend on some unknown observables at the initial moment. At most, we may calculate the expectation values etc. The expectation values of projection operators such as \(\hat P_S\) are interpreted as probabilities (or, in the continuous case such as ours, probability densities).
Finally, let’s compute the interference pattern.
The probability density as a function of the \(x\)-position \(S\) is
\[ \begin{aligned} \langle\hat P_S\rangle &= \int dQ \exp(-2\pi i QS+2\pi^2 i\hbar Q^2 t/m) \\ &\quad \times \langle \exp(2\pi i Q\hat x_0) \exp(2\pi i Q \hat p t / m) \rangle \end{aligned} \]
We need to compute an integral. The integrand involves an expectation value of some function of the initial state operators in the initial (or final, they’re the same) state vector. In classical physics, the «state» would be fully described by the values of things like \(x,p\), and it would be «trivial» to take functions of them. Here, the functions of the noncommuting observables are «harder» and we need the expectation value of such function(al)s in the initial state.
Note that in the second line
\[ \langle \exp(2\pi i Q\hat x_0) \exp(2\pi i Q \hat p t / m) \rangle \]
the second exponential is a special case of the displacement operator \(\exp(i\hat p\cdot\Delta x / \hbar)\), so it shifts the wave function by
\[ \Delta x=\frac{2\pi Q t\hbar}{m} = \frac{2\pi Q\hbar L}{p_y} \]
After the initial wave function is shifted, the phases of its two \(\sqrt{\delta}\) pieces are changed by the first exponential in our expectation values, and then the inner product of this «shifted and rephased» wave function with the original wave function is computed. Calculations of such expectation values are always the same – in the Schrödinger, Heisenberg, or Dirac picture. The Heisenberg picture we study here differs by the absence of any «evolved» wave function in the calculation.
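The action of the displacement operator is easy to check numerically. The sketch below is an illustration under stated assumptions: a Gaussian test function on a periodic grid, the convention \(\hat p=-i\hbar\,\partial/\partial x\), and hypothetical helper names `displace` and `gaussian`. In this convention \(\exp(i\hat p\,\Delta x/\hbar)\) maps \(\psi(x)\) to \(\psi(x+\Delta x)\); the opposite sign convention would shift the other way.

```python
import numpy as np

def displace(psi, dx, shift):
    """Apply exp(i p shift / hbar) to samples psi on a periodic grid.

    In Fourier space p = hbar * k, so the operator multiplies each
    Fourier mode by exp(i k shift); hbar drops out.
    """
    k = 2 * np.pi * np.fft.fftfreq(psi.size, d=dx)   # angular wavenumbers
    return np.fft.ifft(np.exp(1j * k * shift) * np.fft.fft(psi))

def gaussian(x, center=0.0, width=1.0):
    """Hypothetical test function standing in for one piece of the state."""
    return np.exp(-(x - center) ** 2 / (2 * width ** 2))

# Periodic grid, wide enough that the Gaussian is negligible at the edges.
x = np.linspace(-20.0, 20.0, 1024, endpoint=False)
dx = x[1] - x[0]
shift = 3.3                        # need not be a multiple of the grid step

shifted = displace(gaussian(x), dx, shift)
expected = gaussian(x + shift)     # psi(x + shift): the peak moves to x = -shift
max_error = np.abs(shifted - expected).max()

if __name__ == "__main__":
    print(f"max deviation from psi(x + shift): {max_error:.2e}")
```

Working in Fourier space keeps the shift exact up to rounding for a smooth, well-localized function, which is why the displacement does not have to be a multiple of the grid spacing.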
Fine. You may see that the inner product is only nonzero if \(\Delta x = \pm R\) or \(\Delta x=0\). This is equivalent to \(Q=\pm Rm/2\pi t\hbar = \pm Rp_y/2\pi L\hbar\) or \(Q=0\). In those cases, the inner product gives us a nonzero contribution due to a delta-function «click». It means that our integral over \(Q\) degenerates into the contributions from these three values of \(Q\). Yes, of course: the fact that there are several terms is how we get the interference. The \(Q=0\) contribution to \(\langle P_S\rangle\) simplifies to \(\langle 1 \rangle=1\) because all the arguments in the exponentials are zero.
Well, around \(Q=0\), we don’t actually have \(\delta(Q)\) in the integral. We literally have just one (not infinity) at \(Q=0\). This is a symptom of our normalization problem. If the initial state were a sum of two delta-functions, and not the square roots, this problem would go away and the integral over \(Q\) would «cancel» against the delta-function. This is a more sensible way to deal with the normalization issue – the other way is to choose a finite-width slit. I chose an initial state normalized to one because it’s probably easier for beginners to deal with and the overall normalization doesn’t affect the interference pattern that will be obtained, anyway. At positive \(t\), such a square-root-of-delta-function state would evolve into an «infinitesimal constant» we got here times a wave function with «infinitely many maxima and minima», as we will see below, so the norm of such a later state is still \(0\times \infty =1\), if you wish.
What about the \(Q=+Rm/2\pi t\hbar\) term? The exponential of \(\hat p\) etc. manages to move the peak at \(x=-R/2\) to one at \(x=+R/2\). But the \(\hat x_0\)-based exponential adds different phases to these two peaks. It’s the relative phase that matters and it gives us \(\exp(2\pi i QR)\). There is a factor of \(1/2=(1/\sqrt 2)(1/\sqrt 2)\), too. Unlike the previous paragraph, this factor of \(1/2\) is not doubled.
In this relative phase, \(\exp(2\pi i QR)\), the exponent proportional to \(QR\) is much smaller than the exponent \(QS\) from the first part of the first line, and we will neglect it. We neglect the second term on the first line for the same reason. You’re invited to write down the exact results; precision may be needed to recover the normalization of the wave function (a finite number of interference peaks).
This value of \(Q\) mainly contributes \(\exp(-2\pi i QS)\) from the first factor on the first line.
Similarly, for \(Q=-Rm/2\pi t\hbar\), the relative phases get reversed and we get the multiplicative factor \(\exp(-2\pi i QR)/2\approx 1/2\). However, there’s still the important phase factor of \(\exp(-2\pi i QS)\) from the first line which has the opposite value of \(Q\) now. In combination with the previous term, we obtain a cosine because \(\exp(iV)/2+\exp(-iV)/2=\cos V\). Note that in the Heisenberg picture, we directly get the real probabilities or expectation values – no complex amplitudes. That’s also why we had three and not two values of \(Q\) that contributed. The argument of the cosine is
\[ V = 2\pi QS = \frac{RS m}{t\hbar} = \frac{RSp_y}{L\hbar} \]
You see that the \(Q=0\) contribution \(1\) is helpful because in combination with the cosine, it’s exactly needed to keep the sum non-negative. Because the probability density is
\[ 1+\cos(RS p_y / L\hbar)=1+\cos(2\pi RS/L\lambda) \]
where I introduced the de Broglie wavelength \(\lambda\), you may see that the separation between the minima (or maxima) is
\[ \Delta S= \frac{L\lambda}{R}, \]
the right result for any double-slit interference experiment that was already known to Thomas Young.
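To get a feeling for the numbers, here is a small sketch that evaluates the fringe spacing. The electron speed, slit separation, and plate distance are hypothetical values chosen only for illustration, and the function names are not from the text.

```python
import math

HBAR = 1.054571817e-34            # reduced Planck constant, J s
ELECTRON_MASS = 9.1093837015e-31  # kg

def de_broglie_wavelength(p):
    """lambda = 2 pi hbar / p."""
    return 2 * math.pi * HBAR / p

def fringe_spacing(L, R, p_y):
    """Separation of neighboring maxima, Delta S = L lambda / R."""
    return L * de_broglie_wavelength(p_y) / R

def relative_density(S, L, R, p_y):
    """Unnormalized pattern 1 + cos(R S p_y / (L hbar))."""
    return 1 + math.cos(R * S * p_y / (L * HBAR))

# Hypothetical setup: non-relativistic electrons at 1e6 m/s,
# slits 1 micrometer apart, plate 1 meter away.
p_y = ELECTRON_MASS * 1.0e6
L, R = 1.0, 1.0e-6
delta_S = fringe_spacing(L, R, p_y)

if __name__ == "__main__":
    print(f"de Broglie wavelength: {de_broglie_wavelength(p_y):.3e} m")
    print(f"fringe spacing:        {delta_S:.3e} m")
    print(f"density at Delta S:    {relative_density(delta_S, L, R, p_y):.6f}")
    print(f"density at Delta S/2:  {relative_density(delta_S / 2, L, R, p_y):.6f}")
```

The pattern takes its maximal value at integer multiples of \(\Delta S\) and vanishes halfway between them, as the formula \(1+\cos(2\pi RS/L\lambda)\) requires.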
You are invited to fix all the neglected phases and normalization factors. First, I ask you to consider the non-normalizable initial state which has the delta-functions, and not their square roots. All the calculations will actually be identical to those above. But you must realize that the later state will have infinitely many minima and maxima, so its norm will be infinite (just like the norm of the delta-function-based initial state).
You may also try to calculate the result for slits whose width is positive but you get much messier functions. If you do so, only a «couple» of interference minima and maxima near the center of the pattern will survive. You might also consider the problem in which \(\hat p_y\) isn’t quite conserved and/or decoupled from the \(x\)-component, but as far as I know, you won’t learn too much except for messier mathematical functions.
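As a starting point for that exercise, here is a minimal numerical sketch of the Heisenberg-picture recipe for slits of finite width. It is an illustration under explicit assumptions, not the calculation from the text: each slit is modeled by a Gaussian of width `sigma`, units are chosen so that \(\hbar=m=1\), the variable \(k=2\pi Q\) replaces \(Q\), and all parameter values and function names are hypothetical. The state is never evolved; the code only evaluates \(\langle \exp(ik\hat x_0)\exp(ikt\hat p)\rangle\) in the initial state, multiplies it by the BCH phase, and Fourier-transforms in \(k\).

```python
import numpy as np

def initial_state(x, R, sigma):
    """Unnormalized real amplitude: two Gaussian slits at x = -R/2 and +R/2."""
    return (np.exp(-(x - R / 2) ** 2 / (4 * sigma ** 2))
            + np.exp(-(x + R / 2) ** 2 / (4 * sigma ** 2)))

def heisenberg_pattern(S, R=10.0, sigma=0.2, t=10.0):
    """Probability density of x(t) = x0 + p t on the grid S (hbar = m = 1)."""
    # Grid for the initial-state integral; psi is negligible outside it.
    x = np.arange(-R / 2 - 10 * sigma, R / 2 + 10 * sigma, sigma / 20)
    dx = x[1] - x[0]
    norm = np.sum(initial_state(x, R, sigma) ** 2) * dx

    # The expectation value vanishes unless the shift k*t is near 0 or +-R.
    k_max = (R + 12 * sigma) / t
    k = np.linspace(-k_max, k_max, 621)
    dk = k[1] - k[0]

    # exp(i k t p) shifts psi(x) to psi(x + k t); exp(i k x0) adds a phase.
    psi = initial_state(x[None, :], R, sigma)
    psi_shifted = initial_state(x[None, :] + k[:, None] * t, R, sigma)
    overlap = np.sum(psi * np.exp(1j * k[:, None] * x[None, :]) * psi_shifted,
                     axis=1) * dx / norm
    # BCH phase exp(-[A, B] / 2) with A = i k x0, B = i k t p.
    chi = np.exp(1j * k ** 2 * t / 2) * overlap

    # rho(S) = (1 / 2 pi) * integral dk exp(-i k S) <exp(i k x(t))>
    rho = (np.exp(-1j * S[:, None] * k[None, :]) @ chi) * dk / (2 * np.pi)
    return rho.real

S = np.linspace(-150.0, 150.0, 3001)
rho = heisenberg_pattern(S)

# Positions of the maxima near the center of the pattern.
is_peak = (rho[1:-1] > rho[:-2]) & (rho[1:-1] > rho[2:])
peaks = S[1:-1][is_peak]
central = peaks[np.abs(peaks) < 20]

if __name__ == "__main__":
    print("integral of rho:", rho.sum() * (S[1] - S[0]))
    print("mean fringe spacing:", np.diff(central).mean())
    print("2 pi t / R:", 2 * np.pi * 10.0 / 10.0)
```

With these values the spacing of the central fringes should come out close to \(2\pi\hbar t/mR\), which is \(L\lambda/R\) rewritten using \(t=Lm/p_y\), while the Gaussian envelope limits how many fringes are visible.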
Collapse in the Heisenberg picture
So is there a collapse? When one observes the particle to land at \(x=S\), one simply learns that the projection operators \(\int dS' \hat P_{S'}\) over intervals including \(S\) are equal to one. This newer measurement always «overwrites» the previous measurements and it must be used as the higher-priority basis for predictions of (even more) future observations.
At any rate, there is no time-dependent wave function in this picture so it’s nonsensical to talk about the «collapse» of such a wave function, too. The user of quantum mechanics is simply overwriting his knowledge about the observables – and/or their expectation values.
Heisenberg picture as «local hidden variables» producing violations of Bell’s inequality
You may be surprised that it works so well. The observables at the final moment may be written as function(al)s of the observables at the initial moment. And the expectation values of the relevant final moment observables produce the predictions for measurable expectation values (or for probabilities, if we compute expectation values of projection operators). In some sense, the Heisenberg-picture calculation was more rudimentary than the Schrödinger-picture computation that requires you to solve «complicated» partial differential equations. We only solved the equation \(\dot x=p/m\), found that \(x=pt/m+x_0\), and the rest was a straightforward integration! Feynman’s path integral approach is even more straightforward – all amplitudes are given by explicit (path) integrals.
Every observable may be represented by a «matrix» so you might have the following idea: Write down a Bell’s theorem experiment in terms of the Heisenberg-picture field operators. They are assigned to points in the spacetime, they evolve and interact locally, but you may always express them relative to a basis whose first (or zeroth: I mean the initial) basis vector is the initial state.
In this setup, you may interpret the matrix elements \(P_{11}(t)\) of any projection operator as «classical observables» that directly predict the probabilities associated with the projection operator \(\hat P\), right? So isn’t it a violation of Bell’s theorem?
Well, it’s not. Even though the Heisenberg-picture field operators are «local» in the physical sense – they know everything about the measurements you may do around the given spacetime point – they are not local in the sense of Bell’s theorem because the relevant matrices are expressed relative to the Hilbert space that «knows» about the possible states of the whole physical system (the whole Universe, if you wish).
The clever framework of quantum mechanics guarantees that all the predicted probabilities will always be unaffected by any changes made at spacelike-separated points. If you admit – and quantum mechanics urges you to admit – that the predicted probabilities of any outcome contain «everything that has any physical sense», you may derive that the whole theory, in our case, a quantum field theory, is a perfectly local physical theory.
On the other hand, it is not local by construction – I mean by Bell’s realist construction, which assumes that all the probabilities come from the averaging of a classical deterministic system. Nevertheless, the Heisenberg picture shows you Nature’s cleverness: she can guarantee the perfect locality of all physically meaningful predictions even though the local operators are expressed relative to states that «know about the whole Universe simultaneously», which looks non-local but, within the quantum framework, is not.
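A small numerical sketch may make the last two statements concrete. It is an illustration, not part of the argument above, and it assumes the standard textbook setup: two spin-1/2 particles in the singlet state, measured along directions in the \(x\)-\(z\) plane, with the usual CHSH combination of correlators. The function names are hypothetical. The marginal probabilities on one side do not depend on the setting chosen on the other side, yet the correlations exceed the bound of 2 obeyed by local realist models.

```python
import numpy as np

SIGMA_X = np.array([[0, 1], [1, 0]], dtype=complex)
SIGMA_Z = np.array([[1, 0], [0, -1]], dtype=complex)
IDENTITY = np.eye(2, dtype=complex)

# Singlet state (|01> - |10>) / sqrt(2) in the basis |00>, |01>, |10>, |11>.
SINGLET = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

def spin(theta):
    """Spin observable along a direction at angle theta in the x-z plane."""
    return np.cos(theta) * SIGMA_Z + np.sin(theta) * SIGMA_X

def expectation(op):
    """Expectation value of a two-particle operator in the singlet state."""
    return np.vdot(SINGLET, op @ SINGLET).real

def correlator(a, b):
    """<spin(a) on particle 1 times spin(b) on particle 2>."""
    return expectation(np.kron(spin(a), spin(b)))

def alice_marginal(a, b):
    """P(Alice finds +1), obtained by summing the joint probabilities
    over both of Bob's outcomes when Bob measures along b."""
    p_alice = (IDENTITY + spin(a)) / 2
    total = 0.0
    for bob_sign in (+1, -1):
        p_bob = (IDENTITY + bob_sign * spin(b)) / 2
        total += expectation(np.kron(p_alice, p_bob))
    return total

def chsh(a, a2, b, b2):
    """CHSH combination of four correlators."""
    return (correlator(a, b) - correlator(a, b2)
            + correlator(a2, b) + correlator(a2, b2))

if __name__ == "__main__":
    print("Alice's marginal, Bob at 0.3:", alice_marginal(0.0, 0.3))
    print("Alice's marginal, Bob at 1.9:", alice_marginal(0.0, 1.9))
    print("CHSH value:", chsh(0.0, np.pi / 2, np.pi / 4, 3 * np.pi / 4))
```

For these angles the magnitude of the CHSH value is \(2\sqrt 2\), while Alice’s marginal stays at \(1/2\) for any choice of Bob’s angle.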