Lack of surprises in discrete mathematics?
Luboš Motl, April 22, 2015
Surprises exist, are bound to become important, and therefore cannot be neglected
On his blog, in a post titled “Is There Something Mysterious About Math?”, Scott Aaronson of MIT announced a longer essay on the “Ideas.Aeon.CO” server:
Q: Is there something mysterious about mathematics?
A: Why isn’t it more mysterious?
That text is sort of provocative.
First, Aaronson points out that it isn’t sufficiently fulfilling to fight a straw man because he is only filled with air. However, your humble correspondent provides him with a more attractive straw man – one filled with flesh and blood. As you know, many people just love to constantly fight flesh-and-blood straw men like myself – because such straw men are actually right, and many people simply love to be wrong and to be proud of it.
The first general insight I am credited with is the following principle:
Discrete math is just a disorganized mess of random statements.
It was taken somewhere from my remarks about \(P\neq NP\) or \(P=NP\) and compactified. Note that I have repeatedly made the statement that “discrete math” doesn’t admit any partial evidence. If you have found neither a proof nor a disproof of a statement, you should remain about 50-50 open-minded about the validity of the claim.
One must be careful about what I mean by “discrete math”, however. Roughly speaking, it must be a part of mathematics with no links to the continuous structures that are important in physics; and we must discuss questions such as \(P=NP\) which are unique and can’t be viewed as the 106th variation of problems that have been solved.
Scott Aaronson, an electrical engineer, starts with the question whether mathematics is mysterious. Well, there are lots of unsolved problems. They are usually propositions that may be right or wrong, and we don’t have a rigorous proof one way or the other. However, in most cases, we actually do have some opinion about “what the likely answer is” – it’s the answer that avoids shocking surprises – and we just don’t possess a rigorous proof that the shocking surprise is indeed non-existent.
This is a fair summary I agree with – Aaronson’s overall picture isn’t stupid, at least not for a plumber at MIT.
So Fermat’s Last Theorem should have been believed to be “probably right” – because it had been proven for many exponents and in similar situations, and it would have been surprising if there were suddenly a (necessarily very large) exponent that provided us with a counterexample. Later, the theorem was proven. The equal percentage of the digits \(0,1,\dots,9\) in the decimal expansion of \(\pi\) hasn’t been rigorously proven, but it has held for all the digits computed so far, it seems to be a matter of common sense because \(\pi\) behaves like a “random” sequence of digits, and an unequal representation of the digits would look like a giant conspiracy.
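If you want to see this piece of experimental mathematics yourself, here is a minimal sketch in Python (using the mpmath library; the choice of 10,000 digits is mine):

```python
# Tally the first 10,000 decimal digits of pi; the conjectured
# equidistribution predicts roughly 1,000 occurrences of each digit.
from collections import Counter
from mpmath import mp

mp.dps = 10020                                    # a few extra digits as a margin
digits = mp.nstr(mp.pi, 10010).replace(".", "")[:10000]

for digit, count in sorted(Counter(digits).items()):
    print(digit, count)                           # each count lands near 1,000
```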
Similarly, there exist partial results supporting the Riemann Hypothesis. I am confident that it’s true although, like others, I haven’t found a rigorous proof yet. Too many roots seem to lie on the critical line. Roots away from the line would mean some non-randomness or clumping of primes, and it doesn’t seem to be there. If it were there, it would probably have manifested itself a long time ago, but it didn’t.
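Numerically, the pattern is easy to inspect; mpmath can locate the nontrivial zeros one by one (a minimal sketch – the routine searches along the critical line, where all zeros computed to date have been found):

```python
# Print the first few nontrivial zeros of the Riemann zeta function;
# every one computed so far has real part exactly 1/2.
from mpmath import mp, zetazero

mp.dps = 25
for n in range(1, 6):
    print(zetazero(n))   # 0.5 + 14.1347...i, 0.5 + 21.0220...i, ...
```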
However, there are mathematical propositions where one should be about 50-50 open-minded. Before the first sensible insights were made about the monstrous moonshine, the appearance of the number \(196,883+1\) at two very different corners of mathematics could have been a coincidence, or it could have had a reason. It is such a big number that the “coincidence” would require a random event whose probability is something like \(N/196,883\) where \(N\) is the number of “similarly fundamental” places in mathematics that produce an integer constant.
Of course, we’ve known for decades that it is no coincidence. The number appears at two very different places (an expansion of the elementary \(j\)-function of a complex variable, and a representation of the monster group) and a string-theoretical CFT explains why it is so. This insight also shapes our expectations about similar “moonshines”. Experts have generally believed that a similar explanation exists for similar coincidences, and some of the new ones have been found, too.
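For the record, the two corners are these: the \(j\)-function has the Fourier expansion

\[ j(\tau) = \frac{1}{q} + 744 + 196884\,q + 21493760\,q^2 + \dots, \qquad q=e^{2\pi i\tau}, \]

while the dimensions of the smallest irreducible representations of the monster group are \(1\), \(196883\), \(21296876\), and so on, and the expansion coefficients are simple sums of them: \(196884 = 196883+1\) and \(21493760 = 21296876 + 196883 + 1\).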
But I think it’s fair to say that \(P=NP\) is an example of a mathematical proposition that “doesn’t have any true precedent”, which means that it is a unique problem “in its own class” and there is no “partial evidence”. We are effectively asking about the existence of one algorithm that has a certain speed and that solves what is effectively one particular problem (because proofs that this problem is complexity-wise equivalent to many others are known), let’s say the travelling salesman problem. Such an algorithm may exist (in which case it may be found in 2350, earlier, later, or never) and it doesn’t need to exist. So just like Walter Wagner said about the statement “the LHC will destroy the Earth” :-), the probability has to be 50-50.
Just to be sure that the smiley doesn’t get lost, I think that Walter Wagner revealed his idiocy because there exists overwhelming evidence – evidence at many different levels – that the LHC can’t destroy the Earth. But there is no analogous evidence relevant for the question whether the algorithm above exists.
Try to compare the existence of a fast algorithm for the travelling salesman problem with Fermat’s Last Theorem, for example. A good analogy could be that this hypothetical “fast algorithm” would be similar to a “freak exponent” for which there exists a counterexample to Fermat’s Last Theorem. But with this analogy in mind, the situations are not analogous at all. We have a proof of Fermat’s Last Theorem and, even before we had it, we could eliminate all the possible candidates for the “freak exponents” up to some very large number.
But we just don’t have the analogous evidence in the \(P=NP\) case. Any future candidate algorithm that may solve the travelling salesman problem is a pretty special, clever structure. We know some algorithms that solve this problem. All of them are “slow”, in agreement with \(P\neq NP\). But have we looked at sufficiently many candidate algorithms to eliminate the possibility that there exists a more clever, parametrically faster algorithm? I don’t think so. Because all of the algorithms we know are basically the same – they are all based on brute force – maybe we should say that we have only tested one algorithm for the travelling salesman problem, and this candidate was “slow”. It’s like having verified one exponent for which Fermat’s Last Theorem holds. It’s not a significant amount of evidence (especially because the exponents \(1,2\) allow infinitely many “counterexamples”, so you could even argue by heuristic “induction” that counterexamples should exist for all exponents \(3,4,5,\dots\) just like they exist for the exponents \(1,2\); exactly the opposite holds). Before Fermat started to intensely look for solutions of his equations with exponents greater than two, he should have believed that the probability of his “theorem’s” statement was comparable to 50%, and that’s exactly the situation in which \(P=NP\) finds itself today.
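To make the “brute force” tangible: a minimal sketch in Python (the four-city distance matrix is a made-up example of mine) that tries all \((n-1)!\) tours, which is why its running time grows faster than any polynomial in the number of cities \(n\):

```python
# Brute-force travelling salesman: try every ordering of the cities.
from itertools import permutations

def tsp_brute_force(dist):
    """dist[i][j] is the distance between cities i and j; return the length
    and the ordering of the shortest round trip that starts at city 0."""
    n = len(dist)
    best_len, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):        # (n-1)! candidate tours
        tour = (0,) + perm
        length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour

# Four cities at the corners of a unit square (1.414 ~ the diagonal).
dist = [[0, 1, 1.414, 1],
        [1, 0, 1, 1.414],
        [1.414, 1, 0, 1],
        [1, 1.414, 1, 0]]
print(tsp_brute_force(dist))   # (4.0, (0, 1, 2, 3)) - walk around the perimeter
```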
OK, I have discussed this problem many times. Many wise experts in Aaronson’s field agree with me; many other, almost equally wise ones prefer their \(P\neq NP\) groupthink. Instead, I want to focus on a new twist in Aaronson’s essay – which may be summarized e.g. by his last sentence:
But I feel confident in saying that, yes, there is something mysterious about math, and the main thing that’s mysterious is why there isn’t even more mystery than there is.
I think that I agree with this conclusion – but only if the “measure” (which decides whether the amount of mystery is high or not) is chosen in a certain way that isn’t terribly useful for mathematics research. And I also think that the very general philosophy that Aaronson had to adopt before he came to the conclusion in the quote confirms my principle about the desirable open-mindedness concerning problems in “discrete math”. Why?
Take Aaronson’s comments about the Riemann Hypothesis. He says that the alignment of the nontrivial zeroes on the \(i(s-1/2)\in\mathbb{R}\) axis is a “mysterious pattern” but it’s needed for an even more “mysterious pattern” – non-randomness and hierarchical clumping in the distribution of primes – to be absent.
Well, I think that the Riemann Hypothesis is probably true which also means that the \(i(s-1/2)\in\mathbb{R}\) pattern cannot be mysterious to me as long as I am rational. And if the Riemann Hypothesis is true, there is nothing mysterious about it. I may easily rationalize it. We can prove that all the nontrivial zeroes of the zeta-function are symmetrically distributed with respect to the critical axis. It means that if there were zeroes that violate the Riemann Hypothesis, they would have to come in pairs.
The imaginary part of \(s\) at a zero of the zeta-function is a “random real number”. But if there were zeroes away from the critical axis, there would be at least one pair of zeroes whose imaginary parts are equal (and whose real parts are mirror images of each other – their distances from the critical axis coincide). This equality (or these equalities) would be “non-mysterious” from the functional viewpoint: we may prove the symmetry formula for the zeta-function. But in the perspective involving number theory (the distribution of primes), the equality of the imaginary parts of two zeroes would look like an equality between two random real numbers, which seems infinitely unlikely.
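Concretely, the symmetry formula is the functional equation, combined with \(\zeta(\bar s)=\overline{\zeta(s)}\):

\[ \zeta(s) = 2^s \pi^{s-1} \sin\left(\frac{\pi s}{2}\right) \Gamma(1-s)\, \zeta(1-s). \]

A hypothetical zero at \(\rho=1/2+\delta+it\) with \(\delta\neq 0\) would therefore be accompanied by zeros at \(1-\rho\), \(\bar\rho\), and \(1-\bar\rho = 1/2-\delta+it\) – and the last one shares the imaginary part \(t\) with \(\rho\).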
For those reasons, it seems comprehensible to me that all the nontrivial roots may prefer to stay on the critical axis.
But I didn’t really want to end up with this conclusion. My point is that if you don’t think rationally or creatively enough, the alignment of the zeroes on the critical axis may look like a miraculous pattern to you. The expectation that such an alignment along the critical axis shouldn’t exist may be based on “vague analogies” with other problems that you know.
As our understanding of mathematical structures deepens, we learn that some problems that were thought to be analogous differ in important ways, and may lead to opposite answers; while other problems that were thought to be unrelated are actually deeply interconnected, if not equivalent. We are learning. But unless we have a rigorous proof, it is always possible that there are “important exceptions” in the classes of structures and problems that we consider “analogous”.
For certain reasons, I think that these problems are analogous to the problem of ethnic minorities. Let me explain why. Take some territory, e.g. the Czechoslovakia of the 1930s, to avoid “currently hot” disputes of the present. Most people in Czechoslovakia preferred to speak Czech or Slovak. But you shouldn’t overgeneralize: there was a rather well-defined, not too fractal territory, the Sudetenland, where most people preferred to speak German.
Your idea that “everyone speaks Czech” would be too inaccurate and, if you were learning about the country, you would have learned about the German-speaking minority. But if you studied it in even more detail, you would have figured out that “the Sudetenland speaks German” is inaccurate as well, because there are still Czechs there – who are a minority within the population of the Sudetenland itself.
And so on.
My point is that in mathematics, there are subclasses of problems and structures that are “exceptions”, and within those smaller classes, you find “exceptions to the exceptions”, and this chain may continue – in the most general case, it probably continues indefinitely. When you land at a not too well-defined but “not quite randomly chosen” point of the Sudetenland, your chances of first meeting a Czech speaker are about the same as the chances for a German speaker. You may run into a guy who is an “exception” in Czechoslovakia, or an “exception in the Sudetenland”, i.e. an “exception to an exception in Czechoslovakia”, and so on.
If you were given a truly “random” person in the Sudetenland of the 1930s, according to the uniform egalitarian distribution, it would make sense to say that “German” is more likely than “Czech”. But there was no random generator with a uniform distribution. There could have been a good reason why you landed there. You may have been ordered by your Czech allies to deal with the Henlein Nazis, for example. In a similar way, \(P=NP\) was in no way chosen as a “truly random proposition” in a large class most of whose members are known to be false propositions. \(P=NP\) is a special problem, one without any true precedents. To clump it with other problems is likely to be misleading.
I could speak for hours or days about the “exceptions” and “surprises” that people had to recognize in mathematics when they were learning things at an ever deeper level.
A simple example. Take \(\cos x\). It is a complicated, transcendental function of a sort. And indeed, \(\cos x\) is transcendental for every rational \(x=p/q\) with \(x\neq 0\) – a consequence of the Lindemann–Weierstrass theorem. If the function can’t be calculated in some simple way, using fractions (or square roots etc.), it probably means that the probability is “infinitesimal” that \(\cos x\) will be a simple number.
But you could expect the same for \(\cos(\pi p/q)\), too. Only when \(x\) is a multiple of \(\pi/2\) is the cosine equal to \(0\) or \(\pm 1\). But then you learn that \(\cos(\pi/4)=1/\sqrt{2}\). You view it as a single exception, but then you learn about \(\cos(\pi/3)=1/2\) and, in fact, infinitely many values of \(x\) within one period that are rational multiples of \(\pi\) may be expressed by similar simple square-root-based formulae.
These are simple things with a geometric interpretation and everyone who understands high school mathematics will find them understandable. Even though \(\cos(\pi p/q)\) may look like random numbers that have no reason to be of the form \(p_2/q_2+\sqrt{p_3/q_3}\), infinitely many such values of the cosine exist. In this sense, the values of \(\cos x\) for “simple” values of \(x\) aren’t quite “random” numbers.
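A computer algebra system knows these closed forms; here is a small sketch with Python’s sympy (the sample denominators are my choice):

```python
# sympy evaluates cos(pi/q) in closed form for many rational arguments,
# showing that these values are algebraic rather than "random" reals.
from sympy import cos, pi, Rational

for q in (3, 4, 5, 6, 12):
    print(q, cos(pi * Rational(1, q)))
# -> 1/2, sqrt(2)/2, 1/4 + sqrt(5)/4, sqrt(3)/2, sqrt(2)/4 + sqrt(6)/4
```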
But there exist many much more abstract surprises, or examples in which experimental mathematics fails. Take
\[ v=\sin (\pi\cdot \exp (\pi\cdot \sqrt{163})) \] from the blog post Is Nature natural? (and Some numerology). Now we have a sine of a multiple of \(\pi\) but the coefficient isn’t rational. Instead, it’s the exponential of some crazy \(\pi\) multiple of a square root – that of \(163\), to make things worse.
It looks like some absolutely random gibberish. The exponential – the coefficient in the sine – may be proven not to be rational (it is even transcendental, by the Gelfond–Schneider theorem). So you expect the number \(v\) to be of order one. Your calculator says
\[ v\approx 0.00000\,00000\,02356 \] which is so close to zero that you will be tempted to believe that it is a numerical error and the expression really is zero. It is not, however. The probability that a sine – a “random” number between \(-1\) and \(+1\) – is so close to zero is comparable to the value \(v\) itself, about one part in a trillion. But while the expression for \(v\) was complicated, you won’t quite find a “trillion” of similarly simple expressions. So it’s surprising that we could get so close to zero.
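You can check that this is no rounding artifact by redoing the computation at high precision, e.g. with Python’s mpmath:

```python
# Evaluate v = sin(pi * exp(pi*sqrt(163))) with 50 significant digits.
from mpmath import mp, pi, exp, sqrt, sin

mp.dps = 50
x = exp(pi * sqrt(163))   # 262537412640768743.99999999999925..., nearly an integer
print(sin(pi * x))        # about -2.356e-12: the magnitude quoted above, nonzero
```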
The \(j\)-function – the same function that enters the \(196,883\)-based moonshine – may be used to explain why the number \(v\) is so small, i.e. why the coefficient given by the exponential is “almost” an integer.
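In a nutshell – this is a standard fact about complex multiplication which I quote without proof – the \(j\)-function takes the exact integer value \(j\left(\frac{1+i\sqrt{163}}{2}\right)=-640320^3\), while the corresponding nome \(q=e^{2\pi i\tau}=-e^{-\pi\sqrt{163}}\) is tiny. Plugging into the expansion quoted earlier,

\[ -640320^3 = \frac 1q + 744 + 196884\,q + \dots = -e^{\pi\sqrt{163}} + 744 - 196884\,e^{-\pi\sqrt{163}} + \dots \]

so \(e^{\pi\sqrt{163}} = 640320^3 + 744 - 196884\,e^{-\pi\sqrt{163}} + \dots\) misses the integer \(640320^3+744\) only by a correction of order \(10^{-12}\).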
Once you learn about these cool patterns in the \(j\)-function, you realize (or at least you should realize) that \(196,883\) isn’t quite a random number between \(100,000\) and \(1,000,000\); that’s a lesson from the monstrous moonshine. This number is much more tricky and is capable of deceiving you. Similarly, \(\sqrt{163}\) isn’t a random square root, either.
There is also a heuristic explanation of why \(\exp(\pi\sqrt{163})\) is nearly an integer. We know another number of this sort,
\[ \exp(\pi\sqrt{-1})=-1 \] which is an exact integer. If you replace \(163\) by \(-1\), the square root is \(i\), the imaginary unit, and the exponential of \(\pi i\) gives you minus one, according to the formula that Feynman once called the prettiest identity in mathematics. With the number \(163\) replacing \(-1\), it no longer works “exactly”, but you get “close” to an integer.
Much of mathematics – and its progress – is all about similar “exceptional structures and patterns” that were not previously known. Because there’s something special about them, it makes sense for mathematicians to spend much more time studying them. After some time, these exceptional structures won’t look so exceptional within the mathematical literature, after all. They may even conquer a majority of the literature.
The “equality” of all the numbers etc. is broken because “something special” holds for special numbers or special functions or special mathematical structures of many kinds. And it is this “something special” that makes these “exceptions” particularly attractive for mathematics.
We may look at \(P=NP\) from the viewpoint of the following analogy. Imagine that “the travelling salesman problem” is compared to the number \(163\) because it is, for example, the \(163\)rd most important problem in computer science. That sounds about right. The question is whether the fastest algorithm for this problem is polynomial. This question may be analogous to the question whether the time (expressed in billions of years)
\[ v= \sin[\pi\exp(\pi\sqrt{163})] \] is shorter than a human life. Well, it’s in billions of years and this is a random \(O(1)\) multiple, so it can’t be shorter, can it? However, it is shorter: \(2.4\times 10^{-12}\) billion years is about \(2.4\times 10^{-3}\) years, i.e. roughly 20 hours. With my units, it would only take hours to solve the travelling salesman problem because the problem isn’t too random – much like the number \(163\), it may be described in a rather concise way – and lots of “special extra relationships and approximations” exist for small numbers including those similar to \(163\).
There may very well exist a polynomial algorithm for the travelling salesman problem, and to be “nearly certain” that there isn’t one is a similar kind of unsubstantiated faith as the faith that numbers such as \(\sin[\pi\exp(\pi\sqrt{163})]\) can never be insanely small. Well, they can be, but it may often take some degree of depth, progress, and modern expertise to uncover all the patterns. When we get smarter, we may very well find \(P=NP\) as natural as \(v\ll 1\).
Most of the numbers and mathematical structures that mathematicians write about in their papers are “special” – they can be described by finite, and usually short enough, sequences of words or symbols. And their being “special” means that they can be subject to many special relationships, patterns, and laws. It is simply stupid (and a sign of a lazy Luddite) to assume otherwise – that no similar relationships or exceptional numbers or exceptional mathematical structures or fast algorithms or surprises in general will be found in the future.
The whole history of mathematics is the history of our increasingly refined knowledge of numbers and mathematical structures – and every refinement is a “surprise” of a sort!
I wrote that people who believe that there are almost no surprises in mathematics are lazy – and Luddites who oppose progress – because they don’t want to find these surprises (even though they sometimes get huge salaries pretty much for looking for interesting things). But I also think that people like Aaronson can’t really be familiar with sufficiently deep mathematics and its history, because otherwise they would know that surprises occur all the time and that the history of mathematics is pretty much composed of them – a surprise is usually a more important development than a non-surprise!
Partial evidence that such surprising structures don’t exist is only conceivable once we have solved many similar problems that really seem to be in the same “class” and that seem to be equally “random” members of the class. This is, pretty much by definition, never the case in the “discrete math” quasi-defined at the beginning.