I'd like to start our forum events by reminding us all of OpenAI's mission.


OpenAI's mission is to ensure that artificial general intelligence (AGI), by which we mean highly autonomous systems that outperform humans at most economically valuable work, benefits all of humanity.


Tonight, we will be exploring topics related to the future of math and AI. Our guest of honor this evening is Terry Tao, a professor of mathematics at UCLA. His areas of research include harmonic analysis, PDE, combinatorics, and number theory. He has received a number of awards, including the Fields Medal in 2006. Since 2021, Tao has also served on the President's Council of Advisors on Science and Technology.


Ilya Sutskever is co-founder and chief scientist of OpenAI. He leads research at OpenAI and is one of the architects behind the GPT models. Prior to OpenAI, Ilya was co-inventor of AlexNet and of Sequence-to-Sequence Learning. He earned his PhD in computer science from the University of Toronto.


Daniel Selsam is a researcher at OpenAI. He completed his PhD in computer science at Stanford University in 2019 and has worked on many paradigms of AI reasoning, including interactive theorem proving, probabilistic programming, neural SAT solving, formal mathematics, synthetic data generation, and now large language models.


Jakub Pachocki is the director of research at OpenAI, where he led training of GPT-4 and OpenAI Five. He has a PhD in optimization methods from Carnegie Mellon.


And last but not least, facilitating our conversation this evening is Mark Chen. Mark is the head of frontiers research at OpenAI, where he focuses on multimodal and reasoning research. He led the team that created DALL·E 2 and then the team that incorporated visual perception into GPT-4. As a research scientist, Mark led the development of Codex and ImageGPT and contributed to GPT-3.


So without further delay, Mark, I will let you take it from here.


Cool. Thanks a lot, Natalie. I'm really honored to be facilitating the discussion today. Before we begin, I want to quickly thank Natalie and Avital for putting this event together; Terry, Jakub, Ilya, and Dan for generously donating your time today; and everyone in the audience for joining us. I actually see quite a few familiar faces from even before OpenAI, from Olympiad winners to other math professors, and I'm really glad AI can bring all of us together.


So for the first part of the panel discussion, I'll be running through a series of questions. And I'd like to preferentially give Terrence, Terry, sorry, the first stab at answering, but I hope that the panel will feel pretty conversational. So other panelists should feel free to jump in, you know, ask follow-ups and so forth. I'll do my best to keep time and cover ground.


So I think one fact that wasn't mentioned about Terry was that he's played around with GPT quite a bit in the past, and he's also written about some of his experiences doing so. So my first question is, how do you use ChatGPT in your day-to-day life, and are there any kind of particularly notable interactions that you've had with the model?


Okay. So I think, like many mathematicians, once GPT came out, the first thing we did was try it on research questions from our own area of expertise. And the results are very, very mixed when you directly ask GPT to do what you are already expert at, because it feels like asking an undergraduate who has some superficial knowledge and is kind of extemporizing, giving a stream-of-consciousness type of answer. Sometimes it's correct, sometimes it's almost correct, sometimes it's nonsense.


I do find it's still useful for some mathematics as well, in idea generation. So when I'm at the phase of tossing out possible things to try, if I'm out of ideas, I'll ask it, and it may give me a list of maybe eight suggestions: four or five are kind of obvious things that I know are not going to work, two are rubbish, but one or two are things that I didn't think about, and those are useful. You have to have the prior expertise to select those.


What I'm finding GPT most useful for is actually my secondary tasks, like coding, organizing my bibliography, and doing some really unusual LaTeX-type things: things that I would normally have to Google a lot and do a lot of trial and error for. My own competency in these is quite low, but still high enough to evaluate the output of the LLM. I'm finding that really useful, and it's changing the way I do math. I'm coding a lot more now, because the barrier to entry is so much lower. So yeah, it's really upskilling my secondary tasks the most, I would say.


Cool. I would love to get answers from the rest of the panel, too. Let's maybe go with Jakub first.


I think where I find GPT most useful is for secondary tasks, like analyzing data. In particular, for plotting I've found it to be a phenomenal tool. I have not had very much success using it for idea generation yet.


Cool. How about you, Dan?


I use it heavily to find snippets of Python code, using various libraries and APIs. For me, that's a lifesaver. I don't use it that much for hard research problems or idea generation. But I use it for poems, short stories, jokes, and all sorts of leisure things.


Yeah, go ahead, Ilya.


In my case, it would primarily be asking questions about historical events, or things like, recently I was curious about the causes of a particular disease, or the efficacy of a particular medication. Broad questions about things I don't know, where I don't have a particularly deep question; I just want to know the answer.


Cool. How do you find the accuracy of the answers then? I mean, if it's not your own area of knowledge, do you have to double-check them?


So, it is true that there is an element of risk involved. From time to time, I do notice that in an area where I do know something, it will give me a confidently wrong answer, like with translation between languages, or, let's say, telling me the shared root of some words. So I would maybe cross-examine it. But there is an element of risk, and I feel comfortable with it.


That's true. It feels like there's a discovery process where people become calibrated and understand the level of risk and what it's capable of.


Yeah. I find that certain people seem to be better at managing that. People who teach small kids or undergraduates for a living, or parents of small kids, seem to do well. I think they're familiar with the idea of an intelligent entity that makes a lot of mistakes.


I'm curious, Terry, could you say more about an example of how you found ChatGPT useful for idea generation?


Yeah, usually at the very start of a problem. A very specific problem I remember was that there was this identity I wanted to prove, and I had a couple of ideas. I wanted to try some asymptotic analysis and some other things. But just as an experiment, I asked it: pretend that you're a colleague and I'm asking you for advice, and suggest some ideas.


And so it said: why don't you try induction, try some examples. Many of these suggestions were kind of generic, the kind of advice any problem-solving guide would give. But there was one suggestion, to use generating functions, that I hadn't tried. In retrospect, it was an obvious choice; it just had not occurred to me.


So it's useful for covering the ground of what to try more systematically. Sometimes you can think yourself into a rut as a human: you reach for the first tool that you think of and forget some other, rather obvious tool. So it's a sanity check.


Interesting. And it does feel like there are parallels there with using it as a writing tool. A lot of people hit a rut when they're writing, and you want some way to break through that barrier, even if it's something that's obvious in retrospect.


Sometimes it's useful even when it's wrong. There have been times when I've asked it how to approach a problem and it gave an answer that was obviously wrong, and that triggered something in my brain that says: no, you don't do that, you do this. Oh, I should do that. There's a law on the internet called Cunningham's law: the fastest way to get a correct answer on the internet is to post an incorrect answer, and someone will very helpfully correct you. So sometimes you can actually leverage the inaccuracy of GPT to your advantage.


Yeah, that makes a lot of sense. Zooming in on the mathematical abilities of GPT, I'm curious how you've found it along dimensions such as creativity, which you alluded to a little in your last answer, but also insightfulness, accuracy, or even the ability to carry out long chains of, let's say, inferences or implications.


So, accuracy is a mixed bag. Long chains of calculations are possible with the right prompting. It does really feel like guiding an undergraduate during office hours to work through a complex problem. If you break it down into individual steps and correct the student at every step, you can guide the student through a complex problem, but if you just let the student run, they'll make a mistake, it will compound, and they'll get hopelessly confused.


It does have uncanny pattern matching sometimes. You'll ask it a problem and it will somehow know that some other mathematical concept is relevant, but the way it brings it up is wrong. But if you then consult a more reliable source about the term being alluded to, you can sometimes work out why the language model thought there was a connection. You need some expertise of your own to do that, though.


So I feel like what these tools do is allow you to stretch your skills by about one level beyond your normal range. If you're a complete novice, you can become an advanced beginner. If you're a beginner, you can become intermediate. If you're an expert, most of what the LLM does is not so useful, but every so often it plugs some gaps in your own knowledge.


Makes sense. You keep returning to this point of comparing it to an undergrad in some ways. I'm curious how you would compare it to an undergrad at, let's say, UCLA: in what ways is it better or worse?


So it's certainly a lot more broadly educated. At least in the default way that you interact with it, it does feel like an undergraduate who is at the board, a bit nervous, running a stream of consciousness, not really stepping back to think. Maybe there's a mode that makes it much more reflective. But when an undergraduate has a shaky grasp of a subject, he or she will revert to an autopilot, pattern-matching type of approach, making guesses based on what they vaguely remember from class and so forth. GPT is like a very polished, advanced version of that mode of thinking, but that's only one mode of thinking that we use.


Yeah, that makes sense. To the other panelists, do you have any different perspectives on the capability profile of GPT for mathematics? Cool. Returning to the undergrad point, I'm curious about the role GPT models will have in education going forward. Can undergrads benefit from using this as a tool? Or is it only useful at the advanced, nearing-expert level you mentioned?


Yeah, so most obviously, many standard undergraduate math homework exercises cannot be solved just by directly entering them into GPT. It will make mistakes, so you still need some baseline of knowledge to be able to detect and fix them. But, as I said, it's an upskiller in that way. I think the more enterprising students are using it as a teaching assistant: not just asking for the answer directly, but asking it to explain. I know there are experiments by Khan Academy and some other places to explicitly set up very well-prompted models to serve as tutors, not giving the answer to a problem directly but making the student go step by step. Once these models have much better integration with other mathematical tools, I think that will become much more useful.


There are also efforts underway to use these AI tools to generate much more interactive textbooks, and this also combines with tools like formal proof verification. If you have a textbook where the proofs have somehow all been written in a very formal language, the AI could maybe convert that into human-readable text where, if there's a step that needs more explanation, a student can click on it and it will expand. And there'll be a chatbot you can ask questions to, a sort of virtual author of the text, and so forth. So I think we're going to see much more intelligent teaching materials than we have now.


Yeah. Do you, do you know of any stories of kind of students using this already?


Well, everyone uses it; they will all try it for their homework. At this point it's a lost cause to try to prohibit it, and I think we have to adapt. The type of homework questions we need to assign now are things like: here's a question, and here's GPT's response to this question. It is wrong. Critique it, fix it. We need to train our kids how to use this. Or: design an integral that GPT can't answer but you can. I think we need to embrace the technology, but also really train people how to deal with its limitations.


Makes a lot of sense. For the next question, I'd love to turn to our other panelists first for a perspective and then go to Terry. I'm curious for perspectives on when we think a novel theorem will be proved with an LLM like ChatGPT, both with a contribution alongside a human and purely by itself.


And I'm curious for perspectives also on what kind of theorem this is likely to be. Yeah, maybe I'll call on Dan first.


There's quite a spectrum of how much it's contributing. There is a clear trend toward more people integrating it more deeply into their workflows, so it seems very clear that it has a good chance of playing some role in most future discoveries of all kinds. As for a really autonomous role, where it's posing the ideas and making the main leaps, I think that's not this generation, and it's quite hard to predict.


Cool. Ilya or Jakub, any thoughts on this, like timelines or guesses?


It's hard to be... Go ahead.


I'll just say, I think it's hard to be sure, but I wouldn't bet against deep learning progress.


Yeah, I think maybe the first place I would expect LLMs to contribute meaningfully to proofs is problems where you need to check a lot of cases, but the cases are not very easily definable, so you cannot solve them with existing approaches. I think that resembles AlphaGo: Go is similar to chess in some ways, but you cannot quite use the same brute-force approach. I think that's where we could see these models excel first.


Cool. Yeah. I'm curious if this aligns with your intuition. It also feels like a lot of math these days is about connecting two fields where there may not be a very clear connection. I'm wondering if maybe GPT can help with some of that pattern matching or...


I think it can. I think we haven't necessarily discovered the right way to use it. I think it will be complementary, more than a direct replacement for human intelligence, in many ways: as I said, making connections. I could see that some mathematician in one field, let's say analysis, proves a result, and GPT says: this result seems connected to this result in topology, in quantum field theory, in other areas that you wouldn't think of. This is currently sort of what conferences are often about: you give a talk, someone in the audience says, oh, what you're saying is connected to this, and you make these unexpected connections. I think there's a lot of future promise in, say, automatic conjecture generation: once you have big data sets of mathematical objects, you just get the AI to trawl through them and discover all kinds of empirical laws or conjectures that maybe some combination of humans and computers could then resolve.


It is already very good at mimicking existing arguments. I was recently writing a blog post about mathematics using VS Code with GitHub Copilot enabled. I had this integral that I was trying to compute and estimate, and I broke it into three pieces. I said, okay, piece one I can estimate by this method, and I typed out the method. And then Copilot surprised me by supplying how to estimate the other two parts of the integral, completely correctly, actually. It was an extra two paragraphs of text. So I could see, in the very near future, a mathematician writing down in full detail the solution to one problem of a certain type, and then the AI solving the other 999 problems in the same class almost automatically. That's a type of mathematics that we just can't do right now. Mathematics is still a very bespoke, handcrafted, artisanal craft, where each theorem is sort of lovingly handmade.


Yeah. A common theme seems to be casework-heavy mathematics, maybe characterizing all groups of a certain type, or something like that.


Cool. I guess one question I have is, let's say we do make some more progress on the AI front, right? How does mathematics look when we get to a world where, let's say an automatic theorem prover could prove statements, you know, at a rate faster than humans? So maybe like an expert mathematician kind of working on a problem would take a month to a year and this kind of system could give us a proof within a day. Like how does that change mathematics and the landscape of mathematics?


Yeah, I think it changes the speed and scale of what we can do. I mean, it's happened in other fields, you know, sequencing a genome of a single animal used to be an entire PhD thesis, you know, and now you can do that in minutes. It doesn't mean that PhD students in genomics are out of work, you know, they do other types of science, maybe much more large scale things that they couldn't do before. You know, I mean, what mathematicians do has changed so much over the centuries, you know.


In the Middle Ages, mathematicians were hired mostly to make calendars: to predict when Easter is and when to plant our crops. Then it was: how do we navigate the globe? Or, in the early 20th century, how do you compute integrals, and how do you model a nuclear weapon? But all these things we can now do more or less with automated tools, so mathematicians do other, higher-order tasks. So when that happens, the math we do will not look like what we're doing now, but we will still call it math. I don't actually know what it will be like. I'm curious if the other panelists have some kind of vision of a post-AGI scientific world, or what that would look like.


Ilya, it looks like you unmuted.


I can go very briefly. One thing that we've seen is what happened to chess. Chess used to have those great human players; now the computer is the great chess player, and chess turned into a spectator sport. Maybe something like that could happen.


I have a question for Terry, a follow-up question. How might radically accelerated progress in mathematics affect the world rather than how might it affect mathematics?


That's a good question. I could see improved models discovering new laws of nature. Discovering new laws of nature is really hard right now; you have to be Einstein level, because you make a hypothesis, and then you have to do a large-scale experiment or a lot of theoretical calculation to predict the consequences of your hypothesis, then see if it fits the data, and then go back and try again.


This type of process, hypothesis generation and testing, combines both the mathematical and the scientific sides, and if it gets automated, we could really accelerate all kinds of science and technology.


I think there are all these materials that we would love to have, such as superconductors, or improved solar cells, or whatever. In principle, we have the basic laws of chemistry and physics to understand what the material properties of various things are, but our modeling capability falls short: we don't have the mathematics or the supercompute to actually model them. With AI, through some combination of theory, actually proving things mathematically, and direct machine learning, trying to predict the properties without going through those basic laws, I think that will be quite transformative.


Cool. And just to piggyback on Ilya's answer for a bit, I'm curious what your intrinsic motivations for doing math are. Is it discovering new knowledge? Is it more the joy of going through the process? I'm curious what motivates you.


Yeah, it's both. I'm a firm believer in Ricardo's law of comparative advantage: you should try to contribute where you can do the most good relative to other people. I can do math, and I can do a few other things; I can program, for instance, but I am not as good a programmer as a professional programmer. Math is where I have the comparative advantage; that's my strength. I certainly enjoy making connections, and I enjoy communicating mathematics to others. I've trained many students, and seeing them develop over time has been a great pleasure.


Cool. Maybe also to follow up on Dan's question: if we only automated the theorem-proving part, so let's say we had an AI system that could prove in a minute a formal statement you could reasonably prove in a year, how do you think that alone would change the world?


So, yeah, the influence of mathematics on the rest of science is a little indirect. Sometimes other scientists, like physicists, tease mathematicians: you're trying to prove something that we already know empirically to be true with 99% confidence, so why do you care about making it 100% certain with a proof? But sometimes there are surprises: mathematically, you discover that a symmetry breaks, or some phenomenon that everyone thought would never happen does happen, and that can sometimes lead to an experimental verification. It's also just a different way to probe. As I said, there are ways to use pure thought, to use mathematics, to understand the universe without experiment. A very classic example is Galileo. Aristotle thought that heavy bodies fall faster than light bodies, and everyone just accepted this. The story, the way we teach it or in popular belief, is that Galileo dropped a cannonball and a small ball off the Leaning Tower of Pisa or something. But actually, he disproved this with a thought experiment. Suppose you have a heavy object and a light object connected by a very light string. Together they form a single object, so they fall at a single rate, and then you just cut the string. If the string is negligible, cutting it shouldn't have any effect on the motion. So just by pure thought, you can easily see that Aristotle's theory is inconsistent. So possibly, with really advanced AI, people will make fewer logical fallacies. People often make financial mistakes because they don't understand probability, or compound interest, or very basic things.
Yeah, maybe it's a little bit naive; there's so much irrationality in the world. But in principle, having a very rational, mathematically informed AI as an assistant to everybody could really help everyone's reasoning abilities.


Mm-hmm. Yeah, to expand on the formal theme, I'm curious about your current views on how important formal systems are to mathematics these days. And one follow-up question that a couple of us on the panel are curious about is: does formal mathematics generalize to all of human reasoning? If you have a system that's just amazing at formal mathematics, how much of human reasoning does that cover? Or do you need something else that's a little softer, or qualitative?


It is one mode of thinking, and I think it will help. The LLMs have somehow already captured the intuitive, pattern-recognition type of reasoning, which is one mode of thinking. Formal, rigorous thinking is another mode we have. And then there's visual thinking, and then there's emotion-based thinking. I think there are other ways to reach conclusions. I haven't thought so much about it, but I think formal reasoning is limited to mostly formal situations. But I think it has great promise in complementing the biggest weakness of the current language models, which is their inaccuracy. Say you're asking an LLM to do mathematics: without a proof verifier, it can make mistakes. But if you force the LLM to generate output that has to pass through a proof verifier, and if the output doesn't verify, that gets fed back to the LLM, maybe through reinforcement learning or whatever, to train it out, then it becomes a much more useful system. And I think, conversely, these formal proof verification languages are quite hard to use. I started picking up one of these languages just a few weeks ago, and I just proved my first basic theorem, which was at the one-equals-one kind of level. It took me about an hour to get all the syntax right to do that. There are experts, people who have spent years using these languages, who can verify things quite quickly. But the average mathematician who can prove theorems in natural language cannot prove theorems in these formal proof assistants. With LLMs, there's really this big opportunity: a mathematician could explain a proof to an LLM in natural mathematical English,
and the LLM will try its best to convert it line by line into a formal proof and come back to the human whenever there's a step where it needs more clarification. So I think we could see much more widespread adoption of formal proof methods. But conversely, as I said, these formal proof verifiers will also augment the LLMs quite a bit. And possibly, once you figure out how to do that, this may help solve the hallucination problem more broadly. You can't use a formal proof verifier to verify all types of reasoning, but possibly some of the lessons are transferable.
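The generate-and-verify loop described in this answer can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not OpenAI's method or any real proof assistant's API: `propose` is a hypothetical stand-in for the LLM, `verify` is a hypothetical stand-in for a formal proof checker (returning None on success or an error message on failure), and the toy task of finding an integer whose square is 49 stands in for an actual theorem. Only candidates that pass the verifier are returned, and each rejection is fed back to the proposer.

```python
from typing import Callable, Optional


def generate_and_verify(
    propose: Callable[[list[str]], str],
    verify: Callable[[str], Optional[str]],
    max_rounds: int = 5,
) -> Optional[str]:
    """Return the first candidate that passes `verify`, or None.

    `propose` (hypothetical stand-in for an LLM) receives every verifier
    error message so far as feedback. `verify` (hypothetical stand-in for
    a proof checker) returns None on success or an error message on failure.
    """
    feedback: list[str] = []
    for _ in range(max_rounds):
        candidate = propose(feedback)
        error = verify(candidate)
        if error is None:
            return candidate  # only verified output is ever returned
        feedback.append(f"{candidate!r} rejected: {error}")
    return None  # give up rather than return an unverified guess


# Toy proposer (hypothetical): tries a fixed list of guesses, one per round.
def toy_propose(feedback: list[str]) -> str:
    guesses = ["5", "6", "7", "8"]
    return guesses[min(len(feedback), len(guesses) - 1)]


# Toy verifier (hypothetical): accepts n only if n * n == 49.
def toy_verify(candidate: str) -> Optional[str]:
    n = int(candidate)
    return None if n * n == 49 else f"{n}^2 = {n * n}, not 49"


print(generate_and_verify(toy_propose, toy_verify))  # prints 7
```

Returning None once `max_rounds` is exhausted, rather than the last unverified guess, reflects the point made in the answer: output that has not passed the verifier is never surfaced as a result.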


Cool. Yeah, that's a very interesting perspective. LLMs could be used in the verification process, and maybe there's some interplay between models and humans, where models go back to humans and ask for expansions of key parts of the proof.


Yeah. Interactivity is the key.


Yeah.


Yeah. You touched on a couple of things, like modes of thinking: there's informal thinking, a formal mode, and maybe a visual mode. I'm curious to get some more insight into your thought process when attacking a complex math problem. Do you flip between these modes? Do you focus on one of them? Can you characterize when you come up with insights?


This is a good question. Actually, I don't really know. Part of the problem is that I'm not an external observer of my own thinking process. It's actually easier for me to see how a student thinks when I'm watching them at the board than to observe my own thinking. I am not a particularly visual person myself, actually. I reason a lot by analogy; I do a lot of translation. So I may be working on a problem which is written in algebraic form, and then at some point I realize it has a kind of geometric structure, so maybe I should translate everything into geometric form. So I do a lot of calculation, and now I have the problem expressed as a geometric problem. And then I notice something else: maybe there are many points in this configuration, but one of them is always at or near the center, and that's somehow an important fact. So often, just from experience and pattern matching, you notice one piece of the puzzle, one piece of a possible approach, and then you have to work to fill in the rest.


The process is a little bit like when you've watched a lot of movies and TV shows, and then you watch a show you haven't seen before: sometimes you can guess the plot, or guess what's going to happen in the next five minutes. You see someone come in and think, oh, this person is probably going to do something dramatic, because you recognize some tropes, and then you can fill in the next steps by yourself. So there is a lot of pattern matching, at least in the way I do mathematics. Once something looks like a paradigm that I think I'm familiar with, then there's a lot of computation. I can almost never solve a problem in my head. I get an idea, then I have to work it out. Often I make a mistake and then I have to correct it. More recently, I sometimes test things on a computer. I have tried chatting with GPT during this process. As I said, I find it useful at the very initial stages, for suggesting an idea. After a while, the signal-to-noise ratio is not great. Once I know how to do things myself, it's best to just do them.


Cool. Yeah, thanks a lot for that answer. One question we also had was about defining AGI or ASI. What concrete milestones would make you think that we're close to achieving human-level reasoning? Is it something like solving all the IMO problems?


Well, I mean, I don't need to tell you what the AI effect is, right? Any target you use, whether it's solving chess or image recognition or whatever, is a good goal until it's solved. And then you realize it wasn't actually the goal. I think we don't understand what intelligence is. And I think one of the byproducts of AI research that is really quite exciting is that we actually learn a lot about what human intelligence is. We don't have a definition. We can sort of recognize symptoms of intelligence, signals, but we don't recognize intelligence itself. So every benchmark is a good short-term goal for advancing the subject. But I think it's definitely premature to say that we have the ultimate definition of intelligence and the ultimate benchmark.


Could you expand on that a little bit? What kinds of insights into human intelligence have you gained as a result of seeing the progress in AI?


Well, the fact that these language models can carry on a very human-sounding conversation that sounds very natural has told me that this is something humans can do on autopilot. And actually, anyone who's raised a small child probably already realizes this: a child can babble and be reasonably coherent without really having a deep understanding of what they're saying. And I think intelligence is to some extent a cognitive illusion. I mean, there are times when we really are thinking deeply and using all of our mental capacity, but that's actually quite rare. I think for most of our daily tasks, we act on instinct and pattern recognition, and our brain fills in the gaps. Sometimes it retroactively provides rationales for why you did something. And so I think we give ourselves the illusion that we're more intelligent than we are. I think it helps keep us sane. But yeah, it's been surprising to me just how much of what we do can be done on autopilot, just by using this pattern-matching level of intelligence.


Yeah, I noticed Ilya was pretty expressive during that answer. I'm curious if you have anything to add, Ilya.


Very little. The only thing I might add, maybe as a nod, is that I think it's very unclear where pattern matching ends and reasoning and deep intelligence begin. And I think often, when thinking about where deep learning is going, how we recognize whether language models are becoming smarter, and what that means, we look to math and math competitions as a kind of standard for formalized reasoning, which currently seems like a clear weak point for language models. So I'm very curious: let's say we had a language model like GPT-4 that was capable of carrying on a conversation, but could also solve any IMO problem, including the more novel combinatorics problems, consistently. What do you think that system could still be missing compared to humans in terms of the ability to do research or move technology forward?


Well, certainly I know that of the people who are very good at the IMO and other olympiads, some go on to be research mathematicians and many don't. It's a different skill. It's more of a sport: it's a controlled environment where the problems are known to be solvable, and there's always a standardized set of techniques. You're not expected to invent really deep new mathematics to solve a problem. That said, some skills do carry over. I participated in olympiads as a child, and sometimes there are certain little steps in a long argument in my research where that experience has a bit of an impact on the way I solve the problem.


Those steps are sort of self-contained mini-games, if you wish, with a small number of rules. And there I find the olympiad training to be useful. So it definitely helps.