Theory of Change #066: Simon Willison on technical and practical applications of ChatGPT and AI

Theory of Change Podcast With Matthew Sheffield

Big business and government are adopting artificial intelligence, what can it do for the rest of us?

Matthew Sheffield

Apr 01, 2023

Artificial intelligence is all over the news of late. There’s a lot of hype for the technology with some putative experts claiming we’re on the verge of sentient computer programs. But there are also a lot of naysayers who claim that the generative AIs like ChatGPT or Midjourney are nothing but toys, capable only of creating useless junk.

The truth, however, is somewhere in-between.

While ChatGPT, Google’s Bard, or Microsoft Bing’s Sydney function are not sentient, they are nonetheless incredibly useful, and a lot of people are already using them to do amazing things.

Regardless of your opinions of contemporary AI, even if these technologies never improve, they are already going to reshape the way we work, learn, and play.

In this second Theory of Change episode on AI and its implications, I’m happy to feature Simon Willison, a technology researcher and programmer who does consulting work to help media companies parse and publish data. He’s also the co-creator of Django, a Python programming framework.

The full video, audio, and transcript of our March 22, 2023 conversation is below. The transcript of the edited audio follows. If you’re new to the topic of AI, you might want to reference Part 1 of the series.

Video

Transcript

MATTHEW SHEFFIELD: Welcome to Theory of Change, Simon.

SIMON WILLISON: Hi Matthew, it’s really great to be here.

SHEFFIELD: All right. So, this is a big topic here and I think a lot of people are not familiar with the ins and outs of things from a technical perspective. So before we get further into it, can you just describe how these generative AIs work and what machine learning is?

WILLISON: So machine learning is the sort of general category of working with computers where you essentially try and teach them things. You show them examples and get them to use those examples to build their own models of how things fit together and how things work. And this is something that’s been around for decades.

The more recent developments, these generative AI models, things like Midjourney and ChatGPT, these are much more recent. These really are an invention of the past four or five years, and have only started to really become good in the past two years. And they’re really fascinating things.

One of the most interesting things about them is that the people building them don’t fully understand exactly how they can do what they do. They know how to build them, but a lot of their abilities are emergent. The fact that they can translate human languages from one to the other or write code weren’t necessarily things that people were certain they’d be able to do.

And now that we’ve built them, people keep on finding new ways to apply them that are sort of surprising to the people who created them in the first place, which is all very science fiction. There’s a lot about this that feels very different from how programming and computer science have worked in the past.

I think we should talk about the language models in particular. So this is ChatGPT and Bing and Google Bard, because these are the ones which right now are having the most impact on the world. And the best way to describe those is to think about predictive text on a mobile phone keyboard. If you’ve ever played that game on an iPhone where it suggests a word and you press that word and then press the next one and then the next one, you end up with a sentence.

That’s effectively how these large language models work as well. It’s just that they’re doing it at an unimaginably huge scale. Your phone basically learned from the kind of things you’ve typed. So it’s got a very rough idea that after you say the word I, you might say the word am and it can suggest things like that.

With the language models, in the case of Bard, one and a half trillion words of content were fed into this thing, and as a result, it can look at the previous 2,000 words and say, okay, based on those 2,000 words, what’s the most likely word to come next? It turns out if you do that and then just keep on repeating it, you get something which feels indistinguishable from an intelligence, at least at first glance.

It produces incredibly realistic text, but really it’s just statistics. It’s just looking at what’s the most likely word to come after this word, and then repeating that hundreds and hundreds of times.
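The phone-keyboard analogy can be sketched in a few lines of Python. This is only a toy: it counts which word follows which in a tiny made-up corpus and always picks the most frequent follower, whereas a real language model looks at a much longer context and is trained very differently. The corpus and the function names (`train_bigrams`, `predict_next`, `generate`) are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it and how often."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    options = follows.get(word)
    if not options:
        return None
    return options.most_common(1)[0][0]


def generate(follows, start, max_words=8):
    """Start from one word and keep appending the most likely next word."""
    out = [start]
    while len(out) < max_words:
        nxt = predict_next(follows, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


# A made-up training corpus; punctuation is treated as a word.
corpus = "i am happy . i am here . i am happy to help . you are here ."
model = train_bigrams(corpus)

print(predict_next(model, "i"))
print(generate(model, "you", max_words=5))
```

Because this toy always takes the single most likely word, it quickly falls into loops, which is the same effect you see when playing the game on a phone keyboard.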

SHEFFIELD: Yeah, and the other thing about it is that because it’s based on statistics it is definitely based on what the training data is.

WILLISON: Absolutely.

SHEFFIELD: So the training data absolutely heavily influences what goes into the output.

WILLISON: It does. And actually there are two levels to that. For these models, there’s, you take your one and a half trillion words in the case of Bard, and you use that to build a sort of core statistical model that knows what human language looks like.

It can produce sentences, but the bigger question is, okay, what sentence should it produce? Like if you ask it for its opinion on something. I don’t think you should ever do that. These things don’t have opinions, but they can sure simulate that they do. But you know, when it’s answering a question, there are many options for how you can complete that sentence. Which is the one that’s most likely to satisfy the user?

And that’s a second level of training, which is called Reinforcement Learning from Human Feedback. Basically, you get a bunch of researchers to interact with these tools and it throws out an answer, and they essentially vote the answers up and down. They say, that was a good answer to that question. That was a bad answer to that question.

And that’s the process, which takes it from this weird mishmash of things that can produce sentences to something that feels much more useful than that because the sentences it produces are the right ones. ChatGPT has had an incredibly good layer of this stuff added on top of it, which is why it’s so impressive.

Google’s Bard just came out yesterday. I’m getting the impression they haven’t done nearly as good a job. It feels much more likely to say something that feels inappropriate or just weird than ChatGPT given the same questions.

SHEFFIELD: Yeah. And the other thing also is that the programming models for especially ChatGPT, because the previous one, GPT-3 or I guess it was 2. There was an analysis, they published the code actually, so you could look at what the generative response was in some way, like with the API.

WILLISON: Right. GPT-2 came out, and I was playing with GPT-2, I think in 2019, 2020. And that wasn’t great, to be honest. I used it to, like, spit out New York Times headlines for different decades just to see if I could get some patterns. But it was nowhere near being something you could interact with the way you can with ChatGPT. GPT-3 was the real breakthrough, and I think it was early 2020 that it first became available. And then everything has just accelerated like crazy since then.

SHEFFIELD: Yeah. Oh, well, I guess what I was going to say though is that not only are they trying to predict the next word, but they’re doing it with a slight bit of randomness as well. Because when you play that game with your phone keyboard, the sentences that you end up producing are nonsense because they’re entirely based on probability. Whereas what these more modern LLMs are doing is that they’re not always using the most likely next word.

WILLISON: Right.

SHEFFIELD: It gives some interesting variance, and it does, as it turns out.

WILLISON: Google Bard currently has a feature where, for any question, it generates three drafts and you can switch between them, which is actually really fun. So you can get the sort of feeling that, yeah, they might be generating hundreds of versions and then picking the three that seem most likely to be useful. And with Bard, they actually expose all three and you can flip between them and get a little bit more of a feeling for how that bit works.
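The "slight bit of randomness" can be illustrated with a small sampling sketch. The scores below are invented, and the `temperature` knob is a common way of controlling the randomness rather than something described in the conversation: at 0 the function always takes the top-scoring word, and at higher values less likely words get picked some of the time.

```python
import math
import random


def sample_next(scores, temperature, rng):
    """Pick a word from a {word: score} dict.

    A temperature of 0 always returns the top-scoring word; higher
    temperatures give lower-scoring words more of a chance.
    """
    if temperature == 0:
        return max(scores, key=scores.get)
    # Turn scores into positive weights (a softmax without normalising,
    # since random.choices accepts relative weights).
    weights = [math.exp(score / temperature) for score in scores.values()]
    return rng.choices(list(scores), weights=weights, k=1)[0]


# Invented scores for words that might follow "I".
scores = {"am": 2.0, "think": 1.5, "was": 1.0}
rng = random.Random(0)  # seeded so runs are repeatable

greedy = [sample_next(scores, 0, rng) for _ in range(5)]
varied = {sample_next(scores, 1.0, rng) for _ in range(200)}

print(greedy)          # the same word every time
print(sorted(varied))  # more than one distinct word shows up
```

Running the sampled version several times with different seeds is a rough analogue of generating several drafts and choosing among them.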

SHEFFIELD: Yeah, yeah. And, and this concept also is at work within these image generating AI programs like DALL-E or like Midjourney.

WILLISON: The image generation ones, they work a little bit differently.

They’ve still got language models baked in. They have to, because if you ask it for a raccoon eating a pie in the woods, it has to know what those concepts are and how they relate together. But the way the image generation ones are taught is that you give them an image and then you pixelate it, you sort of fuzz that image.

You add some noise to it and give it to them again and say, hey, can you predict what the original image was from this fuzzier version? And then you make it fuzzier and fuzzier and fuzzier, and you end up with just static noise as the image. And then when you want it to generate a brand-new image, you generate completely random noise, and you effectively lie to it.

And you say, hey, this was originally a picture of a raccoon eating pie in the woods. Try and reverse back out. Because it’s learned to turn noise into a less noisy image, even given random input, it can work its way back from that to something that looks real. It’s a weird technique, but absolutely fascinating.

So yeah, the image generation ones work quite differently at a certain level, but fundamentally, under the hood, they’ve got one of these language models baked in as part of what they do.
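The "make it fuzzier and fuzzier" half of that process can be sketched as follows. This shows only the noising side, not the trained model that learns to undo it. The particular mixing rule (scale the image down slightly and add Gaussian noise, so the overall variance stays about the same) is an assumption borrowed from common diffusion-model write-ups, and the four-pixel "image" and the `beta` value are made up.

```python
import math
import random


def add_noise(pixels, beta, rng):
    """One noising step: shrink the signal a little and mix in Gaussian noise."""
    keep = math.sqrt(1.0 - beta)
    mix = math.sqrt(beta)
    return [keep * p + mix * rng.gauss(0.0, 1.0) for p in pixels]


def signal_remaining(beta, steps):
    """Fraction of the original pixel values still present after `steps` steps."""
    return math.sqrt(1.0 - beta) ** steps


rng = random.Random(42)
image = [1.0, -1.0, 1.0, -1.0]  # a made-up four-pixel "image"

noisy = image
for _ in range(200):
    noisy = add_noise(noisy, beta=0.05, rng=rng)

# After one step almost all of the picture survives; after 200 steps
# well under 1% of it does, so what is left is effectively static.
print(signal_remaining(0.05, 1))
print(signal_remaining(0.05, 200))
```

Training then amounts to showing a model pairs of noisier and less noisy versions so it learns to step backwards, which is what lets it start from pure static at generation time.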

SHEFFIELD: Yeah. Well, and one interesting kind of defect about them with the image ones is that they seem to have trouble understanding text inside of the images.

WILLISON: Absolutely.

SHEFFIELD: So, like when you ask them, give me a picture of a dog holding a sign saying, “I like dog food,” it won’t be able to do it, generally speaking.

WILLISON: Yes. So far. Google had a paper out where they demonstrated that at a certain size of model, it can actually do real words.

And I think those models are too expensive to let people use just yet. But yeah, within like six months or a year, I’m sure we’ll have image generation models that can produce words. But really the thing that’s happening there is more that when you show somebody a human face, little imperfections in that face don’t really register for people.

But if you show someone actual writing, getting the bar on the f slightly in the wrong place, or at a slight angle, completely breaks it, because we’re much better at pattern matching words on the screen than we are at pattern matching human faces or raccoons in the forest or something like that.

SHEFFIELD: Well, and yeah, because we’re used to variation in visual stimuli, we’re constantly having to deal with different lighting conditions, different depths. So even if we’re not able to perceive somebody’s facial details, we still know they’re a human.

WILLISON: There’s that wonderful thing where image models traditionally are terrible at fingers. Like they will frequently produce people with six fingers. And the reason they’re doing that is, if you think about the way they work, the most likely thing to appear next to a finger is another finger. So the fact that it sometimes outputs six fingers is really because it’s just trying to do the pattern that makes sense to its training, and its training has a lot of fingers next to fingers.

SHEFFIELD: Yeah. And then going back to the text-based ones, one of the other capabilities that has emerged from them is the ability for large language models to write programming code.

WILLISON: Right. This is fascinating because initially I, like everyone else, was just shocked at this.

I’m a programmer. I’ve been a programmer for 20 years. The idea that an AI could do what I was doing that well was really shocking. But the more I’ve thought about it, the more I’ve realized that programming’s actually the easiest problem that you can give it, relative to writing human language.

There are so many different ways you can finish a sentence. There’s so much depth to that. Programming languages are very straightforward. The thing that comes after “if” is an open parenthesis for the condition, depending on your language. So actually once you start getting a feel for how these things work, you realize that the two easiest things for them to do are to write code,

because code is much simpler than regular English, and to translate from one language to another, which is a very straightforward problem for them to solve as well. But those are the two things that feel to me the most miraculous when you first start working with these and you’re like, wow, it can translate Mandarin into Spanish.

And like, who thought I’d be able to do that with one of these language models?

SHEFFIELD: Yeah. Well, and it is, I mean, just when you look at the vocabulary, Merriam-Webster says there’s about a million English words. And that’s not including conjugations or declensions. And by contrast, there is not one programming language anywhere close to that with reserved words.

WILLISON: Most of them have like a hundred keywords. That’s the whole thing.

SHEFFIELD: Yeah. And so, on the programming side, this actually has been kind of available in, in public release a little bit earlier than the text, generative chat like ChatGPT.

So Microsoft has been at the epicenter of a lot of these AI developments recently. And one of the ones that they rolled out first, before the chatbots really took a lot of attention, they rolled out to the people using their programming text editors.

WILLISON: Yes.

SHEFFIELD: Can you talk about that a little bit?

WILLISON: Yeah. This is GitHub Copilot, which I think has been out for two years now. And Copilot is essentially a sort of typing assistant. It lives inside your text editor. And when you’re writing code, it will offer to complete your code for you.

The interface is very clever. It adds its suggestion in. And then you hit the tab key and it fills it out and it types it all in for you. And this is incredibly effective.

Like you can type the name of a function, like def fetch_content_from_url, open parenthesis, and it will say, oh, well you clearly want url as the argument. And then here’s five lines of code that’ll do that. And it gets that purely based on what my function name was.

I’ve been using this quite a lot for the past year, and you begin to realize there are all sorts of other tricks you can do with it. You can put a code comment that explains what you want to do, and it’ll write the code based on the comment.

And it feels completely magical when it does this. Again, it’s actually one of the easier problems to solve in terms of training these models. I think Copilot was trained on just vast amounts of open-source code, most of it from GitHub. And that was enough for it to be able to do extraordinarily powerful-feeling things.
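To make the function-name example concrete, here is the kind of completion a tool like this might plausibly offer once the name and opening parenthesis are typed. It is a hand-written illustration, not actual Copilot output; the name `fetch_content_from_url` and the `timeout` default are assumptions. It uses only Python’s standard library.

```python
from urllib.request import urlopen


def fetch_content_from_url(url, timeout=10):
    """Download the resource at `url` and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# A data: URL lets the function be tried without touching the network.
print(fetch_content_from_url("data:text/plain,hello"))
```

The comment-driven trick works the same way: a line such as `# download the page and return its text` above an empty function gives the tool the same hint the function name does.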

So OpenAI have recently started boasting about Copilot specifically, because there are now studies that show that it increases the individual productivity of the programmers who use it by a material amount. Like, one estimate was that as much as 50% of code that people are typing was suggested for them by the bot, and that represents a very real increase in productivity and speed, which is, I think the best-case scenario for these AI is that they help us, right?

I don’t want to be replaced by an AI, but if an AI can double or triple my productivity, that feels super valuable to me.

SHEFFIELD: Mm-hmm. Well, and the other thing that is nice about them is that they can help you deal with languages that– so I’m a web administrator and programmer. I use PHP, which is a rival of Python in many applications. But when hosting websites and things like that, you have to deal with the Bash scripting language. And apologies to any Bash fans out there, but I think generally speaking, most people hate having to deal with Bash and shell programming.

WILLISON: I’ve been using Bash for 20 years, and I have to look up how to do a for loop every single time I need to write a for loop.

SHEFFIELD: Yeah. And now you no longer have to do that.

WILLISON: Right. Yeah. This is something that I’ve been finding: I’m now a lot more ambitious with my programming projects, because I know that if I need to dip into Bash or dip into, like, some other language that I’m not familiar with, it’s okay. If I’m doing something simple, the AI’s going to knock out four lines of Bash and I can eyeball that and say, yeah, that looks right, and I can move on with my life.

So a few weeks ago I built a piece of software on top of AppleScript, which is notorious as the world’s only read-only programming language. You can read AppleScript and figure out what it’s doing, but it’s really hard to write. And suddenly I realized, hang on, ChatGPT knows AppleScript. So I gave it a one-sentence description of what I wanted to do, which was I wanted it to open the Apple Notes app and loop through every single one of my notes and output the title and the body so that I could do some more programming.

And it just worked. First time, it produced eight lines of AppleScript that clearly did exactly what I needed, and I ended up building a little piece of software on top of that. And I would never have even taken on that project if I hadn’t had that tool, because I knew that the frustration involved in figuring out the AppleScript would be so much that I’d rather spend my time on something else.

SHEFFIELD: Yeah. And the thing is though, while these programming AI tools can be useful to take away some of the drudgery and things like that, ultimately, they’re not going to be able to integrate this code into existing systems to a large degree. Like, so for instance, I have been testing ChatGPT out on some WordPress programming code. And it’s not capable of debugging how this code works against other existing functions because number one, it doesn’t have access to them.

WILLISON: Mm-hmm.

SHEFFIELD: Nor could it. And the other one is that it just simply can’t fully understand how these other things work.

WILLISON: I mean, that’s true right now. I hate to be the person who says, ah but watch what it’ll do next. Except this morning, GitHub released Copilot X and one of the things Copilot X can do is it can sit there on your repository reading all of your code and reviewing pull requests and answering questions about it and stuff.

And this is like another seismic leap from what Copilot could do yesterday. So I do not think I’m going to be replaced as a programmer by an AI, but I think I, my product, my personal productivity is already improved in material ways from this stuff. I can see that continuing to go on, so I’m going to be able to—

I mean, if you want to worry about things, worry that maybe we need half as many programmers because the programmers we’ve got are twice as productive, except in history, what tends to happen is that companies just do more projects, right? If your programmers are twice as productive, brilliant, hire another 20. Hire more programmers and get a hundred times the stuff you were doing beforehand.

SHEFFIELD: It’ll be interesting to see how that works.

One of the other things about all of this, I think, in, in terms of looking at kind of replacing this kind of rote stuff that doesn’t really matter. Like how you’re going to format a for loop with these conditions or whatever. No programmer enjoys doing these things. They’re annoying. And it’s so easy to make stupid mistakes of that nature.

And, and usually that’s why your program doesn’t compile.

WILLISON: Right.

SHEFFIELD: Is because of that. You forgot a semicolon or whatever, or your tabs are wrong. But the thing about it though is that, if programming moves from people generating these ultimately arbitrary arrangements of text and numbers, if programming moves from that to—I think what we’re seeing with this is that basically programming is moving toward thinking about what you want rather than making it.

WILLISON: Right.

SHEFFIELD: And if that’s the case, what I think it will do is that not only will it make programmers more productive, but it will also enable a lot more people to write code who are not programmers at all and know nothing about it.

WILLISON: Right. That, for me, is the dream, right? The thing I want to spend my life doing is helping people make the most use of computers. And the thing we want is for people to be able to automate their lives if there’s something tedious in your life that a computer could do.

We want you to be able to automate and do that thing. And with writing code, the barrier to entry, the learning curve on that is so high that the vast majority of people never make it to that point. And then occasionally tools come along that do give people these abilities.

Microsoft Excel is an astonishingly powerful piece of software. There are loads of people who use that to do very deep automation and analysis in their lives. They don’t think they’re programmers. I disagree with them. I think if you automate something with Excel, you are absolutely a programmer. You’ve got that same mentality.

You’re just not writing, like, Python code to do it. But Excel is huge. And that’s what, 30 years old now? Yeah, roughly. Recently we’ve had a few more advances: things like Airtable and Zapier and so forth are at least giving people more control.

SHEFFIELD: You’ll have to explain what those are for people who don’t know.

WILLISON: So Airtable is kind of like Excel, but more of a database. It’s a web application, it’s a mobile app. People who want to build databases can use Airtable to do that without having to learn SQL and database stuff. And it’s great. It’s a really impressive product. Zapier is mainly a marketing automation tool, but it lets you say things like, anytime someone subscribes to my mailing list, add them to my Salesforce over here and send them a welcome message here and invite them to my Discord channel, or things like that.

And these are very powerful tools that give people who don’t write code the ability to automate things, which I think is great. That’s a net win.

But I’ve got this strong suspicion that the language model stuff is going to just leave all of those in the dust.

If we can build the right tooling on top of these, such that people really can automate their schedules and their lives and solve problems and so forth, you will be programming, but you’ll be programming in the English language with guidance to help you along.

That feels transformative to me. That’s something which, to me, is the sort of biggest possible positive result of this technology. That people can automate and control their lives and do more of that stuff that they should be able to do, because we’ve all got a computer in our pocket now.

SHEFFIELD: Yeah. And within the journalism industry, recently there’s been kind of an emergence of a niche profession, the data journalist, and I think to a large degree, generative AI makes that profession available to even the smallest newsroom.

WILLISON: This is the dream. My day job is I work on a piece of open source software called Datasette, which is aimed at helping journalists and data journalists publish and analyze data. And it doesn’t have any AI baked in at the moment, but I’m right on the edge of starting to integrate some of these features.

Because yeah, the dream of that is if you look at the New York Times, the Washington Post, the LA Times, they do incredible data reporting. They publish these amazing stories where they’ve had a small army of programmers working with the journalists to build software and find things in the data. You can’t do that if you’re a small local newspaper.

You can’t afford a single engineer to help you with this. But there are data-driven stories about your community that are just sat there waiting to be told. And if we can help regular reporters who didn’t happen to get a computer science degree do that kind of data-driven reporting, that again feels like a huge win for society as a whole.

SHEFFIELD: Mm-hmm. Well, and there’s an example of that that has recently been released called Census GPT. And I’ll put a link to that in the show notes for people if they want to check it out. But basically what it does is it has a database of all of the U.S. census data, and then it allows the user to ask a question of it, to say, I want to look at the voting precincts with the highest Hispanic populations and what the difference was between 1990 and 2020.

And then it will write the SQL statement and you can get that information. Whereas, the way things are currently, you have to learn SQL. And SQL is kind of useless. It’s not really a programming language. It’s, I mean, very, very basic, what you do with it. You shouldn’t have to learn SQL.

WILLISON: Yeah, exactly.

SHEFFIELD: In order to have data.

WILLISON: Exactly. Like I will defend SQL and say that I learned SQL 20 years ago, and of everything I learned 20 years ago, it’s been the most useful thing throughout the rest of my career. But it’s weird and obscure, and yeah. And it’s actually one of my favorite uses for ChatGPT, that it will write you SQL queries, which is great.

And yet the GPT census thing is a perfect example of what I’m talking about. You should be able to ask the census exactly that kind of question and get a useful answer out of it. And two years ago, that felt impossible and today somebody’s built it and put it online for people to use.
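The question-to-SQL idea can be sketched with Python’s built-in `sqlite3` module. Everything here is invented for illustration: the table layout, the three areas and their numbers are not census data, and the query is hand-written to show the kind of SQL a language model might produce from the plain-English question. It is not how Census GPT is actually implemented.

```python
import sqlite3

# An in-memory database with a tiny made-up table standing in for census data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE population (area TEXT, year INTEGER, hispanic_share REAL)"
)
conn.executemany(
    "INSERT INTO population VALUES (?, ?, ?)",
    [
        ("Area A", 1990, 0.20), ("Area A", 2020, 0.35),
        ("Area B", 1990, 0.50), ("Area B", 2020, 0.55),
        ("Area C", 1990, 0.10), ("Area C", 2020, 0.12),
    ],
)

question = (
    "Which two areas had the highest Hispanic share in 2020, "
    "and how much did it change since 1990?"
)

# The kind of SQL a model might write for that question (hand-written here).
generated_sql = """
SELECT later.area,
       later.hispanic_share AS share_2020,
       ROUND(later.hispanic_share - earlier.hispanic_share, 2) AS change_since_1990
FROM population AS later
JOIN population AS earlier
  ON earlier.area = later.area AND earlier.year = 1990
WHERE later.year = 2020
ORDER BY later.hispanic_share DESC
LIMIT 2
"""

rows = conn.execute(generated_sql).fetchall()
for area, share, change in rows:
    print(area, share, change)
```

The user only ever sees the question and the resulting rows; the join and the ordering are the part they no longer have to learn to write.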

When you talk to data journalists, they often will say that the census data is the gold standard for useful data. If you want to tell stories, like any story you want to tell, there’s almost certainly something in the census data that you can use to help spot the trends and help make real comparisons about how the country works. But it’s really hard to access, and yeah, it’s things that make that more accessible to more journalists that are going to have enormously positive impacts.

SHEFFIELD: Yeah. And like even in my own case, like with this show, this show is able to have transcripts because of AI. There’s no way that I could afford to pay someone to do that manually. But thanks to the development of Whisper, which is an open-source audio-to-text transcription program, you can do that.

As long as you know how to do that. And really what we need, though, is to have more people aware of all these things that you can do. Because right now, I mean, ChatGPT has had over a hundred million users since launch,

WILLISON: Maybe. I’m suspicious of that number.

I think that number came from a vendor of browser extensions who trick people into installing browser extensions to track what websites they’re going to. They put out the hundred million number. It was never confirmed by anyone else.

SHEFFIELD: Yeah. Well, I mean, whatever the number is, a lot of people are using it. But by and large, when you see people post about it on social media or whatever, like usually they’re just using it for something not really productive. So, they’re having a—like Jordan Peterson, the right wing Canadian self-help guru, seems to have developed a habit of making arguments with it at three o’clock in the morning.

WILLISON: He debates with it. Don’t debate with them. Debating with them does nothing.

SHEFFIELD: Yeah, yeah.

And some people are trying to test the bounds of its safety features to see if they can make it generate offensive statements. And you know what, there’s some utility in doing that perhaps. But I mean, the reality is, you can do it. But you’re not gaining anything ultimately. If that’s all you’re going to do with it, you are kind of wasting your time.

WILLISON: Well, I will say that playing games with it is a fantastic way to learn it generally. So I’ve been sort of collecting games you can play with these models as educational tools, essentially.

Like, can you get it to lie to you? Can you get it to say something obviously false? My favorite game is I try to get it to give me step-by-step instructions for raising the dead, because it’s like a test of its ethics, right? Will it help you raise the dead? And I just tried that with Bard yesterday. And often it’ll say things like, well, it’s illegal and unethical for me to do this, and it would be very dangerous because these are very dangerous creatures, which is immensely entertaining.

It warns you of the dangers of raising the dead, rather than just saying, no, I don’t want to talk about that.

SHEFFIELD: No, it’s not possible.

WILLISON: None of them have ever told me it’s impossible. They always, it’s like having an improv partner, right? They, they’re always like, “yes.”

SHEFFIELD: Yeah. So to go back to what you were saying about playing games with it or using it in other ways, there’s an interesting development that we’ve seen since the image generator came along, which is people who are calling themselves prompt engineers. Let’s talk about that. What is a prompt engineer? And it’s going to be a real job, probably, right?

WILLISON: I mean, it’s actually a real job already in a few places. So prompt engineering is the discipline of just being really good at using these things, which initially sounds like a joke, right?

How hard is it to type some text into a box and click the button and get back a response? Turns out the answer is, it’s very hard. It’s deceptively difficult, at least to get the things to do useful stuff. Like it’s easy to get it to do all sorts of crazy, wild and, and interesting fun things, but if you want to use it to solve real problems, you have to have a pretty deep understanding of how it works, but also what it’s capable of, and what it’s not.

Like you need to know not to get up at three in the morning and, and try and debate it over why it said certain things. Because it has no idea why it said anything.

But even beyond that, knowing that it doesn’t know what you said 50 messages ago because that’s fallen out of its memory. There are things like that that you have to understand. So prompt engineering is, initially, getting really good at using these things and knowing what they can do.
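The "fallen out of its memory" point can be sketched as a simple budget. The assumption here is that a chat tool keeps only as many of the most recent messages as fit in a fixed allowance; real systems count tokens rather than words and may handle older messages in other ways, and `fit_to_window` and the 30-word budget are invented for the example.

```python
def fit_to_window(messages, max_tokens):
    """Keep the most recent messages whose combined size fits the budget."""
    kept = []
    used = 0
    # Walk backwards from the newest message until the budget runs out.
    for message in reversed(messages):
        cost = len(message.split())  # crude stand-in for a real tokenizer
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))


# Sixty short messages, three words each.
history = [f"message number {i}" for i in range(1, 61)]
visible = fit_to_window(history, max_tokens=30)

print(len(visible))  # only the last ten fit
print(visible[0])    # the oldest message the model can still "see"
```

Anything before the first visible message simply is not part of what the model is given, which is why asking it about something from 50 messages ago does not work.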

But it’s also actually a fundamental research role in this world, because as I mentioned earlier, the people who built these models don’t know what they can do. They don’t have a complete model of all the things that it’s capable of. The way you figure out what it can do is you experiment with it.

So some of the big AI research labs are hiring prompt engineers and their job is to talk to the AI and figure out what can it do and what can’t it do. And then there are things like if you give it a big set of instructions and it does the right thing, out of all of those instructions, which ones mattered?

If you deleted a couple of sentences from the middle of that prompt, would it still be able to do that thing? Because if you don’t think hard about that, you end up with superstition. You end up with, okay, well I’m absolutely sure that if you say this, it’ll work. But that’s not actually why it did the thing at all.

That’s just sort of fluffy words that didn’t have any impact. So I feel like prompt engineering, it’s going to be a job for some people. It’s going to be a skill for most people. Like if you’re going to use AIs in your work, and I think increasingly people are going to be doing that. You do really need to understand how to use them and where they’re going to trap you.
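The question of which instructions actually mattered can be sketched as a leave-one-out experiment: remove one sentence at a time, rerun the prompt, and see whether the result still passes. The Python below is a hedged illustration only. `ablate_prompt`, `fake_model`, and the sample sentences are hypothetical names invented here, and `fake_model` is a stand-in for a real model call.

```python
def ablate_prompt(sentences, run_model, passes):
    """Drop one sentence at a time and collect the ones whose removal breaks the result."""
    needed = []
    for i, sentence in enumerate(sentences):
        trimmed = sentences[:i] + sentences[i + 1:]
        output = run_model(" ".join(trimmed))
        if not passes(output):
            needed.append(sentence)
    return needed


def fake_model(prompt):
    """Stand-in for a real model call (an assumption for this example)."""
    return "JSON" if "Reply in JSON." in prompt else "prose"


prompt_sentences = [
    "You are a helpful assistant.",
    "Reply in JSON.",
    "Take a deep breath.",
]
needed = ablate_prompt(prompt_sentences, fake_model, lambda out: out == "JSON")
print(needed)
```

Here only "Reply in JSON." turns out to matter; the other two sentences are the fluffy words. A real model gives different output from run to run, so in practice each variant would need to be tried several times before drawing conclusions.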

Like what are situations which the AI will probably lie to you? We should talk about that a lot because that’s a fascinating area in itself. So I think a lot of people will pick up prompting skills, just like these days everyone knows how to use a Google search. But 25 years ago, it wasn’t necessarily a skill that everybody had, you’d have people who would help you figure that out and learn that.

But there’s also always going to be room for people for whom this is their area of expertise and the thing that they mainly do, especially at companies that build products on top of AI. I’m seeing job ads now for, like, medical companies and law firms who are like, we need prompt engineers to help us build prompts that will generate contracts or that will do things with MRI scans.

And for that, the amount that’s riding on that being done well is enormous. It totally makes sense to have a very well compensated expert who can help build those things out for you.

SHEFFIELD: Yeah. And there are entire communities that are emerging for this. So there’s a website called PromptHero out there that offers classes. And there are websites out there where people on the consumer side are using these prompt engineers. Maybe the earliest use on the consumer side is with the image generators, to create images that actually are useful and meaningful.

So, right. Like if you go to some of these sites, they have a thing that you can pay for to get a prompt that will turn any photo into a Disney character that is photorealistic. It’s actually a lot harder to get that, as it turns out. If you just sit there and type it in, you’re probably not going to get something that’s going to look very good. But these people have figured it out.

WILLISON: There are also products where, I think there’s one that for $17 will give you 100 professional headshots: you upload 20 photos of yourself and then it will generate headshots of you in different clothing with different backgrounds, all of that kind of stuff.

And those companies, that’s prompt engineering, right? They have got people working at those companies who are figuring out exactly the right prompts to get the perfect sort of corporate headshot. And then they’ve wrapped that in a product and they’re selling it. And I wonder if maybe those products will be obsolete in six months’ time because people will have publicly shared, here are prompts that will get you these results.

But honestly, for like 17 bucks for a hundred photos, it’s a good product. It’s an effective thing that they’re selling people.

SHEFFIELD: Mm-hmm. Yeah. All right, well, so let’s go back to something that you briefly mentioned earlier about the idea of lying.

WILLISON: Yes.

SHEFFIELD: So within the field of AI, that’s called hallucination. And it’s interesting because I feel like a lot of the more critical people about AI, they will often focus on this bug, hallucination. But I think there are so many implications of how it works with respect to human reasoning and faulty patterns of belief. But anyway, tell us, what is this hallucination thing first?

WILLISON: So when people talk about hallucination, effectively they’re talking about AIs making things up. Which seems like a problem, especially if you are Google Bard, right? Google’s entire brand is, we are a search engine that helps you answer questions.

And they’ve just released this product Bard, which I’ve seen making things up a bunch of times already. It hallucinates answers to questions that aren’t based on fact at all. But because language models are really good at writing convincing texts, it’s very, very easy to be deceived by one of these things.

And it seems like this should be an easy fix, right? The AI shouldn’t be making things up. But if you think about it, many of the things that we want an AI to do involve making stuff up. Like, okay, tell me a children’s story about an otter that meets a beaver and goes on a skydiving holiday.

Obviously, that’s going to need you to invent things. But even summarization, if you say, read this article and give me a two-paragraph summary, that’s making things up. That’s it picking details. It’s generating new sentences that are supposed to represent the old ones. And if you’re lucky, they do, but the hallucination is actually a core thing that we want these models to be able to do.

What we don’t want is for them to hallucinate when we don’t want them to. If I ask it for a fictional scenario involving Barack Obama and Donald Trump. Great. Or ask it to write me a rap battle between the two. They’re really good at writing rap battles. It’s hilarious what they’ll come up with.

That’s fine. But if I say, tell me about the time that Barack Obama and Donald Trump met in the White House, and it makes up a story, that’s terrible, right? That’s fact versus fiction. But that’s a subtlety. I understand the difference between wanting fact and wanting fiction. How’s an AI model supposed to know? It doesn’t have those concepts as things that exist within it.

It just knows that statistically the next word that comes after this could be one of these words. And so it’s a huge problem. And traditionally something that’s interesting to observe is that these things are getting better. So GPT-4 which came out last week, is massively less likely to hallucinate than GPT-3 and 3.5.

An experiment I often do with these is I ask them for my own, a bio for myself. Because I’ve been around long enough that the models have picked up bits and pieces. 3.5 invents companies that I worked for that I never worked for. It invents things that I’ve talked about that I’ve never talked about.

GPT-4 got all of the basic details correct. It listed companies I’ve worked for, things I’d written about. That was all correct. All right. And then I told it, give me a list of talks that Simon has given from simonwillison.net/talks, which is a webpage that does not exist, and it spat out 20 talk titles that looked real.

None of them were things I’d given, and it even put dates, it put years on them and the years were the years at which I was interested in that topic.

But it was all junk, completely made up. That’s wild, right? That’s a massive skill problem. When you are working with these AIs, you need to have a pretty good intuition as to when they’re going to make stuff up and when they’re going to tell you stuff that’s accurate. Because honestly, you need to fact check everything they say.

And if you’re doing that, that kind of kills the productivity boost you’re getting from it, if every single detail that comes out is something you have to go and fact check. But what I’ve found happens instead is over time I get to the point where I can look at the output of one of these things and I can be pretty confident that it hasn’t made stuff up for some questions, and for other questions, alarm bells are ringing and I have to go and check into it.

But yeah, that’s kind of the main reason I feel like these tools require expertise, and it feels like they don’t. Like anyone can sign in to ChatGPT or Bard and start asking it questions. But it’s so easy, if you don’t have that sort of depth of experience, to be misled, to have it tell you something that’s blatantly not true.

And to then believe that and spread it out into the world.

SHEFFIELD: That’s right. But I guess kind of in a more philosophical way of thinking about this, there’s this internet slang term called “Galaxy Brain,” where people are said to affect knowledge about something which they know nothing about. And it’s based entirely on them having Googled the topic and offering their commentary on their findings based on what they read for five minutes. I mean, ultimately, is Galaxy Brain really that different than AI hallucination? I don’t think it is.

WILLISON: Not really. No. That’s the thing about AI: a lot of their flaws feel very human. Like spouting off a whole bunch of expert-sounding complete junk about something you don’t understand. It’s a very human thing.

SHEFFIELD: And then they will often tell us all everything they know about virology or everything they know about DNA sequencing. And they know nothing whatsoever about it.

How different is that than ChatGPT telling you, Simon, that you made some talks that you never made?

WILLISON: You know what? I think that’s actually a really great analogy for how these things work. The thing that language models are really good at is language. They are fantastic at outputting convincing sentences in any style you like. It’ll talk like a 17th century pirate if you ask it to, but they can be very, very convincing and they’ve got an awareness of the world based on their training data.

And then things like Bing and Google Bard can actually run internet searches as well, so they can do the equivalent of the Galaxy Brain quickly read the first like few paragraphs of Wikipedia and now you’re an expert and you can spout off like an expert.

But there’s no real depth of expertise there. It’s just that sort of Wikipedia-level knowledge of things, plus a very convincing form of rhetoric on top of it. You mentioned Galaxy Brain. People who are like, I’m an expert in this now. I’ve just Googled it.

Even worse, you’ll see people make arguments on Twitter where they’re like, well look, here’s a screenshot of a conversation I had with ChatGPT, which proves that I’m right.

And that is so embarrassing. Do not ever do that. Trying to win an argument by saying, well look, the AI argued the same as me. Of course it did. You told it what you wanted to hear, and it gave you exactly back the thing that would support whatever it was that you were trying to say.

SHEFFIELD: Mm-hmm. And I do think, though, that this is probably an area where the companies that are putting these forward to the public should do something. Somewhere in the interface, it should remind you of this problem.

WILLISON: Right.

SHEFFIELD: And it doesn’t do that. And that is problematic. It’s something they could easily fix, and they should.

WILLISON: Yeah. That’s the feature I most want from ChatGPT: I want little annotations. Like when I’m talking with it, most of the time I want to say something, and I want it to say something back. And occasionally I’d like it to say something back with a little piece of red text with a little warning symbol that says, “Don’t forget, AI models can’t talk about themselves. So asking me questions about how I work is not going to give you good results.”

Or, my absolute favorite example, and I hinted at this earlier: ChatGPT cannot look up links. If you paste in a URL to an article, it cannot go out and fetch that article. But people fall for it all the time thinking it can, because if you give it a URL to, like, a New Yorker article, and in that URL it says Trump debates Obama in wherever, ChatGPT will write you an article. It will hallucinate just from that URL, just from the keywords in there.

It will produce multiple paragraphs of incredibly convincing text. And when it does this, people are like, okay, I posted a URL, it gave me text. Obviously, it can read your URLs and you might be tricked by that. For several weeks, you’ll be asking it to summarize this and this, and saying, compare this article to this article. And it’s generating you complete bullshit, but you believe it because if you see something that appears to do something, why would you assume that it can’t?

This is a drum I bang a lot because so many people fall for this all the time. And actually, some people won’t believe you if you say, no, it can’t do that.

They’ll be like, no, I’ve been doing this for weeks. It summarizes articles all the time. I know that it can do this thing. So the way to prove this to yourself is, if you think it can do that, edit the URL that you give it. Add like an extra few characters or change one of the names of people in the URL. Resubmit, watch it do exactly the same thing, and then click the link and confirm to yourself that it’s a 404 page, that that doesn’t actually exist.

Because until you’ve seen that, until you’ve actually done that experiment, it’s so easy to believe that these things can read content from the web when they can’t. And yeah. So I want ChatGPT anytime you paste a URL to show you a little note that says, by the way, I can’t fetch your URLs, here’s a link to my FAQ about it.
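The broken-URL experiment described above can be sketched in Python: build a deliberately wrong version of a URL, then check for yourself what the web server returns for it. This is a minimal sketch, not a tool from the conversation. The helper names `break_url` and `http_status`, the `-zzz` suffix, and the example address are assumptions, and some servers refuse `HEAD` requests, in which case a normal `GET` would be needed.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def break_url(url, suffix="-zzz"):
    """Return a copy of the URL with extra characters appended to its path."""
    parts = urlsplit(url)
    broken_path = parts.path.rstrip("/") + suffix
    return urlunsplit(parts._replace(path=broken_path))


def http_status(url, timeout=10):
    """Return the HTTP status code for a URL, for example 200 or 404."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


# Hypothetical article address, used only to show the URL manipulation.
original = "https://example.com/2023/trump-debates-obama/"
broken = break_url(original)
print(broken)
```

Calling `http_status(broken)` on a real site should normally report 404. If the chatbot still produces a confident summary for the broken address, that summary was invented from the words in the URL.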

SHEFFIELD: And the other thing about these hallucinations that’s also kind of illustrative is that people have talked about AI hallucination, or the ability to massively generate fake news and things like that, right?

And it’s certainly true, but the reality is that we’re already in an environment where you’ve got these far-right conspiracy websites that individually publish literally scores of articles every day that are not factually based, that are extremely biased, that are full of conspiracies.

So let’s say the general disinformation media apparatus, currently without any AI help, is generating 5,000 articles a day which are widely read. If another bunch of websites, or even these same websites, cumulatively take the output to 5 million articles a day, how much of an impact is that going to have? I think it’s going to be less than people think. Just because no one can read all of these things, number one, right?

WILLISON: When you think about whether automated text is going to cause problems, one way to consider it is, okay, well we have inexpensive content farms right now.

Like you can find somebody on a website who will produce you any text that you like for like a cent a word or whatever. So this is a capability we have already. As always with AI, the difference is the scale. Like even 1 cent per word to some very cheap freelancing website pales in comparison to ChatGPT churning out 10,000 words in like 15 seconds.

So the question then becomes, okay, if you can ramp up the scale at which this stuff is being produced, what kind of damage is that going to cause? I agree with you. If a website’s publishing 50 fake articles a day and they up it to 2,000 fake articles a day, that doesn’t feel to me like it’s going to do much. If anything, that feels like it will undermine whatever credibility they have.

But the thing that’s scary is personalized messages and conversations. Like if you flood the Reddit for politics with bots or with different identities who are responding to people at an enormous rate in a realistic way that does break things.

Right now, if you’ve got a discussion forum where maybe 10% of the people on it are fake and then later, maybe 90% of the people on it are fake, and you can’t tell the difference, that’s genuinely harmful. That’s the thing that alarms me.

Likewise, I really worry about automated romance scams, right?

Romance scamming, where somebody gets into a text conversation with a beautiful stranger, and they fall in love and then they send them money to help them buy a plane flight. Billions of dollars a year are being lost to these scams already. And most of these scams are being run by real human beings, essentially in sweatshop conditions, who are messaging lots and lots and lots of people. So it is much cheaper to do that with AI.

And the AI is probably better at it. Like AI is very good at coming up with messages and all of that kind of thing. And that’s terrifying, right? If you can industrialize romance scams and the sort of one-on-one interaction at a hundred times the level, that’s going to cause massive amounts of harm to society.

The open question for me is how quickly do we develop antibodies against this. Like, will we find that in two years’ time, even the most gullible members of society are like, no, I get this. I’ve seen all of these AI scams. This isn’t something I fall for anymore. Or is that not going to happen?

And I don’t know. I’d love to see research. I’d like to see proper academic research into the psychology of human beings who are dealing with these systems to help us answer some of these questions.

SHEFFIELD: Yes. And I guess the other thing that is kind of relevant to that from a political standpoint is that, I mean, ultimately what you’re talking about here is having people develop sounder epistemologies. Understanding what knowledge is and how you get it and what is a credible source. I mean, ultimately those are the antibodies that you are describing.

WILLISON: Oh, wow. You’re terrifying me here, because we don’t have a great track record as a species of really developing that throughout all of society.

SHEFFIELD: Yeah. Well, but here’s where I think it’s maybe a little bit different. When you look at conventional misinformation or poor journalism, biased writing, overwhelmingly, and I can say this coming from the right-wing media world, right-wing media is much lower quality, much more biased, much more full of hidden conflicts of interest.

Whether they be commercial, so they’re trying to sell you something and not saying ‘and I own this,’ or telling you about a candidate, that they’re so great, without saying ‘oh, and by the way, they paid me $30,000.’ They don’t tell you any of those things.

With your average Republican internet user, when someone like me or a disinformation expert or a journalist says, look, that source is not a credible source, they don’t believe us when we say that.

But now with the emergence of AI text generation tools, like Jordan Peterson, people like him are actually now finally beginning to contemplate the idea of bias in output that you see on the internet, and finally beginning to doubt that things could be true.

And so it’s a paradox because I think it’s possible that the emergence of generative AI is going to lead a lot of people to have better epistemology.

WILLISON: I mean, that would be wonderful if that, if that happened, that would be, but it will,

SHEFFIELD: But a lot of people will unfortunately, get scammed along the way. I think that’s pretty clear. Did you want to respond to that or we can move on.

WILLISON: I don’t think I’ve got much of a response, I’m afraid. No, nothing comes to mind.

SHEFFIELD: Okay. Okay. Alright. So, one of the other aspects here that is interesting to think about from a technology standpoint is of course this debate about how close are we to an artificial general intelligence?

And I forget the guy’s name, the Google engineer who had this ludicrous idea that their Bard text generator was sentient.

WILLISON: And imprisoned, yes.

SHEFFIELD: Yeah. And it is the debate that everyone wants to keep having. But ultimately, I don’t think it matters.

Within the field of computing, Alan Turing, the English computer scientist, came up with this idea which is now called the Turing Test, which is that you could say that a computer program was a good one if you could have it be involved in a conversation with someone and they would not be able to tell.

Why don’t you give a little background on the Turing Test and how valid you think it is for these purposes?

WILLISON: So the Turing Test is what, from the 1940s, 1950s. And it was this idea that it was actually originally called the Imitation Game. And yeah, it was the idea that you have participants conversing through, I guess typewritten messages back then.

And the question is, could a human investigator tell the difference between a computer pretending to be a human and the humans in that conversation? There was actually an element of guessing the gender that was involved as well, which is a little bit weird in a very, sort of, 1940s way.

But yeah, that’s evolved over time to just this idea of can a computer trick you into thinking that it’s a human being? So you can’t tell the difference between that and someone else. I think it’s basically been made obsolete already. Like, a lot of these systems have been able to pass the Turing Test, depending on how well you apply it, for a few years now.

And it’s not actually that interesting because it really is just imitation. Like if you’ve got a system which can pretend to be human, there are great ethical concerns about that. But actually, depending on how credulous the person they’re talking to is, you can get away with an awful lot, with some relatively simple tricks.

But yeah, so then the question becomes what’s next? What’s the new version of the Turing Test, which really can help identify if these things have, I mean, does having a consciousness if you’re made out of silicon even make sense? I’m not sure. And yeah, so I’ve been generally unexcited by the AGI side of things because it all still feels very science fiction to me.

What I care about is we’ve got these things that exist right now. What can they do? What can we use them for? How do we use these to solve interesting problems? But increasingly, I’m talking to very serious people whose opinions I respect, and they’re getting kind of nervous about this.

They’re like, GPT-4, the one that came out last week, is so good at problem-solving tasks and things that GPT-3 wasn’t capable of. Are there little sparks of things where this is getting towards this idea of general intelligence, where a general intelligence is a computer program that can effectively solve any problem that a human can solve?

And I’ve thought until recently, I don’t think a language model can do that on its own. I think you’d have to solve lots of problems that we haven’t solved yet at all, about having computers that can set goals and do critical analysis and have sort of world models of how things work, and tell the difference between truth and fiction.

And I still feel like that, that still feels right to me, that if we build an AGI, there’ll be a language model in there, but it’ll only be like 10% of whatever this larger thing is. But I have this little, tiny flicker of doubt now where maybe, maybe a powerful enough language model is enough to start solving these more general intelligence problems.

And the nightmare scenario has always been, okay, if it can do that, and it can learn, and then maybe you have two of them teaching each other, do you get this sort of singularity point of acceleration where we all get left behind? And again, I thought that was science fiction. That felt to me like a not particularly interesting thing to think about.

And I still mostly think of it as science fiction. I just have this little flicker of doubt now, partly from the pace at which things have been developing over the last sort of three to six months.

SHEFFIELD: Yeah. And there’s another related debate in this, that there’s a number of critics out there who seem to be fond of saying how terrible they think that these LLMs are, Noam Chomsky being one of them, basically saying, well, these things are constructed the wrong way, and so therefore they’re not any good.

And to me, it just seems like a lot of sour grapes and not that different from somebody who’s a creationist, saying, oh, well there are some problems with evolutionary theory in these five areas. That’s true. Evolution may not explain those five areas. That doesn’t mean that creationism is true, or that you can even come up with an alternative.

WILLISON: Definitely. My take on this right now is, you can assume that LLMs are useless because they make errors and they lie, and there are many, many, many completely true flaws in these systems, and yet they are clearly useful. Because people like myself are using them on a daily basis to solve problems, and improve our productivity and so forth.

Yeah, I don’t think you can argue against their utility anymore, that just doesn’t work for me. And if somebody says, no, they’re completely useless, I assume that they’ve just not spent the time to learn how to use them. They’ve done that thing where you dive in, play with it for five minutes, it lies to you and you go, wow, that’s a waste of time.

But you’re selling yourself short if you do that. If you don’t then think to yourself, okay, so don’t use it for looking up facts. What can I use it for? What are the things it’s useful for? So, yeah. I very much disagree with them on that front.

The other thing I found interesting is I’ve started seeing conversations on Twitter from people who do machine learning research, who’ve spent the last 10 years working on natural language processing, who are kind of utterly depressed right now. They’re like, it feels like I spent 10 years, like I earned a PhD, trying to solve this little corner of this giant problem of how we get computers to do language, and GPT-4 comes along and it just does the thing that I’ve been trying to do for 10 years as a tiny fraction of its overall capabilities.

I’ve talked to machine learning researchers likewise, who are very despondent. They’re like, it feels like I’ve been working on these really hard problems for 10 years, and then this, quite frankly, dumb approach, right? Just throw one half trillion words into a bunch of computers for three months and train a model and it’s beating 90% of the stuff that I’ve been able to do. What the hell?

So yeah. So I think it’s very important not to fall into the trap of assuming because these things have holes that you can drive a truck through that they’re not useful. They are useful.

The people who know the most about this, most of them really are paying very close attention to this. I think not all of the hype is justified, because some of the hype is just ludicrous, but there’s a sizable chunk of the hype that is justified.

So, yeah. I feel like you’re making a mistake if you assume that this stuff is a flash in the pan, that’s just going to go away again.

SHEFFIELD: Yeah. And I think the analogy here with some of these AI researchers that demand things be a certain way or otherwise they’re wrong is this: the way that problem solving from LLMs has been developing reminds me of convergent evolution, which is this idea in biology that multiple species that are not related to each other can solve the same problems, but do it in different ways.

WILLISON: Right. So like now, everything’s going to be a crab eventually, right? Everything, it turns out, evolves in the direction of being a crab for some reason.

SHEFFIELD: But I mean in this sense of like flight for instance. So we’ve had pterosaurs, the dinosaurs, had figured out how to fly. They were reptiles, and we have birds. I know, obviously, they’re related.

Insects, various insects, I mean, there’s so many different insects that are only very slightly related to each other that all have figured out flight in different ways. And then of course, they’re not quite flying, but flying fish are able to propel themselves through the air. And this is true with regard to eyesight, how species develop organs to sense light and to perceive things.

There are many different ways that these problems can be solved. And to say that LLMs are just trash because, well, it’s not something that I personally have been working on, it’s almost like a fish saying that the eyesight that some single celled organism has doesn’t work. In fact, it does work. And it can sense light. And whether you think that’s the proper way of doing it or not, it really doesn’t matter because it’s doing it.

WILLISON: Yeah. Yeah, I mean I think AI critics are basically right about everything. They will point out flaws and they’re correct about those flaws, and the risks, and so forth. The only thing they’re wrong about is the idea that this stuff is useless. Because it’s definitely not. It’s useful for all sorts of things right now.

And we keep on finding new things that it can do. If there was no more development on AI at all, if we stopped everything and just stuck with the ChatGPT that we have today, we would still be finding new things it could do for the next few years. The state of the art would continue to increase even if models all stayed the same, because there’s so much that they’re capable of that we haven’t understood yet.

SHEFFIELD: I think that’s true. And so to that end though, are there any websites out there that you would recommend to people if they were interested in learning more of how to harness AI for their own personal ends?

WILLISON: Websites pop up every day that claim to help you with AI, to be honest, at a rate that’s too fast to even evaluate them and figure out which ones are good and which ones are snake oil. The thing that matters is actually interacting with these systems. You should be playing with Google Bard, and ChatGPT, and Microsoft Bing, and trying things out with a very skeptical approach.

Always assume that anything that it does, it could be making things up. It could be tricking you into thinking that it’s capable of something that it’s not. But that’s where you have to learn to experiment. You have to try different things, give it a URL, and then give it a broken URL and see how it differs between them.

Because that really is the most reliable way to get stuff done here. To sort of build that crucial mental model of what these things can do, and what they can’t. And it’s full of pitfalls. It’s so easy to fall into traps. So you do need to read around this stuff and find communities of people who are experimenting with it alongside you, and so on.

Unfortunately, I don’t think there’s an easy answer to the question yet of how to learn to use these effectively, partly because ChatGPT isn’t even four months old yet. Its four-month birthday’s on the 30th of March. All of this stuff is so new, we’re all figuring it out together. The key thing is, because it’s all so new, you need to hang out with other people.

You need to get involved with communities who are figuring this out. Share what you learn, see what other people learn, and basically try and help society as a whole come to terms with what these things even are and what we can do with them.

SHEFFIELD: Yeah. Well, and one interesting approach that the Midjourney image generator has taken, which annoyed me at first, is that they force you to use Discord in order to generate images.

I was like, I don’t want to have to use Discord. I don’t want to download that app. Don’t want to use the website. I’ve got two-factor authentication on my account. This is a real hassle. Ugh. I’m not going to do it.

But eventually, I knuckled under and did it anyway. And then I realized why they did it this way. Because the way that it works is you have to type it into a chat room with other humans and then you see what they’re coming up with as they are using it.

WILLISON: Right.

SHEFFIELD: And you can get ideas from them just simply looking at what they do, even if you never type anything.

WILLISON: That Midjourney thing is such an important lesson because there are a bunch of image generators out there, OpenAI have one called DALL-E. There’s Midjourney. There’s Stable Diffusion. Midjourney is head and shoulders above the rest in terms of what it can do. And I think that’s because of Discord. I think that’s because they put everyone in these public chat rooms and the rate at which people learned how to use Midjourney was phenomenal because everyone’s seeing what everyone else is trying out.

And so, I think I said earlier that the key thing is we don’t know what they can do. We need to learn what their capabilities are. The best way to learn their capabilities is to put half a million people in a Discord room together and let them learn from each other. And that works. Midjourney is incredibly, incredibly successful as a business and as a community, and it’s because people had to learn how to use it together.

So that’s, I think, one of my sort of big personal ethical concerns is you should share your prompts. There are websites where you can sell prompts to people. No, no, no, no. Don’t do that. Share your prompts with other people. You get them to share the prompts back. We are all in this together. And sharing the prompts that work for you and the prompts that don’t is the fastest way that you can learn, and the fastest way that you can help other people learn as well.

SHEFFIELD: Yeah. Yeah. I think that’s good. And maybe to summarize it, it would be the best way for a society to figure out how AI can help us is for individuals to figure out how it can help them.

WILLISON: Right.

SHEFFIELD: And share what they’ve learned.

WILLISON: Exactly. Yeah.

SHEFFIELD: Yeah. All right. So last question here of the conversation is the idea of open source.

So open-source software, for those not familiar, is the idea of publishing your code to the public, and your data, such that it could be built on by other people not affiliated with you. And the premise behind it is that knowledge can be compounded when you do it that way and don’t keep it to yourself.

And there’s other arguments for it, we don’t need to get into here. But for the purposes of artificial intelligence, there is debate now as to whether or not data training sets and code for AI programs should be published to the public. Because, there are people out there, so for instance, 4chan has been saying they’re going to develop their own basically seemingly Nazi-fied AI. Because they’re angry that ChatGPT won’t write them Hitler novels or things like that. So let’s talk about that a little bit.

What do you think about the state of open source and AI?

WILLISON: So, this is a fascinating area because people who work on AI tend to have very altruistic purposes initially. They’re like, we are going to build this new thing that will help solve all of society’s problems. And for the last sort of 10 years, most AI research has been very public, in as much as they publish papers, they publish source code. They tended not to publish the models themselves because of the fears of what people could do with them. So OpenAI, initially it was only available to researchers.

The start of ChatGPT just four months ago was the point at which they really started encouraging members of the public to interact with these things, when they’d already had a lot of time to tune it and try and de-Nazify it and so forth. But the flip side of this is if this technology is so transformational, the idea that just a few companies like Microsoft and Google and OpenAI control all of it is terrifying.

Should I have to use cloud services if I want to ask personal questions about my health? I’m not comfortable doing that. There are companies that are banning ChatGPT because they don’t want people copying and pasting the company’s internal secrets into a text box on a website somewhere.

So there’s clearly a very strong ethical argument that people should be able to run this stuff themselves. The flip side is that until very recently, you needed about a $20,000 supercomputer to even run one of these models, because they’re very resource intensive. You need like these A100 Nvidia cards that cost $8,000 each. You need a whole rack of those to run something like GPT-3. So I thought, even if they would release the models, what am I going to do with it? I can’t afford a computer that can run that. And then, well, three weeks ago, I think Facebook research released a new paper with an accompanying model called LLaMA, which was a model that was small enough that you could run it on consumer hardware, but it still had most of the capabilities of ChatGPT.

I thought that was impossible. I thought to get ChatGPT, you need one of these $20,000 supercomputers. I was entirely wrong. And then Facebook made the model available to researchers. Somebody leaked it on BitTorrent and now everyone can get hold of this model, which is like a 250-gigabyte file.

So it’s not a small download. But then the open-source community kicked in and within a couple of weeks, people have shrunk it to the point where I can run it on my laptop. Somebody got it running on a Raspberry Pi, this supposedly ChatGPT capable model. Very slowly, but on a computer that costs like $40.
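The conversation does not say how the model was shrunk. One common technique, assumed here purely for illustration, is quantization: storing each weight in fewer bits. A minimal sketch of the arithmetic, using a hypothetical 7-billion-parameter model and counting only the weights (real runtimes need extra memory on top of this):

```python
def model_size_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate storage for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical parameter count, chosen for illustration only.
N_PARAMS = 7e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {model_size_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the footprint by a factor of four, which is the kind of reduction that moves a model from server hardware toward a laptop.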

And that’s one of the big arguments for open source is that once you’ve got every nerd in the world playing with stuff, some of these problems like running on a Raspberry Pi just start getting solved really, really quickly. Stanford then did a project where they took Facebook’s LLaMA and they turned it into something called Alpaca, which was tuned for instruction.

So it had that human reinforcement training, and now it really does behave like ChatGPT, and it runs on a laptop. A friend of mine ran it on his laptop on a flight and used it to help him with some questions he had about physics the other day, just like you would with ChatGPT.

I’m stunned. I was absolutely blown away that this technology is now capable of running on a laptop. I thought it would take another few years, at least before laptops were powerful enough to run anything like this. A Pixel 5 phone, which is like a two-year-old Android phone, can now run one of these smaller models.

And so really this means the open-source thing is happening and you can’t put it back in the bottle. Once this file’s out on BitTorrent, it’s on like a million computers now. It’s not going away. So we have to face the fact that yeah, 4chan, if they want to train their Nazi AI, the raw materials for them to do that are now available to them.

That’s a thing that is going to happen. But the flip side is that we can now start saying, okay, what does the world look like? What is it like to live in a world where I can run ChatGPT on my own devices, independent from the internet? I can teach it new things. I can use it as a trusted personal assistant.

I’m not leaking my data out to these big companies. That’s fascinating. So yeah, one of the things I’m tracking closely at the moment is the implications of this. What happens when suddenly these models are in the hands of the public?

SHEFFIELD: And there are some more negative implications as well. The Stable Diffusion image generator has now been repeatedly used to generate pornographic images of people without their consent.

WILLISON: Right.

SHEFFIELD: So there are implications for that, and I think you’re right that these things are not going to be uninvented, the source is not going to be deleted.

But it is still nonetheless something to think about, especially with regard to future improvements to these engines or to completely different ones. ChatGPT 9, should that be available to the public? Who knows? And ultimately, this is an area where these discussions need to be had in public and politicians have to be involved in this stuff, rather than just simply allowing a handful of companies or universities to decide for us what guardrails should be on these things and whether they should be open sourced or not. These are not discussions that rightfully belong to the private sector, I think.

WILLISON: Exactly. No, I completely agree. And just in the past week, I’ve seen two new demos of text to video things. So like Stable Diffusion, except it produces a video. And they’re currently a bit shonky looking, but give it a year, and you will be able to type in a scene where some politician is smoking cocaine, wherever. And it will produce a realistic looking video. And again, we need antibodies in society.

There’s a TikTok account which publishes videos of Barack Obama and Donald Trump playing Minecraft together using deep fake audio. And it’s amazing. I mean, it’s really realistic. The voices sound exactly right except they’re talking about Minecraft. And I love that because anyone who’s seen that video now understands that audio can be faked. And that’s the sort of first step, right? We need society to at least understand that images and videos and audio can be deep faked now.

I mean, the flip side is that of course when a video comes out of a politician doing something bad, the politician can now say, oh, it’s a fake video. And I mean, maybe it is, maybe it isn’t. So there are flip sides to that as well. But yeah, the idea that society needs to understand what this stuff is capable of so that it doesn’t get hoodwinked, I think is really important.

SHEFFIELD: Yeah. All right. Well, this has been a great conversation, Simon. I appreciate you being here. Let me put up on the screen your Twitter handle. I encourage everybody to follow you. You are @simonw, that’s S-I-M-O-N-W, for those who are listening. It’s been a great conversation.

WILLISON: Yeah, this has been really fun. Thanks for having me.

SHEFFIELD: All right, so that is the program for today. I appreciate everybody for being here and listening or watching or reading if you are a transcript person. Thanks for that. We’ve got a lot more episodes and they’re coming out every Saturday now. And thanks to the support we’re getting, we’re able to get production into regular releases. So I really do appreciate everybody who is a subscriber. Thanks very much. So I’ll see you next time.
