[TRENDS] Foundation Models

CHAPTERS

Foundation Model - Definition (01:09)
Restaurant Review Example (00:38)
Next Steps (00:55)

DJ: You are listening to The Art of Automation podcast with your host, Jerry Cuomo.

Jerry Cuomo: Hey! Welcome to The Art of Automation, a podcast that explores the application of automation in the enterprise. Wait one second, DJ. Don't go too far. I'm going to need you later in the episode, okay? Hey, folks. Today, we have a special episode. One in a small series that will look at technology trends related to AI and automation that are making significant breakthroughs. And as such, it's a topic that I highly recommend you pay attention to in the months to come. I hope my guest and I will first get you excited about the topic, and then we'll leave you with some links in the description section of the podcast so you can try out the technology and learn more, and maybe even start a project to try it on for size at your company. Our first special topic is foundation models. And without further ado, I'm going to invite my guest, Blaine Dolph, to join me and share some insights as to why we're so excited about this breakthrough in artificial intelligence. And by the way, folks, Blaine is a close colleague of mine at IBM Consulting and he's an IBM Fellow, and has one of the most tuned eyes for spotting emerging technologies with a knack for helping users figure out how and where best to apply them. And with that, I'd like to welcome Blaine to this special episode of The Art of Automation. Welcome, Blaine.

Blaine Dolph: Hey, Jerry. Thanks a lot for having me. I really enjoy some of your prior podcasts.

Jerry Cuomo: Why thank you, Blaine. I have a sneaking suspicion, given this special episode, it just might be a keeper as well. So Blaine, can you start by providing our listeners with a definition of foundation models? And also, can you help highlight the breakthroughs that make them so awesome?

Blaine Dolph: Okay, sure. So foundation models are an AI technology. They've been around, I'd say, about five years. But really, in the past two years, they've gained a lot of attention. And I think that's because they've become a lot more powerful lately. They've become consumable as cloud-based APIs, and that's led developers to launch really creative apps, which has caught the general public's attention in the media. The typical definition is that foundation models are models that are trained on very large sets of data. They can be fine-tuned to perform a wide range of tasks. And initially, the models were primarily text-based language models, like GPT and BERT, around natural language.

Jerry Cuomo: Yes.

Blaine Dolph: But really, any type of data can be part of a foundation model. For example, images, videos, or speech. And by large, we mean really large, right? So GPT-3, for example, was pre-trained with petabytes of data by collecting data that they crawled throughout the internet, all sorts of online books that were available, and also Wikipedia. So that's the definition.

Jerry Cuomo: And then, just some reference. A typical machine learning model versus a foundation model. What's the range of parameter differences?

Blaine Dolph: Yeah, so a typical model is going to be probably between 10 million and maybe 100 million parameters. And with a foundation model, you're starting to talk about 10 billion. GPT-3 was 175 billion.

Jerry Cuomo: Wow.

Blaine Dolph: And the newer ones that are coming out are going to be even larger than that.

Jerry Cuomo: That's impressive.
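For scale, the parameter counts Blaine quotes work out like this. A quick back-of-the-envelope sketch, nothing more:

```python
# Rough scale comparison from the conversation: a "typical" model at the
# upper end of Blaine's range (~100 million parameters) vs. GPT-3's 175 billion.
typical_params = 100_000_000
gpt3_params = 175_000_000_000

ratio = gpt3_params / typical_params
print(f"GPT-3 has roughly {ratio:,.0f}x the parameters of a typical model")
# prints: GPT-3 has roughly 1,750x the parameters of a typical model
```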

Blaine Dolph: You asked about the breakthrough and what makes them so awesome, so I thought about prior to foundation models, right? We'd go out to a customer to build an AI-powered solution of some sort, and we'd spend significant time collecting the data sets, even if the data sets were available in the first place. Then we'd spend hundreds or thousands of human hours labeling and training the data. And then after that, what you'd get is a model, but it would perform, pretty much, narrow tasks such as, say, next best action, right? That's a good example.

Jerry Cuomo: Yes.

Blaine Dolph: So the paradigm shift for foundation models... There are three reasons, three sort of shifts that I see.

Jerry Cuomo: Okay.

Blaine Dolph: One is that the foundation models... They're pre-trained with huge data sets like we talked about, but they use self-supervised learning approaches, which removes that bulk of the human labeling of the data. And therefore, it makes them feasible to create in the first place. Because there's such a large amount of data, you couldn't have a human go out and tag everything, right?

Jerry Cuomo: Yeah. So self-managing, that's important.

Blaine Dolph: Yep. And then number two is reuse, in terms of the value. So once you have a foundation model, you really can get to your end-state model that's bringing the value by fine-tuning that pre-trained model with a relatively small subset of new data. One of our colleagues explained it to me like this, which resonated. He said, "Think of it as a child." You learned English.

Jerry Cuomo: Right.

Blaine Dolph: Now, 20 years later, you go to law school. Law school doesn't have any classes in English. You fine tune your brain with legal cases and laws on top of that English foundation you already have. So to me, that really brings it to life of what the value is in terms of the reuse.

Jerry Cuomo: And Blaine, what was that third reason?

Blaine Dolph: So the model you end up with can be used for a general set of tasks versus a very narrow task. And in fact, once you get to a size of about 10 billion parameters, the model becomes promptable. And what I mean by that is you can actually ask it to do things using natural language.

Jerry Cuomo: Very nice. Hey, Blaine, can you walk us through some examples here? Because what you're saying sounds good in theory, so share some practice with us.

Blaine Dolph: Sure. Let me start with the language model, GPT-3. So there's a company called OpenAI that's created and hosts this model. And you can access it using either an API or with a simple web interface, and that interface is called a playground.

Jerry Cuomo: Hey, Blaine. I got an idea. Let me ask my digital twin, DJ, to help us out here.

Blaine Dolph: Okay.

Jerry Cuomo: Given this is an all audio podcast, I think DJ can play back the audio responses from the playground and make it perhaps more interactive, maybe easier to picture. And if that fails, he'll at least make it fun, if not a bit quirky.

Blaine Dolph: Okay. I feel honored DJ is available on this.

Jerry Cuomo: DJ happens to be here, yes. He happens to be-

Blaine Dolph: I know he's hard to find.

Jerry Cuomo: [inaudible] Yeah.

Blaine Dolph: He's hard to lock down, yeah. I know he's very busy. Okay. Yeah, so the playground, as I mentioned, it lets you explore the different sort of fine tunings that I mentioned that foundation models do really well. And with this one model, then you can give it prompts to do things like question and answers, text completion, or story summarization. Let's jump in and try two of those examples, so we'll try text completion first.

Jerry Cuomo: Yeah.

Blaine Dolph: So with text completion, you'd go out to the OpenAI site. You'd go into the example section and pick text completion, and then you basically get a text box. And I'm going to type in this prompt: Brainstorm three ideas for blog topics for the metaverse.

Jerry Cuomo: Okay.

Blaine Dolph: So DJ, can you give the response of what OpenAI gave?

DJ: Sure thing. And I really do appreciate you and human Jerry for asking me to help out. Okay, here goes. OpenAI gave three responses. Number one: Exploring the possibility of social interaction in the Metaverse. Two: Examine how brands are leveraging virtual reality technology to enhance customer experience. Three: Understand potential impact of the metaverse on business and society.

Blaine Dolph: Okay. And what I find really interesting about this is this wasn't a search to a webpage where somebody had put some ideas for blog topics. These were dynamically generated by the model. If you notice number two, it didn't even have the word metaverse in it.

Jerry Cuomo: Right.

Blaine Dolph: So GPT-3 understood that metaverse had something to do with virtual reality technology and it inserted it in there.

Jerry Cuomo: Yeah, that is impressive. In fact, Blaine, there are several other cool, if not thought-inspiring examples of text completion.

Blaine Dolph: Yeah.

Jerry Cuomo: And the cool part is that they generate complete and thoughtful responses, as you said. One of the ones that I love to demo is the restaurant review generator. And if you don't mind, let me sneak a demo in of this-

Blaine Dolph: Okay! Yeah, that sounds good.

Jerry Cuomo: And give it a try here. So in OpenAI playground, you just type a few words to seed the model and generate the review. So let's go with the name of the restaurant is... Let's say Blaine's Place, with the description of... Let's just pick three things: hot food, great service, and fast. DJ, can you play back how the model would respond to that?

DJ: Yes. Sure. The generated restaurant review reads, "I recently visited Blaine's Place and I had a wonderful experience. The food was hot and delicious. The service was great and the speed of service was fast. I would definitely recommend Blaine's Place to anyone looking for a great meal."
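For readers who want to try this at home: the demo above boils down to typing a short seed prompt into the playground's text box. A hypothetical helper for assembling such a prompt might look like this (the function name and prompt wording are our own illustration, not anything the playground defines):

```python
def review_prompt(name, notes):
    """Assemble a seed prompt like the one typed into the playground demo."""
    return (
        "Write a short restaurant review based on these notes:\n"
        f"Name: {name}\n"
        f"Notes: {', '.join(notes)}.\n"
        "Review:"
    )

# The seed used in the episode: a name plus three descriptive phrases.
prompt = review_prompt("Blaine's Place", ["hot food", "great service", "fast"])
print(prompt)
```

Pasting text like this into the playground (or sending it through the API) is all it takes; the model completes the "Review:" line on its own.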

Jerry Cuomo: All right. Blaine, maybe you can share just one more example.

Blaine Dolph: Yeah, let's take a summarization example because I find this one of the most interesting. What I've done here is... So Jerry, I've seen that you've been a pretty prolific podcaster as well as a blog writer-

Jerry Cuomo: Oh, boy.

Blaine Dolph: And your blogs tend to be fairly technical. So I went out and looked for one. I found one called IBM Blockchain Platform Meets Red Hat OpenShift.

Jerry Cuomo: Okay.

Blaine Dolph: And it had about 352 words. So I went into the OpenAI playground and I said, "Can you summarize this text as if I was a second grader?" And then I pasted in the 352 words from your blog-

Jerry Cuomo: Okay.

Blaine Dolph: And it came back with 48 words, so it was pretty concise. So DJ, can you get the response?

DJ: Okay. OpenAI playground responded with, "IBM and Red Hat have joined forces to make it easier to build and deploy blockchain networks. The new IBM blockchain platform is optimized to run on Red Hat OpenShift, a state-of-the-art enterprise platform, allowing users to deploy, operate, and grow blockchain networks in multiple cloud or hybrid cloud architectures."

Blaine Dolph: So to me, it really did... I read your original. That really sort of netted it out, right?

Jerry Cuomo: Yeah, it did! In fact, Blaine, funny story. I showed an example like this to a CIO last week and I said something like, "While these exact examples don't necessarily apply to a day in the life of a CIO," he said, "Stop right there, Jerry. Most of what I do day-to-day is summarize and distill things. When it's managing up, I have to give a summary of why there was an outage last weekend. Or when I manage down, summarizing our new corporate strategy." So playing in this playground is something I highly recommend because it triggers thoughts for sure. And jeez, Blaine, I was thinking how do we make our day-to-day more productive? I would love to use this to summarize our WebEx meetings.

Blaine Dolph: Yeah.

Jerry Cuomo: Especially when I can't make it to a meeting and someone says, "Don't worry, Jerry. I'll record it or I'll send you a transcript." Well, quite honestly, I almost never have time to go back. But if you can summarize a 30-minute meeting to a couple paragraphs? Well, that would work!

Blaine Dolph: Yeah, absolutely. And then if you see something in the summary that sort of interested you, then you could go into the more detail, right?

Jerry Cuomo: Yeah. So Blaine, maybe carrying the point that you made earlier about stacking these models, like your example about learning English and then going to law school... As an IBM fellow and thought leader for IBM Consulting, how are you thinking about foundation models from a consulting perspective?

Blaine Dolph: Yeah, sure. I like to play with emerging technology as much as the next person, right? But what I really enjoy is applying the technology to help IBM Consulting's clients. And I really do believe that foundation models can dramatically accelerate AI adoption in the enterprise. This, in turn, allows clients to create new digital experiences for their customers, and also makes their internal work efforts more efficient. With that, I'd say I'm focused on three areas. So one area that you and I are both working on with our colleagues is industry foundation models. So we're talking with CTOs at many of our clients to understand use cases for industries like financial services, healthcare, CPG, and also business processes like supply chain and contact center. And in each of these domains, there really is plenty of unlabeled data available in the enterprises, which can be used to train custom foundation models, which then potentially opens up solving business problems that were maybe too time consuming to solve. So that's number one. And number two is because of your podcast here, I have to mention automation, right? And I really do believe...

Jerry Cuomo: I was waiting to get to that, but go ahead.

Blaine Dolph: Yeah, of course, right? I'd get on the cutting room floor here if I didn't mention automation. But I do believe automation is a really great use case... and I do think foundation models pair really nicely with automation... because I think of it as you have a business or an IT process flow. Those are made up of lots of steps.

Jerry Cuomo: Right.

Blaine Dolph: And with automation, you go and look at each of those steps and say, "Hey, what could be automated?" So to me, because foundation models are so general purpose, they're going to enable more of those steps to be automated with less coding, because they can reach out and ask a model to do something.

Jerry Cuomo: I see. Makes sense.

Blaine Dolph: And I sort of tie it to ... In automation, you use the word skills. So I think of one of these foundation models as having a broad range of skills.

Jerry Cuomo: Okay, automation makes a lot of sense. Well, you know I was going to like that one. And your third?

Blaine Dolph: Yeah, so I think that foundation models will really impact the software development experience.

Jerry Cuomo: I agree.

Blaine Dolph: An example that's available now is GitHub Copilot.

Jerry Cuomo: Yes.

Blaine Dolph: It's like an AI-powered pair programmer. That's not available yet, I don't think, on enterprise accounts. But at least, as an individual, you can go out and try that. And this has a lot of potential especially, I think, for junior developers to help them when they get stuck. And I think if you had a foundation model that could write really high quality unit tests, I think that will greatly help. There's also an IBM model called CodeNet that can modernize legacy code into more modern and container-friendly languages. There's just a lot of productivity gain that's possible here.

Jerry Cuomo: Oh, very cool. Boy, I could only imagine if I had access to this technology when I was a junior developer. Jeez.

Blaine Dolph: Right! Yeah, exactly.

Jerry Cuomo: So Blaine, our listeners are clearly intrigued now. What advice do you have for them to do next? What's the next steps? Where do you go to learn, experience more? If you're a business person, if you're an IT developer type, where do you go next?

Blaine Dolph: Yeah, so I want to set the understanding that we're still in the very early stages of using foundation models for industry and enterprise clients. A lot of the cases I've cited are more consumer-based and sort of getting people excited and exploring. I would for sure go out to OpenAI. I'd go out to a site called Hugging Face and explore the models there via their playgrounds. If I was a business user, I'd start to think through the use cases for my industry where foundation models could have a big impact, and then work with a dev team to build some sort of demo. And I'd also think broader than just text data, right? Think about images, video, and speech. If you go look at DALL-E 2, that's a really great example where they've trained images and text together, and you can use natural language to create new images.

Jerry Cuomo: Good!

Blaine Dolph: And if I was a developer, I could go two ways. One, I'd build some demos that leverage a foundation model. Or two, if you're really passionate on the data science side, dig deeper into how you create and fine-tune the models. And also, keep in mind that because of the nature of reuse of foundation models, it's really important that they don't include bias, so that you're not baking that bias into a foundation model that's going to be reused.

Jerry Cuomo: Very good. So Blaine, thank you very much. This is certainly a technology trend related to AI and automation that we're going to keep a very close eye on as we get into the new year. I want to just thank you for your clear articulation of the space. You certainly made me even more excited about this, if that's possible.

Blaine Dolph: Good! Yeah, I look forward to working with you on it next year.

Jerry Cuomo: Okay! This episode is a wrap. I hope now that if foundation models weren't on your mind, they are now. And perhaps, you're curious and excited about them and you might even start thinking about how you might apply these where you work. Well, we've attached some links in the description area of the podcast for you to learn more and play around with these models. And Blaine and I would love to hear from you and give you a helping hand to sort out how to apply them and to share our updates on what we're doing inside IBM on this breakthrough trend in artificial intelligence. Okay, folks. That's it for today. And if you enjoy this podcast, it's also likely that you'll enjoy The Art of Automation book, which is now available. A link to the book is available in the description section, and royalties from this book are being donated to the American Cancer Society. Okay! Once again, I'd like to thank Blaine. And of course, I'd like to thank you all for listening in. This is Jerry Cuomo, IBM Fellow and VP for Technology at IBM. See you again on an upcoming episode.

DESCRIPTION

Now Live! This special *trends* episode is on Foundation Models. Listen in as host Jerry Cuomo is joined by fellow IBM Consulting Fellow, Blaine Dolph, for a discussion on this exciting emerging trend in artificial intelligence.

Blaine provides a definition of Foundation Models including what makes them a breakthrough worth paying attention to in the months to come. Jerry and Blaine share several examples using the OpenAI playground, and with a little help from Digital Jerry (DJ), demonstrate how these models can be applied to automate aspects of your everyday work life. Blaine discusses how early examples of models, like GPT-3, BERT, or DALL-E 2, have shown what’s possible. Input a short prompt, and the system generates an entire essay, or a complex image, based on your parameters, even if it wasn’t specifically trained to write that exact essay or generate an image in that way.

What exactly is a foundation model, you ask?

A foundation model is a deep learning algorithm that has been pre-trained with extremely large data sets. In many cases, the data is scraped from the public internet, including Wikipedia and GitHub.

Unlike narrow artificial intelligence (narrow AI) models that are trained to perform a single task, foundation models are trained with a wide variety of data and can transfer knowledge from one task to another. This type of large-scale neural network can be trained once and then fine-tuned to complete different types of tasks.
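The "train once, fine-tune for many tasks" pattern can be sketched as one expensive shared stage feeding several cheap task-specific heads. The code below is a toy analogy only (word statistics standing in for a real pre-trained network; all function names are our own invention), not an actual foundation model:

```python
# Toy analogy of the foundation-model pattern: one shared, "pre-trained"
# feature extractor is reused by several lightweight task-specific heads.

def pretrained_features(text):
    # Stand-in for the expensive pre-trained stage: cheap word statistics.
    words = text.lower().split()
    return {
        "n_words": len(words),
        "avg_len": sum(len(w) for w in words) / max(len(words), 1),
    }

def length_head(features):
    # Hypothetical "fine-tuned" head for one task: label text length.
    return "long" if features["n_words"] > 20 else "short"

def complexity_head(features):
    # A second head reusing the SAME features for a different task.
    return "complex" if features["avg_len"] > 6 else "simple"

feats = pretrained_features("Foundation models are trained once and reused.")
print(length_head(feats), complexity_head(feats))  # prints: short simple
```

The point of the sketch is the reuse: `pretrained_features` is computed once per input, and each new task only needs a small, cheap head on top, echoing the "English first, law school later" analogy from the episode.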

Foundation models contain hundreds of billions of parameters that have been trained with hundreds of gigabytes of data. Once completed, however, each foundation model can be modified an unlimited number of times to automate a wide variety of discrete tasks.

Today's Host


Jerry Cuomo

|IBM Fellow, VP Technology - https://www.linkedin.com/in/jerry-cuomo/

Today's Guests


Blaine Dolph

|IBM Fellow

Digital Jerry [DJ]

|Jerry Cuomo's digital twin