Anatomy of a Speech: How we teach computers to understand pictures by Professor Fei-Fei Li

This is a speech about sight. Whereas Professor Beard’s speech took a long, expansive look at the complex subject of female silence in the public sphere, Professor Fei-Fei Li’s speech takes an intimate look at an equally complex but completely different subject: how she and her colleagues are using machine learning to teach computers to understand pictures. What I love about this speech is that it takes a subject far too technically complex to explain in eighteen minutes (Professor Li reminds us that she’s been studying machine learning for over fifteen years) and somehow manages to make it understandable, relevant and exciting.

A Bit of Background

Professor Fei-Fei Li is Director of Stanford’s Artificial Intelligence Lab and Vision Lab. In her words, she builds ‘smart algorithms that enable computers and robots to see and think’. She became well known for her creation (along with her colleagues) of ImageNet, a vast visual database that uses crowdsourcing to annotate and classify images. It is freely available for researchers around the world to use, thus providing an unparalleled resource for those working in the field of visual research.

Scientists, engineers and those working with digital technology face the task of persuading their audience of the importance of their work to wider society. How do you walk out onto a stage and persuade an audience of the relevance of your work, armed with nothing more than your research, your passion and your words? In simple terms, why should wider society care about your specific research? Professor Li treads a fine line between explaining her topic clearly to an audience of non-experts and never patronising them or losing them to boredom. As this is a TED Talk, she will also have been thinking about the audience outside the room: the online audience. She never assumes expertise, but neither does she assume ignorance: she assumes interest.

Fifteen Years in Eighteen Minutes

Machine learning is a fascinating topic, but how do you explain its complexities and relevance in eighteen minutes? Professor Li succeeds, I believe, by deploying two interconnecting tools that both structure her speech and persuade the audience of the importance of her work. She uses the structure of a journey, with images of challenge, suspense and success, to guide her audience through her research, and she aligns simple pictures with this storytelling to explain each stage of that journey. She also uses metaphors, especially that of blindness and sight, both to humanize the computers she works with and to show the relevance and importance of her research to wider society.

Her very first line shows these two interconnecting tools: ‘Let me show you something’. Immediately she has her audience’s attention. Something’s going to be revealed. And she introduces the metaphor of sight that she’s going to use throughout her speech. In five words, she’s already won over her audience and given herself a starting point from which to structure her speech. A short while later, she tells the audience that she is here ‘to give you a progress report’. This speech is not going to be a lecture: by using the language of research, she makes the audience part of her world and part of her team. She makes them her equals, and continues to treat them as such throughout the speech.

Research as a Journey

It is difficult for researchers to know how to structure their speeches. Just explaining your work is not enough. But your speech will not necessarily follow the same structure as, for instance, a politician’s speech seeking to persuade a citizen to vote for them. Professor Li uses the structure of a journey to show the challenges of her research: its failures and successes. The structure of a journey allows her to create suspense in the audience’s mind, and the struggles she has had to overcome show why she thinks her work is so important.

Later she explains that, ‘about eight years ago, a very simple and profound observation changed my thinking.’ I love the binary opposition of ‘simple’ and ‘profound’ here, but in terms of journey structure, it’s a personal revelation that sets up anticipation in the minds of her audience. This is her personal journey through the challenges of her research, not just an abstract explanation. But her research is also ‘revolutionary’ and at the ‘frontier’: to my ears, very American terms that link her work to the American spirit of progress.

Towards the end of her speech, she asks, ‘So wait a minute. Is that it? Not so fast.’ She has led her audience on a journey. Just as they think they have arrived at the destination, she sets up another challenge. She reveals that, even after all this work, they had only taken ‘the first step’. And, excitingly, her research has just taken another step: ‘About four months ago, we finally tied all this together and produced one of the first computer vision models that is capable of generating a human-like sentence when it sees a picture for the first time.’ Her use of a journey structure means that she only needs to say, ‘Now I’m ready to show you’, because the audience is already persuaded of the importance of her research and excited to take the next step with her.

Computers, Cats and Children

Linked to her journey structure is her use of images. In speeches about technically complex subjects, researchers often rely on pictures and graphs to do the work for them. In fact, no picture can explain your work to an audience if you cannot explain it in words. What Professor Li does so well in this speech is marry pictures with clear explanations of her research.

She uses simple and funny images of cats to explain the journey of her research. Instead of using abstract terms to describe her work, she takes a concrete example and guides her audience through the challenges of her work. This works especially well in such a short speech, because it provides a way to quickly describe her research without oversimplifying or overcomplicating it. It’s worth quoting this paragraph in full, so you can see how she takes her audience through the complex challenges of her research using only one example and a few pictures.

The first step towards this goal is to teach a computer to see objects, the building block of the visual world. In its simplest terms, imagine this teaching process as showing the computers some training images of a particular object, let’s say cats, and designing a model that learns from these training images. How hard can this be? After all, a cat is just a collection of shapes and colors, and this is what we did in the early days of object modeling. We’d tell the computer algorithm in a mathematical language that a cat has a round face, a chubby body, two pointy ears, and a long tail, and that looked all fine. But what about this cat? (Laughter) It’s all curled up. Now you have to add another shape and viewpoint to the object model. But what if cats are hidden? What about these silly cats? Now you get my point. Even something as simple as a household pet can present an infinite number of variations to the object model, and that’s just one object.

Notice how she marries her use of cat images with rhetorical questions, giving the audience a short history of her research and the challenges she faced. She can then be sure that they understand how hard the task of teaching ‘a computer to see objects’ really is: ‘Now you get my point’. Simply through showing different pictures of cats, she explains clearly why older versions of her research failed.

She interweaves complex and technical terms, teaching her audience about her research, with concrete examples, to persuade her audience of the importance of her work. When using big abstract numbers, she makes them as tangible as possible. The ‘database of 15 million images’ is quickly narrowed down to ‘62,000 cats’. And the ‘Amazon Mechanical Turk workers’ become ‘almost 50,000 workers from 167 countries’ who ‘helped us’ in the challenge. She also links the abstract and technical language of machine learning to the image of the brain: ‘Just like the brain consists of billions of highly connected neurons, a basic operating unit in a neural network is a neuron-like node.’ Her passion comes through when she explains that ‘the convolutional neural network blossomed in a way that no one expected.’ What is brilliant about this image is that it shows the excitement of the unexpected and the beauty of the blossoming network whilst never dumbing her research down. She then relates these abstractions to simple examples, again of a cat, linking the challenges of her earlier work to the way she faced the challenge. It’s a clever comparison that works well in such a short speech.

She uses the metaphors of blindness and vision to explain her work in human terms and link her research to wider society. It is another simple, powerful binary opposition that persuades the audience of the necessity of her research, and it is at the very heart of that research. She is trying ‘to teach computers to see’ because ‘collectively as a society, we’re very much blind, because our smartest machines are still blind’. She also locates these metaphors on the human body, in fact on the bodies of her audience: ‘You and I weave together entire stories of people, places and things the moment we lay our gaze on them.’ This wider society also includes ‘Mother Nature’, who has taken ‘540 million years of hard work to do this task, and much of that effort went into developing the visual processing apparatus of our brains, not the eyes themselves. So vision begins with the eyes, but it truly takes place in the brain.’ Her research tackles a task that Mother Nature has worked on for hundreds of millions of years. Eventually, her work will provide ‘an extra pair of tireless eyes’ that will work with us. The blindness of society is solved by marrying the tools of nature with her research, giving ‘sight to the machines’.

She also shows both the journey and the struggle of her research through the image of a child. Linking machine learning to the image of a child’s mind does three things: it humanizes her research and makes abstract complexities easier for the audience to understand; it shows the huge challenge of her research, because she is trying to do nature’s work; and it demonstrates the importance of her work for wider society. The three-year-old child is already an ‘expert’ at what Professor Li has been struggling with for fifteen years. After describing the challenge of her research in terms of huge numbers, she explains: ‘That was how much effort it took to capture even a fraction of the imagery a child’s mind takes in in the early developmental years.’ And this is a struggle that continues, as ‘the real challenge is to go from three to 13 and far beyond.’ Towards the end of her speech, she uses the image of her son, reinforcing the sense that this is a personal struggle that is vital to ‘the future world he will live in’. The abstract image of the child is now personal. And her very last line underlines the importance of her research in terms I have already looked at. This is her ‘quest’. In the simplest possible terms, she persuades her audience that this personal challenge has real relevance for the future, not only for her son but for the human species.

I’ve looked at these two rhetorical tools (the journey structure and her use of images and metaphors) because I believe it is these that make her speech work so well in this specific situation. Professor Li considered her audience and their needs. But just as importantly, she is not afraid to use her expertise. What she manages to do so well, and what any scientist, engineer or researcher should take note of, is marry this expertise with the rhetorical tools we have explored to elucidate her research and show how vital this work is for her audience and for the future of society. All in eighteen minutes.

A few articles about Professor Li:

If We Want Machines to Think, We Need to Teach them To See – Wired – Marguerite McNeal – 04/15

One immigrant’s path from cleaning houses to Stanford professor – CNNMoney – Octavio Blanco – 22/07/16

Melinda Gates and Fei-Fei Li Want to Liberate AI from “Guys With Hoodies” – Wired – Jessi Hempel – 05/04/17

ELLE’S 2017 Women in Tech: Star Tech Voyagers – ELLE – Molly Langmuir – 12/07/17

For those interested in how scientists and researchers can communicate more effectively, a great place to start is my wonderful colleague Denise Graveline’s blog, don’t get caught. In particular, this post: 7 ineffective habits of scientists who communicate with public audiences

She also runs one of my favourite blogs, The Eloquent Women, all about women and public speaking. It’s a treasure trove of speechwriting and speechmaking inspiration and it’s where I got the inspiration for these Anatomy of a Speech posts. 
