Learnings for Machine Learning Practitioners From The Movie "Her"

“Her,” a beautiful film released in December 2013, is filed under the romance genre, but it also depicts a resonant vision of the future. We are not writing this article to review the movie. Instead, we want to highlight how its technology resembles real AI features and how it can inspire machine learning (ML) practitioners, a connection first pointed out by Richmond Alake (1).

More specifically, we will talk about the impending convergence of the artificial intelligence (AI) conveyed in the movie and the trajectory of today’s AI developments.

“HER” 2013

Warning: Subtle Movie Spoilers Ahead!

The film “Her” follows Theodore Twombly (portrayed by Joaquin Phoenix), whose job is to write heartfelt personal letters on behalf of other people. He develops a relationship with Samantha (voiced by Scarlett Johansson), an artificially intelligent virtual assistant.

Even in 2021, there is much to uncover in the technology presented in this movie, especially from an AI perspective. You will, however, have to look past Theodore’s ’80s mustache and the high-waisted pants several characters wear.

AI Technology

While the movie has no flying cars, it does have several intuitive, artificially intelligent programs that come in the shape of video game characters, personal assistants, home control systems, and more.

Here are some of the relevant AI technologies we can pick from the movie:

AI Personal Assistants: A system built to perform application services and execute specific commands written or uttered by a user
Speech/Voice Recognition: A process by which artificial systems recognize a vocalized language
NLP, Natural Language Processing: A part of AI involving technologies by which artificial systems examine and process an extensive amount of text for understanding and context
Computer Vision: An AI-related area where a computer system has capabilities to understand scenes
AGI, Artificial General Intelligence: A type of AI that can perform general human tasks at a human level of performance

AI Personal Assistants, Artificial OS

Samantha is a virtual assistant introduced at the beginning of the movie. She is an AI operating system (OS), referred to as OS1 by its creators.

We already know and perhaps extensively use Google Assistant, Alexa, Siri, Cortana, and the like. However, if you have watched “Her,” you know Samantha is on a whole new level. She is self-aware and can traverse a huge amount of data on the internet within seconds.

From its inception, Samantha’s knowledge comes from information derived from the internet. This is similar to how the well-known GPT-3 language model was trained (2).

For those unversed, Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model that uses deep learning to produce human-like text. It is the third-generation language prediction model in the GPT-n series, the successor to GPT-2, developed by OpenAI, a San Francisco-based AI research laboratory (3).

Even though GPT-3 is not a virtual assistant, it has the potential to be the foundation of the knowledge base and common sense within virtual assistants. It has 175 billion trained parameters and performs well on several NLP tasks, most notably question answering.

Most interactions with virtual assistants are also question-and-answer based, with typical questions like “What is the weather like today?”, “How many meetings do I have today?”, and the like.

The developers at OpenAI measured GPT-3’s learning capabilities in three settings: few-shot, one-shot, and zero-shot learning. In all three, the examples are supplied in the prompt at inference time, and the model’s weights are not updated.

Few-shot learning presents the model with a small number of worked examples of the task to be solved.
One-shot learning presents the model with a single example of the task.
Zero-shot learning presents no examples at all. In this case, the model receives only an instruction detailing the task to be completed.
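As a rough illustration, in the GPT-3 setting the three regimes differ only in how many worked examples are placed in the text prompt before the actual question. The sketch below builds such prompts as plain strings. The function name build_prompt, the "Q:/A:" layout, and the sample questions are our own assumptions for illustration, not OpenAI's format, and no model is actually called.

```python
def build_prompt(instruction, examples, query):
    """Assemble a prompt: an instruction, k worked examples, then the open query."""
    parts = [instruction]
    for question, answer in examples:
        parts.append(f"Q: {question}\nA: {answer}")
    # The final answer is left blank for the language model to complete.
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)


# Hypothetical demonstrations, made up for this sketch.
demos = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
    ("What is the capital of Kenya?", "Nairobi"),
]
instruction = "Answer the question."
query = "What is the capital of Peru?"

zero_shot = build_prompt(instruction, [], query)        # instruction only
one_shot = build_prompt(instruction, demos[:1], query)  # one worked example
few_shot = build_prompt(instruction, demos, query)      # a handful of examples

print(few_shot)
```

The only thing that changes between the three prompts is the number of demonstrations. The model itself stays the same in every case.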

Samantha, the OS1 virtual assistant, uses a learning method that resembles few-shot learning. When Theodore starts up the virtual assistant, he is asked a few abstract questions that gauge his social and interaction levels, and we can see this as a kind of initial training.

In September 2020, Microsoft announced that it had exclusively licensed GPT-3 (4), so perhaps we may soon see Cortana with an embedded GPT-n series language model exercising cognitive abilities.

Voice and Speech Recognition

Today, our smartphones come with voice recognition software much like that seen in “Her.” The voice recognition capabilities of the smart devices presented in the movie don’t venture far from what today’s smartphones can do.

Theodore, the main character, writes letters for a living. He uses intuitive speech-to-text (STT) technology to compose the heartfelt messages in the letters he writes. Samantha uses her text-to-speech (TTS) features to read emails back to Theodore.

Text-to-speech is the process of turning digitized text and writing into sound.
Speech-to-text is the process of turning uttered sounds into digitized text.

TTS synthesis is an ongoing research area. Its ML models have typically leveraged recurrent neural networks because of the input data’s temporal and sequential nature. More recently, architectures have emerged that leverage Transformers and deep convolutional networks (5, 6).

The speech-to-text technology portrayed in “Her” is not too far from the abilities of the solutions embedded in apps such as Google Docs.

Samantha also reads emails aloud using TTS. As far as we know, Gmail does not have such functionality yet, though we may be wrong. However, there are Chrome extensions available that let you have your emails read to you (7).

Another attribute of Samantha that today’s technology lacks is her mimicry of inhaling and exhaling while speaking. Her ability to mimic the sounds of breathing makes interactions with this AI assistant feel even more human-like.

Perhaps ML practitioners should consider implementing a system that mimics the sounds of the vocal tract when uttering words, to make Google Assistant, Alexa, Siri, and the like feel more human.

NLP, Natural Language Processing

NLP is a part of AI involved with the technologies by which systems examine and process a huge amount of text for understanding and context.

In the movie, Samantha demonstrates the capability to understand human language, both spoken and written.

The movie’s creator took it a step further by giving Samantha the ability to empathize, or at least to mimic empathy, when reading Theodore’s emotional letters.

Sentiment analysis is the process by which a system analyzes text via NLP and extracts emotional information about a subject. In terms of performance, Samantha’s sentiment analysis capabilities are flawlessly human-like.
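To make the idea concrete, here is a deliberately tiny sketch of lexicon-based sentiment scoring, one of the simplest approaches to sentiment analysis. The word list and its weights are hypothetical values we made up for illustration. Production systems typically rely on trained models rather than a hand-written dictionary.

```python
import re

# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words.
LEXICON = {
    "love": 1, "happy": 1, "wonderful": 1,
    "miss": -1, "sad": -1, "lonely": -1,
}


def sentiment_score(text):
    """Sum the lexicon weights of every word in the text (unknown words count 0)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def sentiment_label(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for letter in ("I love you and I am so happy",
               "I feel sad and lonely",
               "The meeting is at noon"):
    print(letter, "->", sentiment_label(letter))
```

A word-counting approach like this ignores negation, sarcasm, and context, which is exactly where Samantha’s human-like reading of Theodore’s letters goes far beyond it.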

Interestingly, sentiment analysis is not limited to text-based information. There is also a decent amount of research towards extracting sentiment information from music and other art forms.

There is a scene in “Her” where Theodore instructs his music-playing software to play a sad song.

While the uttered command and that particular scene lasted no more than five seconds, it was enough to make us wonder what mechanism a system would need to extract sentiment from music and index a music database by emotion.

And since songs can evoke different emotions in different people, it makes us wonder about the structure of the training data for such a model.
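One possible shape for such data, sketched under our own assumptions, is to store several listeners’ emotion labels per song and index a song under an emotion only when enough listeners agree. The song titles, listener ids, labels, and the 50% agreement threshold below are all hypothetical. This covers only the indexing step and does not extract emotion from audio.

```python
from collections import Counter, defaultdict

# Hypothetical annotations: (song title, listener id, emotion label).
ANNOTATIONS = [
    ("Song A", "u1", "sad"), ("Song A", "u2", "sad"), ("Song A", "u3", "calm"),
    ("Song B", "u1", "happy"), ("Song B", "u2", "happy"),
    ("Song C", "u1", "sad"), ("Song C", "u2", "happy"), ("Song C", "u3", "sad"),
]


def build_emotion_index(annotations, min_share=0.5):
    """Index each song under every emotion chosen by at least min_share of its listeners."""
    votes = defaultdict(Counter)
    for song, _listener, emotion in annotations:
        votes[song][emotion] += 1

    index = defaultdict(list)
    for song, counts in votes.items():
        total = sum(counts.values())
        for emotion, count in counts.items():
            if count / total >= min_share:
                index[emotion].append(song)
    return {emotion: sorted(songs) for emotion, songs in index.items()}


index = build_emotion_index(ANNOTATIONS)
print(index.get("sad", []))  # candidates for "play a sad song"
```

Keeping the per-listener votes, rather than one label per song, preserves the disagreement between listeners and leaves the agreement threshold as a tunable choice.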

Computer Vision

Computer vision is a field concerned with giving a system the ability to understand scenes. Computer vision tasks like face recognition, object detection, and pose estimation are primarily solved with deep learning methods.

To interact with Samantha, Theodore has an earpiece, resembling AirPods, and a nifty square-shaped device with a screen on which he can see Samantha’s writing. The device also has a camera, which gives Samantha a window onto the human world.

If you haven’t watched the movie “Her,” then you might be wondering about Samantha’s computer vision functionality.

There is a scene where Samantha guides Theodore, with his eyes closed, around a busy park filled with people. You get the idea. Yes, it is beyond the typical object and facial recognition abilities of today’s systems.

Samantha’s capability to navigate a human through a crowd with perfect precision is a technology we don’t currently have.

The closest technology we have to Samantha’s abilities is found in Google Maps, whose AR mode can offer real-time navigation.

However, by our rough estimate, you get only about 10% of Samantha’s computer vision and navigation capacities when equipped with Google Maps voice navigation. No offense to Google.

Artificial General Intelligence

Did we mention Samantha named herself?

Take a second to let that sink in.

So, we asked the Google Assistant to name herself, and she returned with some search results on “name yourself.” Not something we were expecting.

Then we asked her, “What is your name?” Below are some snippets of that exchange.

Interaction between Author and the Google Assistant:

While Google Assistant cannot name itself, it can still give you a bunch of nicknames. But it is still not on a par with Samantha’s remarkable showcase of self-awareness.

AGI is an artificial intelligence that can perform general human tasks at a human level of performance. Most AI systems we have today are weak, or narrow, AI. They are trained only for specific tasks.

On the surface, we can perceive Samantha as a form of AGI. She can perform most human tasks, and she serves as a bridge between AGI and Super Artificial Intelligence.

Super AI is an intelligent system that can exceed human abilities in all conceivable tasks.

We may not develop some sort of Super AI anytime soon, but there is hope that we will create AGI in the upcoming decades.

The release of GPT-3 in 2020 and ongoing speculation about GPT-4 (8) are reigniting conversations about AGI and how far researchers are from creating an agent that can perform all human tasks at a decent level of performance.

AGI: Are We Close?

As we discussed, artificial intelligence is commonly divided into three categories:

ANI, Artificial Narrow Intelligence, has a low level of intelligence: it can perform complex tasks but does not understand why it is doing them, nor does it have any consciousness. Examples include Alexa and Siri.

AGI, Artificial General Intelligence, would be conscious, self-aware, and aware that it is a machine, and it could think and process on the same level as humans. We don’t have AGI yet.

ASI, Artificial Super Intelligence, would be a machine whose intelligence is superior to that of humans. Even though it does not exist yet, in theory, it would be able to solve problems beyond our understanding.

According to a survey of experts in the area, there is a one-in-two chance that AGI will be created by 2040, rising to nine in ten by 2075. The respondents also predicted that ASI would follow within thirty years of that. They put the chance that ASI turns out to be “bad” or “extremely bad” for humanity at one in three (9). Notably, the write-up should be relied on with caution, as it does not name its experts or state the criteria for being classed as one.

But how close are we to creating an AGI? Or are we chasing an elusive objective that we may never accomplish?

Every day, we are barraged by news declaring the latest breakthrough in AI research. Companies such as OpenAI and DeepMind are working hard to make the existence of superintelligent machines a reality. At the same time, people like Elon Musk and Max Tegmark are worried about AI making humanity extinct (10, 11).

Even if we are not sure we will achieve AGI, we should put effort into researching it. According to Dr. Ben Goertzel, the founder and CEO of the SingularityNET Foundation (12, 13), we are at a turning point in the history of AI.

He believes that the balance of activity in AI research will shift from highly specialized narrow AI to AGI (14) over the next few years.

“Any problem humanity faces, especially those hard ones like curing mental illness or curing death, building nanotechnology or Femto technology assemblers, saving planet Earth’s climate, or traveling to the distant stars can be solved effectively by first creating an AGI and then asking the AGI to solve those issues,” says Dr. Goertzel (15).

He believes that the push from narrow artificial intelligence towards AGI would need international effort.

According to Dr. Goertzel, algorithms and structures are not the bottlenecks to realizing AGI today; the more fundamental issues are funding and hardware.

Notably, there are also teams pursuing AGI with extensive hardware and funding, like the Microsoft-backed OpenAI (16) and Google’s DeepMind (17). However, Dr. Goertzel believes that they are mostly burning through their resources at a rapid pace while pursuing intellectual dead ends.

No one knows if or when AGI will happen, but there are those, like Dr. Goertzel, who are working to make it happen. Even if it does not come in our lifetime, we still believe that we will be able to create strong AI someday.

Moreover, we may never know that we have accomplished this exceptional feat, either because the nature of the AI’s intelligence will be too different from ours or because the AI may be too smart to let us know. For all we know, it may have already happened.

Final Words

The movie “Her” gives an excellent impression of an AI virtual assistant with ASI. Samantha is Grammarly, Google Search, Alarm, Gmail, Reminder, Calendar, and more, all packed into one superintelligent agent.

For Theodore, it was more than an assistant. It was a companion, lover, friend, therapist, and more.

Still, it is pretty difficult to imagine having a relationship with Alexa or Google Assistant beyond the typical “What is the weather like today?” conversations.

It is even harder to imagine a human falling in love with a virtual assistant.

Yet, there is a massive amount of inspiration and information machine learning practitioners can draw from the AI capabilities conveyed in this interesting movie.

They can gain insight into how the creators of “Her” envision future AI applications based on today’s technology. In turn, this can inspire ML researchers and practitioners to create novel applications and techniques.

This is a closed loop of inspiration that we have had for decades. Futuristic movies like “Her” fuel the imaginations of investors, researchers, and scientists, and present-day technology inspires movie writers and directors.

While we do not know how long it will take to develop the first real-world AGI application, one thing is for sure: its development will prompt a series of events and persistent changes, good or bad. It is set to reshape the world and life as we know it forever (18).

With that being said, we still don’t have commercial hoverboards.