AI OS - Our new operating system
ChatGPT is just an ENIAC and a teletype - we can do so much better than that
Picture by ThisIsEngineering
This installment of Clear Thinking is free for everyone. I send this email weekly. If you would also like to receive it, join other smart people who absolutely love it today.
TLDR
AI will evolve over the next 1-3 years into a full-fledged operating system like macOS or Windows (perish the thought).
It is Tuesday, Day 11 of the war with Hamas.
I have a Tuesday deadline for my weekly newsletter - “Clear Thinking”.
I feel like Ernest Hemingway: "He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish."
It’s been 6 hours and all I have is a title and rough idea for an essay on tech that starts with a quote from Hemingway and ends with a quote from Isaac Babel.
What everyone is now calling AI is based on an LLM - a large language model.
An LLM like OpenAI's ChatGPT 4 or Anthropic's Claude predicts the next token (roughly, a word or piece of a word) from a prompt of tokens.
Chat with an LLM went on to create an entirely new industry of so-called generative AI.
Generative AI generates new content (text, video, voice, pictures) from prompts.
Creative programmers using generative AI have created hundreds of applications for almost every conceivable use - from writing Twitter posts and generating pictures and videos to automated code generation, sales call conversations and automated voice-overs.
There is an entire cottage industry of smart developers and people writing newsletters covering generative AI.
The speed of development of these new generative AI applications is astonishing. You can do sentiment analysis in 5 lines of Python code calling the GPT API in 10 minutes of work. Three years ago, it would have taken a programmer 3 months of design and development.
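To make that concrete, here is a minimal sketch of what such a sentiment classifier might look like. It is not tied to any vendor SDK: `complete` is a hypothetical stand-in for whatever function sends a prompt to the GPT API and returns the text of the reply, and `fake_complete` is a made-up offline substitute so the sketch runs without an API key.

```python
from typing import Callable

LABELS = ("positive", "negative", "neutral")


def classify_sentiment(text: str, complete: Callable[[str], str]) -> str:
    """Ask an LLM for a one-word sentiment label and normalize the reply."""
    prompt = (
        "Classify the sentiment of the following text as exactly one word: "
        "positive, negative, or neutral.\n\n"
        f"Text: {text}\nSentiment:"
    )
    reply = complete(prompt).strip().lower().rstrip(".")
    # Fall back to "neutral" if the model answers with anything unexpected.
    return reply if reply in LABELS else "neutral"


def fake_complete(prompt: str) -> str:
    """Offline placeholder for a real API call (an assumption for this sketch)."""
    return "Positive." if "love" in prompt.lower() else "Negative"


print(classify_sentiment("I love this newsletter", fake_complete))
```

The interesting part is how little there is: a prompt, one call, and a bit of cleanup on the answer. In real use you would swap `fake_complete` for a thin wrapper around your provider's client.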
The speed and ease of development with ChatGPT (or Chat for short) feel like a qualitative difference.
But Chat is just a baby step. Answers to prompts in a command-line chat.
How we took the baby step
ChatGPT 3/4 predicts the next token from a prompt of tokens
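A toy illustration of "predict the next token" may help. The sketch below is not how GPT works internally (GPT uses a large transformer neural network over sub-word tokens); it simply counts which word follows which in a tiny made-up corpus and predicts the most frequent follower. The function names and the corpus are invented for this example.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training text."""
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    if token not in following:
        return None
    return following[token].most_common(1)[0][0]


corpus = "the old man fished alone and the old man caught nothing".split()
model = train_bigram(corpus)

print(predict_next(model, "old"))
```

An LLM does the same job at a vastly larger scale: given everything in the prompt so far, it picks a likely next token, appends it, and repeats.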
OpenAI's GPT-3 (Generative Pre-trained Transformer 3) was introduced in May 2020 through a research paper titled "Language Models are Few-Shot Learners." It quickly got significant attention due to its impressive capabilities and large size of 175 billion parameters.
To predict the next token, ChatGPT 4 is reported to use 8 machine learning models with 220 billion parameters each, for a total of about 1.76 trillion parameters, connected by a Mixture of Experts (MoE). OpenAI has not confirmed these figures.
"Mixture of Experts" (MoE) is a machine learning technique that partitions the input space into regions and assigns a specialized model (or "expert") to each region. The idea is to have each expert specialize in a particular subset of the data, thereby effectively handling the complexity and diversity of large datasets.
A gating network scores how relevant each expert is to the input. After getting the predictions from the experts, the final output is a weighted combination of the experts' predictions, using those scores as weights.
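The weighted combination can be sketched in a few lines. This is a toy under stated assumptions: the two "experts" are plain functions rather than neural networks, the `gate` function is hand-written rather than learned, and every expert is consulted for every input. Large production MoE models typically route each token to only a few top-scoring experts to save compute.

```python
import math


def softmax(scores):
    """Turn raw gate scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(x, experts, gate):
    """Weighted combination of every expert's prediction for input x."""
    weights = softmax(gate(x))
    predictions = [expert(x) for expert in experts]
    return sum(w * p for w, p in zip(weights, predictions))


# Two toy experts: one meant for small inputs, one for large inputs.
experts = [lambda x: 2.0 * x, lambda x: x + 10.0]


def gate(x):
    """Hypothetical gate: favours expert 0 for x < 5, expert 1 otherwise."""
    return [5.0 - x, x - 5.0]


print(mixture_of_experts(5.0, experts, gate))
```

At `x = 5.0` the gate is undecided, so both experts get equal weight. Far from 5, one expert dominates and the other contributes almost nothing, which is the "each expert handles its own region" idea in miniature.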
The number of machine learning models in GPT-4 is loosely like the number of cores in a CPU or gaming graphics card.
ChatGPT 4 reportedly has 8 models.
The GeForce RTX 4090 (a high-end graphics card for gaming that launched at a list price of $1,599) has 16,384 cores.
Think about that for a moment.
ChatGPT 4 is just the beginning.
I believe that Chat will evolve into our next operating system over the next few years.
To understand how that will happen, let’s look at the Lego blocks of an operating system.
The Lego blocks of a modern operating system
Modern operating systems like macOS or Windows have the same basic Lego blocks.
There are 5 kinds of OS Lego blocks: kernel, process management, security and access control, system interface and user interface.
Let’s review each OS Lego block briefly:
Kernel: The core of the OS. It controls everything: memory, storage, devices, network, touch, motion detection, sound and video.
Process Management: Manages the processes in the system, from system processes that handle network communications to user applications like a browser.
Security and Access Control: Protects against malware and unauthorized access to the system. This includes user authentication, permissions, encryption, and auditing.
System Interface: The programming interface between the OS and user applications, used to access system resources like storage, network or sound.
User Interface: The human user interface to the OS. It can be command-line based or graphical. The UI takes commands from the user and executes them.
The AI OS
Chat is a command-line user interface to the large language model. It is a remarkably capable and creative UI that works in many languages, but it often gets things wrong.
It is vulnerable to exploits.
Compared to a mobile phone, a modern operating system or a gaming card, it seems incredibly primitive. And it is.
It’s just a chat between a machine and a human.
Chat will evolve into a modern operating system model, just as the ENIAC and a teletype evolved into macOS running on Intel/Apple hardware with capabilities of touch, gesture detection, sound, and video.
There have been some startups working on smell generation but for some reason that never took off. Maybe the AI OS will be an opportunity to revisit smell detection and generation.
Imagine asking Chat, “How do I smell?” and Chat answers, “Like an old sweatshirt”.
How Chat (or AI) will evolve into our next operating system
There will be 5 kinds of AI OS building blocks: kernel, process management, security and access control, system interface and user interface.
Let’s review each AI OS building block briefly:
Kernel: The core of the AI OS. It controls everything for the AI: main AI memory, storage, devices, network, touch, motion detection, sound and video. The LLM is analogous to an Intel or Apple hardware CPU.
The kernel will be specifically designed for a particular LLM - like OpenAI's GPT, Anthropic's Claude, or Facebook's Llama.
Just as we have macOS running on Apple M2 or Intel hardware, we will have an AI OS that runs on an LLM from Anthropic, Facebook, Google or OpenAI.
Due to the size and complexity of the LLM - the actual execution of the LLM will be in the cloud.
Instead of machine instructions, the kernel will send prompts to the LLM and return answers to the AI OS.
This kind of sounds like a Chromebook or the old IBM terminals. Not having to deal with the messy stuff of system updates and security is not necessarily a bad thing.
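Here is a rough sketch of the "prompts instead of machine instructions" idea. Everything in it is hypothetical: `AIKernel` is an invented name, and `echo_llm` is an offline placeholder for the cloud-hosted model, not a real API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class AIKernel:
    """Hypothetical AI OS kernel: forwards prompts to a cloud-hosted LLM."""
    llm: Callable[[str], str]  # stand-in for the remote model
    history: List[Tuple[str, str]] = field(default_factory=list)

    def execute(self, prompt: str) -> str:
        """The AI OS analogue of running an instruction: prompt in, answer out."""
        answer = self.llm(prompt)
        self.history.append((prompt, answer))  # kernel-side record of the exchange
        return answer


def echo_llm(prompt: str) -> str:
    """Offline placeholder for the cloud LLM (an assumption for this sketch)."""
    return f"answer to: {prompt}"


kernel = AIKernel(llm=echo_llm)
print(kernel.execute("open my calendar"))
```

Because the kernel only depends on a "prompt in, answer out" function, the rest of the AI OS would not need to care which vendor's LLM sits behind it, much like an OS hides the CPU from applications.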
Process Management. The AI OS manages processes that talk to the network, to storage systems and user applications like a browser or word processor.
If the AI wants to talk to someone or something on the Internet - it will have to go through the process manager to get that done.
The process manager enables the next layer - security and access control.
This could be a process manager that runs in the cloud or on Intel/Apple/ARM hardware.
Once you have the AI OS kernel - you can run the AI OS anywhere you like: in a browser, on a Chromebook or on Intel hardware.
Security and Access Control: Protects the AI OS from malware attacks and unauthorized access to the system. This includes user authentication, permissions, encryption, and auditing, just like in any modern OS.
I believe that the AI OS process manager and the security and access control layer can resolve many of today’s concerns regarding exploits of the AI by bad guys.
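One way to picture the process manager working together with access control is a gatekeeper that checks permissions and writes an audit trail for every request. The sketch below is purely illustrative; the class name, the process names and the resource names are all made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ProcessManager:
    """Hypothetical gatekeeper between the AI and outside resources."""
    permissions: Dict[str, Set[str]]  # process name -> resources it may use
    audit_log: List[str] = field(default_factory=list)

    def request(self, process: str, resource: str) -> bool:
        """Allow the request only if the process was granted that resource."""
        allowed = resource in self.permissions.get(process, set())
        verdict = "ALLOW" if allowed else "DENY"
        self.audit_log.append(f"{verdict} {process} -> {resource}")  # auditing
        return allowed


manager = ProcessManager(
    permissions={"browser": {"network"}, "notes": {"storage"}}
)
print(manager.request("browser", "network"))
print(manager.request("notes", "network"))
```

The design choice is deny-by-default: a process that was never granted a resource, or that the manager has never heard of, gets nothing, and every decision leaves a line in the log.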
System Interface. The programming interface between the AI OS and user applications to access the AI OS resources like storage, network or sound.
The AI OS system interface will enable the AI to see, hear, talk, play music, generate pictures and video and eventually smell.
User Interface. The human user interface to the AI OS. It can be command-line based or graphical. The UI takes commands from the user and executes them.
Just like Chat does today. You will be able to interact with the AI OS in your own language.
The AI OS will have an actual voice.
The tone of responses can be adjusted based on our instructions.
It will provide the right fit for older people, Gen Z people and code accessing the AI via an API.
With a deeper understanding of the needs of vision- and hearing-impaired people, the AI OS will provide the perfect UI for anyone, since it has the ability to dip into the knowledge of the LLM.
And as I promised at the beginning of this essay - we’ll finish with Babel:
Isaac Babel: "We speak in different tongues, and yet we understand one another."