Unlocking Opportunities: Summarizing the Latest AI Announcements and Their Impact
Key takeaways and AI opportunities.
Good morning. Did you read and digest all the AI news from last week? It was a big week, so if not, I can't blame you. If you want the high-level view, it's a good thing you opened this email: today's newsletter recaps the most impactful updates and highlights for you. Feel free to skip that section if you've already caught them all.
In our “key takeaways” section, we'll look at how these announcements impact the AI economy and how smart, fast-moving entrepreneurs like us can still capture value.
5-minute read
Best of OpenAI’s announcements
Since Sam Altman doesn't “care about the competition,” it's an interesting coincidence that OpenAI's updates frequently land right before Google's. Or is it…
The “o” in GPT-4o stands for omni: a multimodal model that can handle video, images, and audio, basically everything you can throw around on the internet.
GPT-4o, including the GPT Store, vision models, and Memory, is now free to use, to some extent.
GPT-4o isn't unlimited on the free tier; you get a short taste. After a few chats, it falls back to GPT-3.5 until the limit resets a few hours later. It's an interesting way to tease users, and it will likely drive higher conversion to the paid tier.
Developer APIs have also gotten faster AND cheaper. If that’s not fueled by a competitive spirit, I don’t know what is.
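If you want to try the cheaper endpoint yourself, here's a minimal sketch of a GPT-4o call using OpenAI's official Python SDK. The prompt is just a placeholder, and it assumes you have an API key exported as OPENAI_API_KEY:

```python
# Minimal sketch: one call to the faster, cheaper gpt-4o endpoint
# via OpenAI's Python SDK (pip install openai).
# Assumes the OPENAI_API_KEY environment variable is set.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Summarize last week's AI news in one sentence."},
    ],
)
print(response.choices[0].message.content)
```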
Scarlett Johansson did not like it…
The conversational aspect of the model got the most attention. What we saw was nothing short of the vision of the 2013 film “Her.” The voice even reminded people of Scarlett Johansson, so much so that on Monday her legal team got involved and started asking questions. Apparently, Sam Altman had reached out to her two days before the presentation and hadn't heard back.
Then they showcased a live demo in which a desktop OpenAI widget can see your computer screen if you let it. There, it acts like a true copilot: helping you understand or debug code, interpret data on graphs, solve puzzles, and more.
They also showed how it can analyze and interpret your facial expressions.
Short presentation, long list of examples to explore on your own
The keynote was rather short and left users to explore a lot more on their own. For the curious, they prepared a treasure trove of examples on their website:
Play Rock Paper Scissors
Prepare for an interview
Create characters and reuse them in different setups (I can finally start writing a comic book for my daughter)
Solve math problems just by looking at them on paper
Create an entire font based on your description
What we can learn from how they announced it
The range of applications is so broad that it's impossible to show them all. Instead, the examples are strategically laid out to cover as many use cases and pull in as many users as possible. If it's unclear why, we'll get to that in the conclusion.
Best of Google’s AI Announcements
Google's presentation took about two hours and was packed with demos, slides, and presentations. On top of that, they published one big blog post with “100 things we announced at I/O 2024.” Listening to that post would take 21 minutes. I could never do that to you, so here are a few highlights.
They improved Gemini to have a much larger context window, which should put it at the top of the list among current models.
Explain “context window” like I am 5! (that’s how I learned it)
Imagine someone reads you a very long story. If you can only remember the last few pages, you'll miss the jokes that point back to the beginning. A bigger context window means the model can keep much more of the story in mind at once, so it not only has more answers, it also understands trickier questions, because it can connect the end of what you gave it back to the start.
Comparison of AI context windows
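For the technically curious: a context window is measured in tokens, not words. Here's a tiny sketch using OpenAI's tiktoken tokenizer to count how many tokens a document costs. The file name is a placeholder, and the page-count conversion is my rough back-of-the-envelope math, not an official figure:

```python
# Illustrative sketch: counting tokens with tiktoken
# (pip install tiktoken). cl100k_base is the encoding used by
# GPT-4-class models.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = open("my_long_document.txt").read()  # hypothetical file
n_tokens = len(enc.encode(text))
print(f"{n_tokens} tokens")

# Rough intuition: a 128k-token window holds on the order of a
# 300-page book at once; a 1M-token window holds roughly eight
# times that.
```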
Project Astra stole the show
If you haven’t seen this demo video, you have to watch it:
In case you still haven't watched: they carry a mobile phone around the room. The phone can see and understand what it's looking at. It solves problems on whiteboards, and it even remembers what it has seen.
What many people missed: mid-video, the presenter put on glasses. She then continued the demo by USING THE GLASSES. Google has long been passionate about this form factor, and I bet a new launch of Google Glass is not far out.
Google Project Astra Glasses Demo
Other Noteworthy Demos
NotebookLM — A tool you can drop all kinds of written and recorded notes into. NotebookLM then generates a spoken discussion of them, something like a personalized podcast made just for you.
Media Tools — From text-to-image tools and tools for musicians to video-generation capabilities: a set of applications very similar to what we saw from OpenAI, and almost on the level of more established tools like Midjourney and Sora (OpenAI's text-to-video).
Google Search will accept much longer questions, chained together with more detailed asks, and answer with summaries.
A Gmail assistant that can catch you up on emails about a specific subject: this Gemini sidebar will be available across the entire suite of Google Workspace tools.
What did we notice?
Until a few months ago, applications addressing these exact use cases popped up everywhere. But with these demos, one thing became clear: OpenAI and Google, who build the foundational models, also want to own the application layer and reach users directly.
OpenAI vs Google, who “won” last week?
Google has an immense advantage: it already owns users' productivity interfaces. All it needs to do is expand its offering and incorporate AI into users' existing workflows, and that's exactly what it is doing.
Google's goal (as I see it) is that no user should ever need to create an OpenAI account, or any other provider's, to get all the AI assistance they need.
OpenAI’s challenge is finding a large partner like Apple to integrate with daily-used interfaces. Could it replace Siri?
Or it can partner with other large enterprises that provide these commonly used workflow tools, as it already has with Microsoft. Once any other model runs on par with OpenAI's, why would we keep a second browser tab open just for ChatGPT?
Aside from that, OpenAI also seems to have a few internal challenges to solve.
Ilya Sutskever, one of the co-founders and chief scientist, announced his departure from OpenAI. He was involved in the development from the beginning, so seeing him depart will shake things up.
Key Takeaways and Opportunities Ahead
Big-tech behemoths are moving along the AI value chain and want a bigger piece of the cake. Multimodal foundational layers kill startups that built their business around text-to-image, voice-to-text, text-to-video, and the like.
Building out and showcasing a wide range of demos that solve real-world problems also gives OpenAI and Google direct access to users.
And any of the small xyz.ai apps that have been cooking over the last two years will have to fight a hard battle to stay relevant. See the application layer in the graphic below; this is what Google and OpenAI will take over.
Foundational and application layer becoming one?
What opportunities still exist or opened up?
Specialized areas — Highly specialized niches with limited public data: medical research, proprietary industrial processes, and nascent technologies. Aggregating specialized data and training custom GPTs can speed up workflows. BUT first, one needs access to that data, and likely a lot of data-cleanup work.
Current Events and Real-Time Information — Current AI models are not updated in real time; GPT-4 Turbo has a knowledge cut-off of April 2023. So a lot can be done by feeding the model more relevant data and drawing new conclusions (see the sketch after this list). I'm just not sure how cost-effectively it can be done.
Non-Digital Data — Some data is inaccessible without tracking and digitizing it first. There are processes in the world that are still purely manual: production processes, logistics, material flows, and so on. IoT solutions exist, but they are still catching up, and large areas and industries are up for grabs.
Small-business documents — Often still kept in physical ledgers or outside the cloud. Digitizing them and helping these businesses draw insights from them can be a big value-add.
Consulting — Offering specialized services tailored to business-specific challenges. You can show companies example use cases, then implement them internally by leveraging their data on a platform that keeps it protected. Data cleanup is, again, an opportunity that remains here.
Localized data — Not just translation, although that isn't solved yet either: all the coolest features launch in English first, so there is a huge opportunity in building out proper translation and localization. But also in highly localized data from, for example, local newspapers, their archives, and bulletin boards.
Historical archives in the Library of Minas Tirith could be an interesting dataset.
Not in GPT or Gemini yet - the library of Minas Tirith
General need for education — Demand for services that teach specialized knowledge on how to leverage Gen AI, how to keep up, and what can be done with it is still growing fast.
Google Trends data on “Generative AI”
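As promised in the real-time-information point above, here's a minimal sketch of the “feed it more relevant data” idea: paste fresh, post-cut-off text into the prompt and let the model reason over it. The file name and question are placeholders, and it again assumes an OPENAI_API_KEY is set:

```python
# Minimal sketch of working around the knowledge cut-off:
# stuff fresh text into the prompt so the model can answer
# questions about events it was never trained on.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

fresh_article = open("todays_ai_news.txt").read()  # hypothetical fresh data

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Answer using only the provided article."},
        {"role": "user", "content": f"Article:\n{fresh_article}\n\nWhat are the key takeaways?"},
    ],
)
print(response.choices[0].message.content)
```

Whether this is cost-effective at scale is exactly the open question: every fresh document you stuff into the prompt is billed as input tokens.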
Whew… it’s quite a rush trying to keep up with the world. Good on you for staying informed, and thanks a lot for reading. And don’t hesitate to pass this on to a friend who might also be interested.
Have a great rest of the week,