JHN

May 10, 2023 · 8 min read

Google I/O 2023: AI at The Wheel

At Google I/O 2023, Google announced new features for its search engine, powered by two AI technologies: Gemini and Bard. It also gave major updates on other AI-related products, such as TPUs, Pixel phones (yes, they are AI-powered phones), and cloud AI as a service. In this blog post, we will explore what these technologies are, how they work, and what they mean for users, businesses, and content creators.

Picture: Google. Source: Unsplash

Gemini: A Multimodal API for Natural Language and Vision

Gemini is a multimodal AI model with an API that allows users to interact with Google's vast knowledge base using natural language and vision. Gemini can understand complex queries that involve text, images, or both, and provide relevant answers or suggestions. For example, you can ask Gemini to show you pictures of cats wearing hats, or to summarize the main points of a long article. Basically, you can think of it as Google's counterpart to GPT-4, which is also multimodal.

The ultimate goal of Gemini is to integrate it into Google Search, the core product of Google. This means that you will be able to use Gemini's multimodal API to search for anything you want using natural language questions or commands. Gemini will provide you with an interactive AI summary at the top of the search page, which will give you the most relevant information from various sources. Gemini will also provide you with citations and links to the original pages for further exploration.

Bard: A Code Generator and Explainer

Bard is Google's chatbot built on PaLM. Starting today, Bard is upgraded to PaLM 2, which adds features such as code generation, code explanation, and code optimization in 20+ programming languages. Bard can also explain how code works using natural language and diagrams, or create workflow diagrams and tables through other Google products. As of today, Bard is officially open in 180+ countries and can handle queries in multiple languages. It is also directly integrated into Google Search, providing an interactive AI summary at the top of the search page.

Since Bard is based on PaLM 2, its capabilities now include, but are not limited to:

1) Generate images from text or text from images, with generated images labeled through watermarking and metadata. This helps regular users identify AI-generated photos and distinguish them from real ones. For content creators and businesses, although the copyright status of AI-generated images is still unclear, workflow and budget should improve significantly.

2) Provide citations and links to the sources of its answers, including citations for code. This helps users verify the information and learn more about the topic. Furthermore, it can help address the concern that AI summaries reduce the traffic flowing to content producers' websites.

3) Integrate with other Google products, such as Gmail, Maps, Sheets, Docs, and Calendar. This allows users to access Bard's capabilities from different platforms and applications, with content flowing in both directions.

4) Partner with third-party services, such as Adobe Firefly and Instacart. This allows users to leverage Bard's features for various tasks and domains, and it competes directly with OpenAI's plugin marketplace. For example, Adobe Firefly is a partner whose integration lets Bard users create and edit graphics and animations using natural language commands. Instacart is another partner that uses Bard to optimize its delivery routes and schedules using natural language queries. These partnerships show that Bard gives Google a competitive edge in the AI market.

5) Generate code from natural language descriptions or pseudocode, using a technique called code synthesis. This helps users to quickly create working programs without writing any code.

6) Optimize code for performance, readability, or style, using a technique called code refactoring. This helps users to improve their existing code and follow best practices.

7) Explain code using natural language and diagrams, using a technique called code summarization. This helps users to understand how the code works and what it does.

8) Export code to third-party tools such as Replit, a cloud-based IDE that allows users to run and debug their code online. This helps users test their code and share it with others. If you know AutoGPT, this integration hints at a future Bard version of an automated code generation and debugging workflow.
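To make the "code refactoring" item in the list above concrete, here is a minimal, hand-written sketch of what such a rewrite can look like. It is not actual Bard output, and both function names are hypothetical; the point is only that refactoring changes how the code reads without changing what it returns.

```python
# Before: correct, but verbose. It indexes the list by hand
# and builds the result step by step.
def squares_of_evens_before(numbers):
    result = []
    for i in range(len(numbers)):
        if numbers[i] % 2 == 0:
            result.append(numbers[i] * numbers[i])
    return result


# After: the same behavior expressed as a list comprehension,
# which is the more idiomatic Python style.
def squares_of_evens_after(numbers):
    return [n * n for n in numbers if n % 2 == 0]


sample = [1, 2, 3, 4]
# Refactoring must preserve behavior, so both versions agree.
same_result = squares_of_evens_before(sample) == squares_of_evens_after(sample)
```

Checking that the old and new versions agree on the same inputs, as the last line does, is a simple way to confirm that a suggested refactoring is safe to accept.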

Picture: At your service. Source: Unsplash

As mentioned above, Bard is not only a standalone product but is also integrated into various Google products, such as Gmail, Maps, Sheets, Docs, and Calendar. This means that you can use Bard to write code directly within these products, without switching to another platform or tool. For example, you can use Bard to write a script that automates a task in Google Sheets, or to create a custom function in Google Docs. You can also create a document or presentation from the requirements described in an email. For instance, as a writer, you can quickly revise your draft after receiving a feedback email from your editor.

Bard+ Tools: A New Way to Search with NLP and Vision

Bard+ Tools are a set of extensions that enhance Bard's capabilities with Gemini's multimodal API. With Bard+ Tools, users can search not only with natural language but also with relevant images. For example, you can use Bard+ Tools to:

- Generate music from text or images, using a separate machine learning model called MusicLM, which may be upgraded with AudioLM in the future. This helps users create original songs or melodies based on their preferences or mood.

- Generate image captions or descriptions from images or text, using a technique called image captioning. This helps users to add context or meaning to their photos or illustrations.

Bard+ Tools can be accessed from Google Search or Google Lens, providing a new way to explore information and generate content.

Picture: Multi Functional Robot. Source: Wix

Bard also comes with a set of tools that enhance your search experience with NLP and images. For example, you can use Bard+ Images to search for images based on natural language descriptions or keywords. Bard will generate relevant images that match your query, as well as provide captions and metadata for each image. You can also use Bard+ Music to search for music based on natural language descriptions or keywords. Bard will generate music tracks that match your query, as well as provide lyrics and metadata for each track.

Hardware: TPUs/GPUs

One of the key challenges of AI is the computational complexity and cost of training and running large and complex neural network models. To address this challenge, Google has developed its own custom-designed hardware accelerators called Tensor Processing Units (TPUs), which are optimized for the large matrix operations common in machine learning algorithms, specifically operations on tensors (multi-dimensional arrays).
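To give a feel for the kind of workload described above, here is a minimal pure-Python sketch of a matrix multiplication, the core operation of a neural network layer. The names `matmul` and `mac_count` are hypothetical helpers written for this post. A real accelerator performs these multiply-accumulate steps in parallel in hardware; the loops below only illustrate the arithmetic.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply a (rows x inner) matrix by an (inner x cols) matrix."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                # One multiply-accumulate (MAC) step.
                out[i][j] += a[i][k] * b[k][j]
    return out


def mac_count(rows: int, inner: int, cols: int) -> int:
    """Number of multiply-accumulate steps the loops above perform."""
    return rows * inner * cols


# A batch of 2 inputs with 3 features each, and a 3 x 2 weight matrix.
inputs = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs = matmul(inputs, weights)
```

The work grows with the product of all three dimensions, so a single layer with thousands of inputs and outputs already needs millions of multiply-accumulate steps per example. That growth is why dedicated matrix hardware pays off.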

Google uses TPUs internally to power many of its AI services, such as Search, YouTube, and LaMDA, as well as AI features on Pixel phones. Google also makes TPUs available for third-party use through its cloud platform, offering different generations and configurations of TPUs depending on the workload and budget.

Besides TPUs, Google also leverages GPUs for some of its AI workloads, especially those that require high memory bandwidth or low latency. For example, Google offers A2 VMs based on NVIDIA A100 GPUs, which are well suited to data analytics, machine learning training and inference, high-performance computing, and graphics applications. Google also collaborates with NVIDIA to optimize its software stack for NVIDIA GPUs, such as the XLA compiler used by TensorFlow.

Cloud: AI as a Service

Picture: Cloud as a Service. Source: Wix

Another challenge of building AI-related products is the high cost and the need for expert knowledge. To overcome this challenge, Google provides various cloud-based AI services that allow users to access state-of-the-art AI models without having to build their own infrastructure or expertise.

One of the most impressive examples of Google's cloud AI services is the ability to customize PaLM (Pathways Language Model) to clients' needs. PaLM is trained on a massive corpus of text data from various sources, such as Wikipedia, books, news articles, and web pages, using self-supervised learning techniques. PaLM can also be fine-tuned on specific domains or tasks using supervised learning techniques.
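The difference between the two training stages is easiest to see in how the training examples are built. The sketch below is a toy illustration under simplifying assumptions: it splits on whitespace instead of using a real tokenizer, and the function names are hypothetical. It is not how PaLM itself is trained. In the self-supervised case the target is simply the next word of the raw text, so no human labeling is needed; in the supervised case a person supplies the target.

```python
from typing import List, Tuple

Example = Tuple[List[str], str]  # (input words, target word)


def self_supervised_pairs(text: str, context_size: int = 3) -> List[Example]:
    """Build (context, next word) pairs directly from raw text."""
    words = text.split()
    pairs = []
    for i in range(1, len(words)):
        # The label is the text's own next word.
        context = words[max(0, i - context_size):i]
        pairs.append((context, words[i]))
    return pairs


def supervised_pairs(labeled: List[Tuple[str, str]]) -> List[Example]:
    """Build pairs from human-labeled (prompt, answer) examples."""
    return [(prompt.split(), answer) for prompt, answer in labeled]


# Pre-training data: any raw text works, because the labels come for free.
pretraining = self_supervised_pairs("the cat sat on the mat")

# Fine-tuning data: a small, task-specific set that someone labeled by hand.
fine_tuning = supervised_pairs([("is this review positive : great phone", "yes")])
```

Because raw text is abundant while labeled examples are expensive, the usual recipe is a large self-supervised stage followed by a much smaller supervised one.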

PaLM is powered by Google's TPUs and can handle billions of parameters and trillions of operations per second. PaLM is also accessible through Google's cloud platform as an API service that can be integrated with various applications and platforms. For example, PaLM can be used to create personalized AI assistants that can help users with various tasks, such as booking flights, ordering food, or finding information.

Another example of Google's cloud AI services is Project Tailwind, which is an AI notebook integrated with Google Drive that can help users with their research and study needs. Project Tailwind can analyze the user's documents and queries and generate relevant sources, summaries, key information, citations, etc., using PaLM and other AI models. Project Tailwind can also create personalized study guides and quizzes based on the user's goals and preferences. Project Tailwind aims to make research and learning faster and easier for everyone.

Android: AI on Mobile Devices

Picture: Android. Source: Unsplash

Google also brings its AI innovations to mobile devices through its Android operating system and Pixel devices. Android offers various AI features that enhance the user experience and functionality of mobile applications. Here are some of the highlights:

AI generation products integrated into Android

One of the AI features that Android offers is Live Wallpaper Creator, which allows users to create custom wallpapers for their devices using AI models that can generate realistic images based on user inputs. Users can choose from different themes, such as landscapes and animals.

Android users can now enjoy a variety of AI generation products that can create stunning wallpapers, color palettes, and image depth effects based on their preferences and inputs. For example, StarryAI is an app that can generate AI images based on complicated prompts, such as "a fantasy castle in the clouds". Now, users can do this directly in Google's own native app.

Pixel Speech: Live automatic translation of recorded audio

Pixel Speech is a new feature that allows users to translate audio recordings from any language to any other language in real time. Users can simply record a conversation, a lecture, a podcast, or any other audio source and get an instant translation in their preferred language. Pixel Speech also supports offline mode, so users can translate audio without an internet connection.

Pixel 7a: Priced at $499 USD

The Pixel 7a is the latest smartphone from Google, featuring the Tensor G2 chip, which delivers faster performance and better battery life than the first-generation Tensor chip. The Pixel 7a is priced at $499 USD, which is the same as the Pixel Tablet that comes with a free Speaker Dock.

Pixel Fold: thinnest foldable on the market

The Pixel Fold is Google's first foldable smartphone, boasting a sleek design and a large screen. The Pixel Fold has a 7.6-inch OLED display when fully open and also features the Tensor G2 chip. It is the thinnest foldable on the market, measuring only 5.8mm when unfolded and 12.1mm when folded, which is only slightly thicker than an average phone.

The Pixel Fold also supports a dual screen interpreter mode that can help users communicate across languages. Users can choose one language to display on one screen and another language to display on the other. The interpreter mode can then translate speech or text from one language to another and display it on the corresponding screen. This feature is unique to the Pixel Fold so far and can be useful for travelers, business people, or anyone who wants to learn a new language.

One of the drawbacks of the Pixel Fold is that the crease over the hinge is visible on the large inner screen when it is open. This may affect the visual quality and durability of the screen over time. However, Google claims that the hinge is designed to be flexible and resilient, and that it will not interfere with the user experience.

The Pixel Fold is not cheap, starting at $1799 USD for the base model with 12GB of RAM and 256GB of storage. However, Google is offering a special deal for early adopters: anyone who pre-orders the Pixel Fold will get a free Pixel Watch as well. The Pixel Watch is Google's first smartwatch, featuring a circular OLED display, a heart rate monitor, GPS, NFC, and wireless charging. The Pixel Watch can also sync with the Pixel Fold and other Android devices via Bluetooth.