
Google Gemini

Google Gemini, the latest multimodal AI model from Google, has finally arrived.

First introduced at the Google I/O developer conference in May 2023, Google Gemini represents a crucial step forward in the company's artificial intelligence roadmap. It is the work of Google DeepMind, the unit formed when Google combined its DeepMind and Brain AI labs to pursue a new generation of LLMs.


If nothing else, Google Gemini highlights Google’s ongoing quest to regain some AI market share from competitors like Meta and Microsoft as the demand for generative AI grows.

Here’s everything you need to know about Google Gemini and how to use it.

What is Google Gemini? The Basics
Google Gemini is a family of large language models (LLMs) that, according to Google DeepMind, draws on training techniques from AlphaGo, such as tree search and reinforcement learning. It is intended to become Google's "flagship AI," powering many products and services across the Google portfolio.

According to Demis Hassabis, CEO and co-founder of Google DeepMind, Gemini is the most "capable" model the company has ever built. It is the result of a large-scale collaboration among teams across Google, including Google Research.

Unlike many other models in the emerging LLM arms race, Google Gemini was built to be multimodal from the ground up. It can generalize across, understand, and combine different types of data, such as text, code, audio, video, and images.

Gemini was trained on Google's in-house AI accelerators, its Tensor Processing Units (TPUs) v4 and v5e. Google describes it as one of the most flexible and efficient models on the market: where other multimodal models demand vast amounts of computing power, Gemini can run on everything from data centers to mobile devices.

What are Google Gemini Nano, Pro, and Ultra?
The version of Google Gemini released in December 2023 is only the first iteration of the model, labeled "Gemini 1.0". It comes in three different "sizes":


Google Gemini Nano
Gemini Nano is the pared-down "lite" version of the LLM, available in two sizes: Nano-1 (1.8 billion parameters) and Nano-2 (3.25 billion parameters).

This version of Gemini is designed to run on mobile devices and is rolling out through AICore, a system service in Android 14, starting with the Pixel 8 Pro. For now, Nano is exclusive to the Pixel 8 Pro, but developers can apply for early access to the technology.

Nano will power various features Google previewed during the Pixel 8 Pro unveiling in October, such as summarization in the Recorder app and suggested replies for messaging apps.

Google Gemini Pro
Google Gemini Pro runs in Google's data centers and powers products such as Google Bard, the company's chatbot and a rival to Microsoft's Copilot. It will soon roll out to other Google tools, such as Duet AI, Google Chrome, Google Ads, and the Search Generative Experience (SGE).

Google Gemini Pro will launch on December 13th for customers using Vertex AI (Google's fully managed machine learning platform). Developers will also be able to access it through Google AI Studio, Google's generative AI developer tool.

According to Google, Gemini Pro is more effective at tasks such as brainstorming, writing, and summarizing content, outperforming OpenAI's GPT-3.5 on six core benchmarks.

Google Gemini Ultra
Gemini Ultra, not yet available for widespread use, is the most capable model in the family. Like Pro, it is natively multimodal, and it was pre-trained and fine-tuned on various codebases.

Gemini Ultra can comprehend nuanced information in text, code, and audio, and answer questions about complicated topics. According to Google, Ultra exceeds current state-of-the-art results on 30 of the 32 widely used academic benchmarks in LLM research and development.

How Powerful is Google Gemini? Performance Insights
Ever since Google first announced the impending arrival of Gemini, analysts have been trying to predict just how powerful it could be. We finally have some hard data, shared by Google in its "Gemini Technical Report".

The team said it has spent the last few months carefully testing the Gemini models and evaluating their performance on a wide variety of tasks. Although insights into the performance of Gemini Nano and Gemini Pro are limited, there is plenty of data to suggest that Ultra outperforms competing LLMs.

With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on Massive Multitask Language Understanding (MMLU). The benchmark combines 57 subjects, such as physics, math, history, and ethics, to test world knowledge and problem-solving abilities.

According to the team, Google's new approach to the MMLU benchmark lets Gemini use its reasoning abilities to "think more carefully" before answering difficult questions.

Gemini Ultra also achieved a state-of-the-art score of 59.4% on the new MMMU benchmark. This benchmark looks at the performance of LLMs on multimodal tasks that require deliberate reasoning.


Google says Gemini Ultra outperformed other leading models without help from optical character recognition (OCR) systems, which extract text from images before a model processes them, highlighting the solution's native multimodal capabilities.

That doesn't mean Google Gemini is immune to the issues other language models face, such as AI hallucination. Even the best generative AI models can respond problematically when prompted in specific ways.