SANS
GERARD
Google Developer Expert
Developer Evangelist

International Speaker
Spoken 199 times in 43 countries


Google AI Learning Path
Vertex AI
Complexity
Features



1
2
Google AI Ecosystem
VertexAI
AI Platform
Gemini for Workspace
AI Assistant



AI Studio
AI Playground
Gemini
Foundational
Models


Gemini
Chatbot
Specialised training

Multimodal medical model

Med-Palm
Opening the world

Gemini Ultra benchmarks
Gemini for Open Source: Gemma

Responsible AI

Reduce Biases
Safe
Accountable to people
Designed with Privacy
Scientific Excellence
Follow all Principles
Socially Beneficial
Not for Surveillance
Not Weaponised
Not Unlawful
Not Harmful

Vertex AI
Global
Google Cloud AI Platform
Vertex AI
Complexity
Features



Scaling Generative AI

Foundational Models
Voice
text-speech speech-text
Medical
medlm-medium medlm-large
Code
code-bison codechat-bison code-gecko

Multimodal
gemini-pro gemini-pro-vision
gemini-ultra*
1 Million tokens
Gemini 1.0 Pro 128K tokens
(100 pages)
Gemini 1.5 Pro 1M
(800 pages)


Gemini for Google Workspace

Protection: IP infringements

All Google Services
DuetAI
Generated Outputs
VertexAI
Training Data
Stochastic parrot or AGI?
Training: guessing the next word
Adjust model predictions using output
Wikipedia
Christopher is
Christopher Columbus was
Input
Output
Christopher Columbus discovered America
Christopher Columbus discovered America in
Christopher Columbus discovered America in 1492
Christopher Columbus discovered America in 1492 .
Christopher Columbus discovered America in 1492.
Christopher Columbus discovered America in 1492.
Christopher Columbus discovered America in 1492.
Christopher Columbus discovered America in 1492.
Christopher Columbus discovered America in 1492.
Christopher Columbus discovered America in 1492 .

Hyper-dimensional Graph

who

is



America
Columbus
discover


Latent Space

Word Embedding
Word Embedding


A prompt will put you in a certain area of the latent space


Note the density of data points and noise ratios

AI generated text is...
Biased
Non-factual
Inaccurate
1+1= 3
Grounding: reducing hallucinations
VectorDB Embeddings
Google Knowledge Graph
Google Search
Fact Checking
From idea to code

AI Studio
AI Playground
Gemini
API

Gemini
Fine-tuning


Digital art for everyone

Imagen 2: unlocking visual creativity






Generative AI for creatives

magazine style 4k photorealistic,
modern red armchair
natural lighting
Portrait of a french bulldog
at the beach,
85mm f/2.8

Assortment
of delicious,
freshly-baked donuts

Prompt
Imagen 2
Image inpainting and upscaling
Original + mask
Imagen 2


Automatic image captioning
Input
Caption


Explore images via chat with VQA
Input
Question


AI-driven interior design


AI-driven interior design


AI-driven interior design


AI-driven interior design


AI-driven interior design




Global
First steps in Generative AI
Vertex AI
Complexity
Features




Paid access for Gemini Ultra
Imagine
Image Generation


Google Lens
Google Search
Learn
Listen Response
Share
+40 Languages
Be Creative

C++, Go, Java, Javascript, Python and Typescript
Code


Generate
complex graphics
Plot



Gemini extensions!


Access your GMail.
Do More

Deep integration with YouTube.
Save Time




US-only
Generative AI for Developers
Vertex AI
Complexity
Features



Your sandbox for prompts









GoogleAI
VertexAI

Pro 1.0
Pro 1.5

Ultra 1.0

Nano 1.0


1h
video
11h
audio
30K
LOC
800
pages
1 Million tokens

Foundational Models
Embeddings
models/embedding-001

Gemini Pro
gemini-pro
Gemini Pro Vision
gemini-pro-vision


New multi-modal architecture


Computer Vision tasks

Source: V7 Labs
Visual Training Datasets
A dog running
Digits 0-9
Ant
French cat
MNIST
60K
10 classes
COCO
330K
80 classes
ImageNet
14M
Image + caption
LAION (Web)
5B
Image + text
Granular Visual Patches (ViT)



Query Patch
Detail
Image

Visual Attention Mechanism

"ginger fur"
"standing on a stone in the garden"
"well-groomed"
Image
Features
Attention
A new generalist Computer Vision
Visual Chat
VQA
Multi-turn
Reasoning
Extract Data
Handwriting
Data entry
OCR
Metadata
Identify
Recognition
Captioning
Categorising
Structure
Elements
Relationships
Hierarchies
Time/Space
Tracking
Activity
Causality
3D/4D


Computer Vision use-cases

Monday to Friday from 6:30 to 13:00 from 16:30 to 20:00
Extract the text for the opening hours and consolidate them in a single paragraph in English
Prompt
Output
Multimodal example


HORARI DILLUNS A DIVENDRES DE 6'30H. A
Extract Text
OCR
Image Input
Task
Raw Data
13H 16'30H. A 20'00H
Extract Text
Hand-written OCR
Reason
Mash-up fragments
HORARI DILLUNS A DIVENDRES DE 6'30H. A 13H 16'30H. A 20'00H
Translate
Catalan to English
Monday to Friday from 6:30 to 13:00 from 16:30 to 20:00

Vision Examples
Multimodal: better understanding

Breaking language barriers

Advanced OCR: complex layouts

Emerging features: mirrored text

Gemini models landscape

Access to Gemini

Mini-Gemini Chatbot Demo









Building a mini-Gemini Chatbot
By Gerard Sans
Building a mini-Gemini Chatbot
In this talk, you will learn how to build a mini-Gemini Chatbot using Google's latest Generative AI using Google AI Studio, Gemini Pro model and Angular. Google AI Studio is a tool to build the new wave of Generative AI applications using Gemini foundational models. We will be introducing the Gemini models to build the foundations of a Gemini Chatbot and explore advanced features like AI Agents, the ability to use tools and call APIs; RAG, or Retrieval Augmented Generation to improve grounding and extend Gemini training data cut off to include external data and more! Google Gemini era is here.
- 1,594