SANS
GERARD
Google Developer Expert
Developer Evangelist

International Speaker
Spoken 201 times in 43 countries


Google AI Learning Path
Vertex AI
Complexity
Features



Opening the world

Gemini Ultra benchmarks
Gemini for Open Source: Gemma

Responsible AI

Reduce Biases
Safe
Accountable to people
Designed with Privacy
Scientific Excellence
Follow all Principles
Socially Beneficial
Not for Surveillance
Not Weaponised
Not Unlawful
Not Harmful


Global
First steps in Generative AI
Vertex AI
Complexity
Features



Imagine
Image Generation

Google Lens
Google Search
Learn


Listen Response
Share
+40 Languages
Be Creative

C++, Go, Java, JavaScript, Python and TypeScript
Code


Generate
complex graphics
Plot



Gemini extensions!


Access your Gmail.
Do More

Deep integration with YouTube.
Save Time


Text Generation
Training: guessing the next word
Adjust model predictions using output
Wikipedia
Christopher is
Christopher Columbus was
Input
Output
Christopher Columbus discovered America
Christopher Columbus discovered America in
Christopher Columbus discovered America in 1492
Christopher Columbus discovered America in 1492 .
Christopher Columbus discovered America in 1492.
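The "guessing the next word" idea above can be sketched with a toy model. This is only an illustration: it counts word pairs (bigrams) in a one-sentence corpus, whereas a real LLM trains a neural network over tokens. The function names `train_bigrams` and `complete` are hypothetical.

```python
from collections import Counter, defaultdict

# Toy corpus: the sentence from the slide, split on spaces.
CORPUS = ["Christopher Columbus discovered America in 1492 ."]


def train_bigrams(sentences):
    """Count, for every word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def complete(counts, prompt, max_words=10):
    """Repeatedly append the most frequent follower of the last word."""
    words = prompt.split()
    for _ in range(max_words):
        followers = counts.get(words[-1])
        if not followers:  # no known continuation: stop generating
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


counts = train_bigrams(CORPUS)
print(complete(counts, "Christopher Columbus"))
# prints: Christopher Columbus discovered America in 1492 .
```

Each loop iteration extends the output by one word, which mirrors the word-by-word build-up shown on the slide.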

Hyper-dimensional Graph

who

is



America
Columbus
discover


Latent Space

Word Embedding
Latent Space Exploration
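A minimal sketch of what "close in the latent space" means, assuming made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions and are learned, not hand-written). Cosine similarity is one common way to compare two embeddings.

```python
import math

# Hypothetical embeddings, invented for illustration only.
EMBEDDINGS = {
    "Columbus": [0.9, 0.1, 0.3],
    "America": [0.8, 0.2, 0.4],
    "is": [0.0, 0.9, 0.1],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


for word in ("America", "is"):
    score = cosine(EMBEDDINGS["Columbus"], EMBEDDINGS[word])
    print(f"Columbus vs {word}: {score:.2f}")
```

With these toy vectors, "Columbus" scores higher against "America" than against "is", which is the kind of neighbourhood structure latent space exploration relies on.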

AI generated text is...
Biased
Non-factual
Inaccurate
1+1= 3
Grounding: reducing hallucinations
VectorDB Embeddings
Google Knowledge Graph
Google Search
Fact Checking
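Grounding can be sketched as "look up a trusted source first, then ask the model to answer from it". The sketch below is an assumption-heavy toy: simple keyword overlap stands in for a vector database search, and the `FACTS` list and function names are hypothetical.

```python
import re

# Tiny stand-in for a trusted knowledge source.
FACTS = [
    "Christopher Columbus reached the Americas in 1492.",
    "MNIST contains images of handwritten digits 0-9.",
]


def tokenize(text):
    """Lower-case the text and return its set of words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, facts):
    """Pick the fact sharing the most words with the question."""
    question_words = tokenize(question)
    return max(facts, key=lambda fact: len(question_words & tokenize(fact)))


def grounded_prompt(question, facts):
    """Build a prompt that tells the model to rely on the retrieved source."""
    source = retrieve(question, facts)
    return (
        "Answer using only this source.\n"
        f"Source: {source}\n"
        f"Question: {question}"
    )


print(grounded_prompt("When did Columbus reach the Americas?", FACTS))
```

The model still generates the answer, but the retrieved source gives it something checkable to lean on.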
Stochastic parrot or AGI?
Common Myths for LLMs today
Anthropomorphic
A program not a human.
Brain analogy is a myth. Very limited reasoning and planning.
Deterministic
Uncertain by design. Highly sensitive to inputs and learned patterns. Answers may change with a single extra punctuation mark.
Prompt-driven
Output is dependent on training data. Prompts can't fix data issues or uneven distributions.
Neutral
Biased by training data. This is a blind spot. Difficult to find small mistakes.
Fact-retrieval
Not a database. Can't store or retrieve facts. Data cut-off. Not a search engine.
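The "Deterministic" myth in the list above can be illustrated with temperature sampling, a common decoding technique. This is a sketch under stated assumptions: the candidate tokens and their scores are invented, and real models choose among tens of thousands of tokens.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [score / temperature for score in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(score - highest) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def sample_token(tokens, logits, temperature, rng):
    """Draw one token at random according to the softmax probabilities."""
    probabilities = softmax(logits, temperature)
    return rng.choices(tokens, weights=probabilities, k=1)[0]


# Hypothetical next-token candidates and scores.
TOKENS = ["1492", "1493", "Spain"]
LOGITS = [2.0, 1.0, 0.5]

rng = random.Random()
print([sample_token(TOKENS, LOGITS, 1.0, rng) for _ in range(10)])
```

Running it twice will usually print different sequences, because the next token is sampled rather than looked up.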
Responsible AI: confirmation bias

Perceived accuracy: 100%
Perceived accuracy: 85%
Real accuracy: 20%
Can be dangerous in high-stakes contexts, e.g. health or finance.
Can be used when errors have little to no consequences, e.g. summarising or rephrasing.
Safe to use in general. Perfect accuracy.

Computer Vision
Computer Vision tasks

Source: V7 Labs
Visual Training Datasets
A dog running
Digits 0-9
Ant
French cat
MNIST: 60K, 10 classes
COCO: 330K, 80 classes
ImageNet: 14M, Image + caption
LAION (Web): 5B, Image + text
Embedding high-dimensionality



A prompt will put you in a certain area of the latent space


Note the density of data points and noise ratios

Granular Visual Patches (ViT)



Query Patch
Detail
Image
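A rough sketch of how a Vision Transformer (ViT) style model cuts an image into patches before processing it. The 4x4 "image" of integers and the function name `split_into_patches` are illustrative assumptions. Real models use patches such as 16x16 pixels with colour channels.

```python
def split_into_patches(image, patch_size):
    """Split a 2D grid into square patches, each flattened to a list."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            rows = image[top:top + patch_size]
            patch = [row[left:left + patch_size] for row in rows]
            # Flatten the small square so it can be treated like a token.
            patches.append([pixel for row in patch for pixel in row])
    return patches


# Toy 4x4 image whose pixel values are simply 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = split_into_patches(image, 2)
print(len(patches))   # prints: 4
print(patches[0])     # prints: [0, 1, 4, 5]
```

Each flattened patch then plays the role that a word token plays in a text model.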

Visual Attention Mechanism

"ginger fur"
"standing on a stone in the garden"
"well-groomed"
Image
Features
Attention
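The attention step can be sketched as scaled dot-product attention: a query is compared with every patch feature, and the scores become weights. The 2-dimensional feature vectors below are hypothetical. In a real model they are learned and much larger.

```python
import math


def softmax(scores):
    """Normalise scores into weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Weighted average of the value vectors, one dimension at a time.
    output = [
        sum(weight * value[i] for weight, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output


# Hypothetical features: a text query (e.g. "ginger fur") and three patches.
query = [1.0, 0.0]
patch_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

weights, output = attend(query, patch_features, patch_features)
print([round(weight, 2) for weight in weights])
```

The patch whose features point in the same direction as the query receives the largest weight, which is the sense in which a phrase "attends" to part of the image.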
A new generalist Computer Vision
Visual Chat
VQA
Multi-turn
Reasoning
Extract Data
Handwriting
Data entry
OCR
Metadata
Identify
Recognition
Captioning
Categorising
Structure
Elements
Relationships
Hierarchies
Time/Space
Tracking
Activity
Causality
3D/4D


Multimodal: better understanding

Breaking language barriers

Advanced OCR: complex layouts

Emerging features: mirrored text

Computer Vision use-cases







Prompt: Extract the text for the opening hours and consolidate them in a single paragraph in English
Output: Monday to Friday from 6:30 to 13:00 from 16:30 to 20:00
Multimodal example


HORARI DILLUNS A DIVENDRES DE 6'30H. A
Extract Text
OCR
Image Input
Task
Raw Data
13H 16'30H. A 20'00H
Extract Text
Hand-written OCR
Reason
Mash-up fragments
HORARI DILLUNS A DIVENDRES DE 6'30H. A 13H 16'30H. A 20'00H
Translate
Catalan to English
Monday to Friday from 6:30 to 13:00 from 16:30 to 20:00
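The steps above (extract fragments, mash them up, translate) can be mimicked by a small hand-written pipeline. This is not how Gemini works internally. It is a rule-based sketch covering only this one sign, and the two-word `DAYS` dictionary and the function names are hypothetical.

```python
import re

# Minimal Catalan-to-English lookup, just enough for this sign.
DAYS = {"DILLUNS": "Monday", "DIVENDRES": "Friday"}


def merge_fragments(fragments):
    """Mash-up step: join the OCR fragments into one line."""
    return " ".join(fragment.strip() for fragment in fragments)


def parse_times(text):
    """Turn times written like 6'30H. or 13H into 6:30 and 13:00."""
    times = []
    for hours, minutes in re.findall(r"(\d{1,2})(?:'(\d{2}))?H", text):
        times.append(f"{int(hours)}:{minutes or '00'}")
    return times


def to_english(text):
    """Translate step: rebuild the schedule as an English sentence."""
    days = [DAYS[word] for word in text.split() if word in DAYS]
    times = parse_times(text)
    ranges = [
        f"from {times[i]} to {times[i + 1]}"
        for i in range(0, len(times) - 1, 2)
    ]
    return f"{days[0]} to {days[1]} " + " ".join(ranges)


fragments = ["HORARI DILLUNS A DIVENDRES DE 6'30H. A", "13H 16'30H. A 20'00H"]
merged = merge_fragments(fragments)
print(to_english(merged))
# prints: Monday to Friday from 6:30 to 13:00 from 16:30 to 20:00
```

A multimodal model handles all three steps from a single prompt, without any of these hand-written rules.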
Image Generation

Imagen 2: unlocking image creativity






Generative AI for creatives

magazine style 4k photorealistic,
modern red armchair
natural lighting
Portrait of a french bulldog
at the beach,
85mm f/2.8

Assortment
of delicious,
freshly-baked donuts

Prompt
Imagen 2
Image inpainting and upscaling
Original + mask
Imagen 2


Automatic image captioning
Input
Caption


AI-driven interior design







GoogleAI
VertexAI

Pro 1.0
Pro 1.5

Ultra 1.0

Nano 1.0


New multi-modal architecture


1 Million tokens
Gemini 1.0 Pro 128K tokens
(100 pages)
Gemini 1.5 Pro 1M
(800 pages)



1h video
11h audio
30K LOC
800 pages
1 Million tokens


Text: Apollo 11 script
402-pages

Video: Buster Keaton movie
44-minutes

Code: three.js demos
100K-LOC
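A back-of-the-envelope check of whether a document fits in a context window, using only figures from the slides. The tokens-per-page rate is an assumption derived from "1M tokens = 800 pages". Real token counts depend on the tokenizer and the text.

```python
# Assumed rate, derived from the slide figure: 1,000,000 tokens per 800 pages.
TOKENS_PER_PAGE = 1_000_000 / 800  # 1250 tokens per page

CONTEXT_128K = 128_000
CONTEXT_1M = 1_000_000


def estimated_tokens(pages):
    """Rough token estimate for a document of the given page count."""
    return round(pages * TOKENS_PER_PAGE)


def fits(pages, context_window):
    """True if the estimated token count fits in the context window."""
    return estimated_tokens(pages) <= context_window


apollo_pages = 402  # Apollo 11 script, as quoted on the slide
print(estimated_tokens(apollo_pages))      # prints: 502500
print(fits(apollo_pages, CONTEXT_128K))    # prints: False
print(fits(apollo_pages, CONTEXT_1M))      # prints: True
```

Under this estimate the 402-page script needs roughly half of the 1M-token window and would not fit in 128K tokens.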
A new age of AI-driven OSS

RAG vs Massive Context






The Power of Context
Self-contained
Only what you need. Avoids fragmentation and relevance drift
Cost-effective
Lower investment over time, fixed query latency. No fixed costs
User-guided
Unified data for focus, easier to control and improve quality
Interpretable
Improved transparency, easier to debug & iterate
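To make the contrast concrete, here is a toy comparison of the two prompt styles. The document text, the sentence-level chunking and the keyword-overlap ranking are all simplifying assumptions. A production RAG system would typically use embeddings and a vector database.

```python
import re

# Hypothetical document, loosely based on the opening-hours example.
DOCUMENT = (
    "The shop opens Monday to Friday. "
    "Morning hours run from 6:30 to 13:00. "
    "Afternoon hours run from 16:30 to 20:00."
)


def split_sentences(text):
    """Chunk the document into sentences."""
    return [s.strip() for s in re.split(r"(?<=\.)\s+", text) if s.strip()]


def words(text):
    """Lower-cased set of words and times found in the text."""
    return set(re.findall(r"[a-z0-9:]+", text.lower()))


def rag_prompt(question, chunks, k=1):
    """RAG style: send only the k chunks that best match the question."""
    question_words = words(question)
    ranked = sorted(
        chunks, key=lambda c: len(question_words & words(c)), reverse=True
    )
    return f"Context: {' '.join(ranked[:k])}\nQuestion: {question}"


def full_context_prompt(question, document):
    """Massive-context style: send the whole document."""
    return f"Context: {document}\nQuestion: {question}"


question = "What are the afternoon hours?"
print(rag_prompt(question, split_sentences(DOCUMENT)))
print(full_context_prompt(question, DOCUMENT))
```

In this toy run the retrieved chunk contains the afternoon hours but loses the "Monday to Friday" detail, a small example of the fragmentation that passing the full document avoids.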
Extreme In-Context-Learning
English to Kalamang



Limited
Generative AI for Developers
Vertex AI
Complexity
Features



Your sandbox for prompts








Try it Today!
Gemini models landscape

Create an API Key in Google AI Studio

Gemini API Quickstart
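Once you have an API key, a first call can be as small as one HTTP request. The sketch below uses only the Python standard library against the Gemini REST `generateContent` endpoint. The model name is an assumption (check the current model list in Google AI Studio), and the helper names are hypothetical.

```python
import json
import urllib.request

API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_request(prompt, api_key, model="gemini-1.5-pro"):
    """Prepare (but do not send) a generateContent request."""
    url = f"{API_ROOT}/models/{model}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,  # key created in Google AI Studio
        },
        method="POST",
    )


def generate(prompt, api_key, model="gemini-1.5-pro"):
    """Send the request and return the text of the first candidate."""
    request = build_request(prompt, api_key, model)
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["candidates"][0]["content"]["parts"][0]["text"]


# Example usage (needs a real key and network access):
# print(generate("Explain word embeddings in one sentence.", "YOUR_API_KEY"))
```

Keep the key out of source control. Reading it from an environment variable is a common choice.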








Introducing Google Gemini
By Gerard Sans
In this session, you will learn about Google's Generative AI tools and ecosystem, including Google AI Studio and the Gemini chatbot. Google AI Studio is a tool for developers to build the new wave of Generative AI applications using Gemini foundational models. We will give a general overview, run some demos using the just-released Gemini 1.5 Pro model and answer your questions about the future of AI!