Gerard Sans
Google Developer Expert
Developer Evangelist
International Speaker
Spoken 201 times in 43 countries
Google AI Learning Path
Vertex AI
Complexity
Features
1
2
Opening the world
Gemini Ultra benchmarks
Gemini for Open Source: Gemma
Responsible AI
Reduce Biases
Safe
Accountable to people
Designed with Privacy
Scientific Excellence
Follow all Principles
Socially Beneficial
Not for Surveillance
Not Weaponised
Not Unlawful
Not Harmful
Global
First steps in Generative AI
Vertex AI
Complexity
Features
Imagine
Image Generation
Google Lens
Google Search
Learn
Listen Response
Share
+40 Languages
Be Creative
C++, Go, Java, JavaScript, Python and TypeScript
Code
Generate
complex graphics
Plot
Gemini extensions!
Access your Gmail.
Do More
Deep integration with YouTube.
Save Time
Text Generation
Training: guessing the next word
Adjust model predictions using output
Wikipedia
Christopher  is
Christopher Columbus was
Input
Output
Christopher Columbus discovered  America
Christopher Columbus discovered America  in
Christopher Columbus discovered America in  1492
Christopher Columbus discovered America in 1492.
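The "guessing the next word" idea above can be made concrete with a toy model. What follows is a minimal sketch, assuming a bigram counter as a stand-in for a real neural network: the three-sentence corpus and the names `train_bigram` and `generate` are made up for illustration and are not how Gemini is trained.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for every word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts


def generate(counts, start, max_tokens=10):
    """Greedy decoding: always append the most frequent next word."""
    tokens = [start]
    while len(tokens) < max_tokens and tokens[-1] in counts:
        next_token, _ = counts[tokens[-1]].most_common(1)[0]
        tokens.append(next_token)
    return " ".join(tokens)


# Hypothetical training text standing in for "Wikipedia".
corpus = [
    "Christopher Columbus discovered America in 1492 .",
    "Christopher Columbus was an explorer .",
    "Christopher Columbus discovered America in 1492 .",
]

model = train_bigram(corpus)
completion = generate(model, "Christopher")
print(completion)
```

The model only repeats the most common continuation it has seen, which is why the quality and balance of the training text matters so much.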
Hyper-dimensional Graph
who
is
America
Columbus
discover
Latent Space
Word Embedding
Latent Space Exploration
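To give a feel for what "close in latent space" means, here is a small sketch using cosine similarity. The 3-dimensional vectors are invented by hand for this example; real embeddings have hundreds or thousands of dimensions and are learned, not written down.

```python
import math

# Hypothetical toy embeddings for the words shown on the slide.
EMBEDDINGS = {
    "Columbus": [0.9, 0.1, 0.2],
    "discover": [0.8, 0.2, 0.3],
    "America": [0.7, 0.3, 0.1],
    "who": [0.1, 0.9, 0.2],
    "is": [0.1, 0.8, 0.3],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word):
    """Return the other word whose vector points in the most similar direction."""
    others = [w for w in EMBEDDINGS if w != word]
    return max(others, key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))


print(nearest("Columbus"))
print(nearest("who"))
```

Exploring the latent space amounts to moving between such neighbourhoods: words with related usage end up pointing in similar directions.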
AI generated text is...
Biased
Non-factual
Inaccurate
1+1= 3
Grounding: reducing hallucinations
VectorDB Embeddings
Google Knowledge Graph
Google Search
Fact Checking
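Grounding can be sketched as "look something up first, then ask the model to answer from it". The example below is a rough illustration only: it ranks documents by shared words instead of using a real vector database, and the document list, `retrieve` and `grounded_prompt` are hypothetical names introduced here.

```python
import re


def tokens(text):
    """Lower-case word set, used as a crude stand-in for an embedding."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, documents, k=1):
    """Return the k documents sharing the most words with the query."""
    query_tokens = tokens(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokens(doc)),
        reverse=True,
    )
    return ranked[:k]


def grounded_prompt(question, documents):
    """Build a prompt that asks the model to answer from retrieved text only."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical knowledge base.
documents = [
    "Christopher Columbus reached the Americas in 1492.",
    "Imagen 2 generates images from text prompts.",
    "MNIST contains 60K images of handwritten digits.",
]

question = "When did Columbus reach the Americas?"
prompt = grounded_prompt(question, documents)
print(prompt)
```

The model still generates the answer, but the retrieved passage gives it something checkable to lean on.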
Stochastic parrot or AGI?
Common Myths for LLMs today
Anthropomorphic
A program not a human.
Brain analogy is a myth. Very limited reasoning and planning.
Deterministic
Uncertain by design. Highly sensitive to inputs and learned patterns. Answers may change with a single extra punctuation mark.
Prompt-driven
Output is dependent on training data. Prompts can't fix data issues or uneven distributions.
Neutral
Biased by training data. This is a blind spot. Difficult to find small mistakes.
Fact-retrieval
Not a database. Can't store or retrieve facts. Data cut-off. Not a search engine.
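The "Deterministic" myth above is easier to see with a sampling sketch. Assuming the model produces a score (logit) per candidate word, the next word is drawn at random from a softmax distribution; the candidate words, scores and function names below are invented for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def sample_next(candidates, logits, temperature, rng):
    """Draw one candidate at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]


candidates = ["1492", "Spain", "ships"]
logits = [2.0, 1.0, 0.5]  # hypothetical model scores

rng = random.Random(0)
samples = [sample_next(candidates, logits, 1.0, rng) for _ in range(5)]
print(samples)
print(softmax(logits, temperature=0.1))
```

Because the choice is a weighted draw, the same prompt can yield different answers, and a small change in the input shifts the scores and therefore the odds.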
Responsible AI: confirmation bias
Perceived accuracy: 100%
Perceived accuracy: 85%
Real accuracy: 20%
Can be dangerous in high-stakes contexts, e.g. health or finance.
Can be used when errors have little to no consequences.
E.g. summarising or rephrasing.
Safe to use in general. Perfect accuracy.
Computer Vision
Computer Vision tasks
Source: V7 Labs
Visual Training Datasets
A dog running
Digits 0-9
Ant
French cat
MNIST
60K
10 classes
COCO
330K
80 classes
ImageNet
14M
Image + caption
LAION (Web)
5B
Image + text
Embedding high-dimensionality
A prompt will put you in a certain area of the latent space
Note the density of data points and noise ratios
Granular Visual Patches (ViT)
Query Patch
Detail
Image
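The patch idea can be sketched in a few lines. This is a simplified illustration, assuming a tiny 4x4 grayscale "image" stored as nested lists; a real Vision Transformer also projects each patch to an embedding and adds position information, which is left out here.

```python
def split_into_patches(image, patch_size):
    """Cut a 2D image (list of rows) into square, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                row[left:left + patch_size]
                for row in image[top:top + patch_size]
            ]
            patches.append(patch)
    return patches


# Hypothetical 4x4 image whose pixel values are simply 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]

patches = split_into_patches(image, 2)
# Flatten each patch into a vector, the form a transformer consumes.
flattened = [[pixel for row in patch for pixel in row] for patch in patches]
print(flattened)
```

Each flattened patch plays the role that a word token plays in text models.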
Visual Attention Mechanism
"ginger fur"
"standing on a stone in the garden"
"well-groomed"
Image
Features
Attention
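A rough sketch of how a text query such as "ginger fur" could attend to image patches, using scaled dot-product attention. The 2-dimensional feature vectors and patch labels are invented for this example, and a real model learns separate query, key and value projections.

```python
import math


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    weights = [value / total for value in exps]
    dim = len(values[0])
    output = [
        sum(weight * value[i] for weight, value in zip(weights, values))
        for i in range(dim)
    ]
    return weights, output


# Hypothetical patch features: one vector per image region.
patch_labels = ["fur", "stone", "garden"]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = keys  # reuse the features as values to keep the sketch small

query = [1.0, 0.0]  # stand-in for the embedding of "ginger fur"
weights, output = attention(query, keys, values)
most_attended = patch_labels[weights.index(max(weights))]
print(most_attended, weights)
```

The weights always sum to one, so attention is a way of spreading a fixed amount of focus across the patches.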
A new generalist Computer Vision
Visual Chat
VQA
Multi-turn
Reasoning
Extract Data
Handwriting
Data entry
OCR
Metadata
Identify
Recognition
Captioning
Categorising
Structure
Elements
Relationships
Hierarchies
Time/Space
Tracking
Activity
Causality
3D/4D
Multimodal: better understanding
Breaking language barriers
Advanced OCR: complex layouts
Emerging features: mirrored text
Computer Vision use-cases
Monday to Friday from 6:30 to 13:00 from 16:30 to 20:00
Extract the text for the opening hours and consolidate them in a single paragraph in English
Prompt
Output
Multimodal example
HORARI DILLUNS A DIVENDRES DE 6'30H. A
Extract Text
OCR
Image Input
Task
Raw Data
13H 16'30H. A 20'00H
Extract Text
Hand-written OCR
Reason
Mash-up fragments
HORARI DILLUNS A DIVENDRES DE 6'30H. A 13H 16'30H. A 20'00H
Translate
Catalan to English
Monday to Friday from 6:30 to 13:00 from 16:30 to 20:00
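The steps in this example (extract, merge, translate) can be mimicked with a rule-based sketch to make the data flow explicit. This is not what the model does internally: the regular expression and the two-word Catalan glossary are hand-written assumptions that only cover this one sign.

```python
import re

# Raw fragments as returned by the two extraction steps.
fragments = [
    "HORARI DILLUNS A DIVENDRES DE 6'30H. A",
    "13H 16'30H. A 20'00H",
]

# Hypothetical mini glossary, Catalan to English.
GLOSSARY = {"DILLUNS": "Monday", "DIVENDRES": "Friday"}


def merge(parts):
    """Mash up the fragments into a single line of text."""
    return " ".join(parts)


def parse_times(text):
    """Read times such as 6'30H or 13H and normalise them to H:MM."""
    matches = re.findall(r"(\d{1,2})(?:'(\d{2}))?H", text)
    return [f"{int(hour)}:{minutes or '00'}" for hour, minutes in matches]


def translate(text):
    """Consolidate the opening hours into one English sentence."""
    days = [GLOSSARY[word] for word in text.split() if word in GLOSSARY]
    opens_am, closes_am, opens_pm, closes_pm = parse_times(text)
    return (
        f"{days[0]} to {days[1]} from {opens_am} to {closes_am} "
        f"from {opens_pm} to {closes_pm}"
    )


merged = merge(fragments)
result = translate(merged)
print(result)
```

A multimodal model performs all of these steps from a single prompt, without anyone writing the rules by hand.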
Image Generation
Imagen 2: unlocking image creativity
Generative AI for creatives
magazine style 4k photorealistic,
modern red armchair
natural lighting
Portrait of a french bulldog
at the beach,
85mm f/2.8
Assortment
of delicious,
freshly-baked donuts
Prompt
Imagen 2
Image inpainting and upscaling
Original + mask
Imagen 2
Automatic image captioning
Input
Caption
AI-driven interior design
GoogleAI
VertexAI
Pro 1.0
Pro 1.5
Ultra 1.0
Nano 1.0
New multi-modal architecture
1 Million tokens
Gemini 1.0 Pro 128K tokens
(100 pages)
Gemini 1.5 Pro 1M
(800 pages)
1h
video
11h
audio
30K
LOC
800
pages
1 Million tokens
Text: Apollo 11 script
402 pages
Video: Buster Keaton movie
44 minutes
Code: three.js demos
100K LOC
A new age of AI-driven OSS
RAG vs Massive Context
The Power of Context
Self-contained
Only what you need. Avoids fragmentation and relevance drift
Cost-effective
Lower investment over time, fixed query latency. No fixed costs
User-guided
Unified data for focus, easier to control and improve quality
Interpretable
Improved transparency, easier to debug & iterate
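One way to think about the RAG versus massive context trade-off is as a budget check: if everything fits in the context window, send it all; otherwise fall back to retrieval. The sketch below assumes a rough rule of thumb of about 4 characters per token, which is an approximation and not the tokenizer Gemini uses; `choose_strategy` and its parameters are hypothetical.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate; real tokenizers will give other numbers."""
    return max(1, len(text) // chars_per_token)


def choose_strategy(documents, context_window, reserve_for_answer=1000):
    """Pick full-context when all documents fit, otherwise retrieval."""
    total = sum(estimate_tokens(doc) for doc in documents)
    budget = context_window - reserve_for_answer
    strategy = "full-context" if total <= budget else "retrieve"
    return strategy, total


# Hypothetical documents of 4,000 and 8,000 characters.
documents = ["a" * 4000, "b" * 8000]

strategy_large, total = choose_strategy(documents, context_window=1_000_000)
strategy_small, _ = choose_strategy(documents, context_window=2_000)
print(total, strategy_large, strategy_small)
```

With a 1 million token window, many projects that previously needed a retrieval pipeline may simply fit in a single prompt.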
Extreme In-Context-Learning
English to Kalamang
Limited
Generative AI for Developers
Vertex AI
Complexity
Features
Your sandbox for prompts
Try it Today!
Gemini models landscape
Create an API Key in Google AI Studio
Gemini API Quickstart
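Once you have an API key, a first call can be sketched with the Python standard library alone. The endpoint path, the `gemini-pro` model name and the response shape below reflect the public REST API as I understand it at the time of writing and should be treated as assumptions; check the Gemini API Quickstart for the current values. `build_request` and `generate` are helper names made up for this example.

```python
import json
import urllib.request

# Assumed base URL and API version; verify against the official quickstart.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_request(prompt, api_key, model="gemini-pro"):
    """Prepare (but do not send) a generateContent request."""
    url = f"{API_ROOT}/models/{model}:generateContent?key={api_key}"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, api_key):
    """Send the request and return the first candidate's text."""
    with urllib.request.urlopen(build_request(prompt, api_key)) as response:
        reply = json.load(response)
    return reply["candidates"][0]["content"]["parts"][0]["text"]


# Building the request needs no network access or real key.
request = build_request("Explain grounding in one sentence.", "YOUR_API_KEY")
print(request.full_url)
```

To run it for real, replace `YOUR_API_KEY` with the key created in Google AI Studio and call `generate(...)`; keep the key out of source control.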
Introducing Google Gemini
By Gerard Sans
In this session, you will learn about Google's Generative AI tools and ecosystem, including Google AI Studio and the Gemini chatbot. Google AI Studio is a tool for developers to build the new wave of Generative AI applications using Gemini foundational models. We will cover a general overview, run some demos using the just-released Gemini 1.5 Pro model, and answer your questions about the future of AI!