Gerard Sans

Google Developer Expert

Developer Evangelist

International Speaker

Spoken 199 times in 43 countries

Google AI Learning Path

Vertex AI

Complexity

Features


Google AI Ecosystem

Vertex AI

AI Platform

Gemini for Workspace

AI Assistant

AI Studio

AI Playground

Gemini

Foundational

Models

Gemini

Chatbot

Specialised training

Multimodal medical model

Med-PaLM

Opening the world

Gemini Ultra benchmarks

Gemini for Open Source: Gemma

Responsible AI

Reduce Biases

Safe

Accountable to people

Designed with Privacy

Scientific Excellence

Follow all Principles

Socially Beneficial

Not for Surveillance

Not Weaponised

Not Unlawful

Not Harmful

Vertex AI

Global

Google Cloud AI Platform


Scaling Generative AI

Foundational Models

Voice

text-to-speech, speech-to-text

Medical

medlm-medium medlm-large

Code

code-bison codechat-bison code-gecko

Multimodal

gemini-pro gemini-pro-vision

gemini-ultra*

1 Million tokens

Gemini 1.0 Pro 128K tokens

(100 pages)

Gemini 1.5 Pro 1M

(800 pages)

Gemini for Google Workspace

Protection: IP infringements

All Google Services

Duet AI

Generated Outputs

Vertex AI

Training Data

Stochastic parrot or AGI?

Training: guessing the next word

Adjust model predictions using output

Wikipedia

Christopher  is

Christopher Columbus was

Input

Output

Christopher Columbus discovered  America

Christopher Columbus discovered America  in

Christopher Columbus discovered America in  1492

Christopher Columbus discovered America in 1492  .

Christopher Columbus discovered America in 1492.
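
The input/output sequence above can be sketched in a few lines of Python. This is a minimal illustration of how one sentence yields several "guess the next word" training examples; the function name `next_word_pairs` and the whitespace tokenisation are assumptions for illustration, since real models use subword tokenisers.

```python
def next_word_pairs(sentence):
    """Split a sentence into (input prefix, next word) training pairs."""
    # Treat the final period as its own token, as the slide does.
    tokens = sentence.replace(".", " .").split()
    pairs = []
    for i in range(1, len(tokens)):
        prefix = " ".join(tokens[:i])   # what the model sees
        pairs.append((prefix, tokens[i]))  # what it must predict
    return pairs


pairs = next_word_pairs("Christopher Columbus discovered America in 1492.")
for prefix, target in pairs:
    print(f"{prefix!r} -> {target!r}")
```

Each pair is one prediction step: the model is scored on how well it guesses the target given the prefix, and its weights are adjusted accordingly.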


Hyper-dimensional Graph

who

is

America

Columbus

discover

Dimensions: d_{1}, d_{2}, d_{3}, d_{4}, d_{5}, ..., d_{1408}

Latent Space

Word Embedding

Columbus (d_{1} ... d_{1408}): 0.3, 1.2, -1, 0.9

Word Embedding

America (d_{1} ... d_{1408}): 1.5, 0.3, 0.7, 0.1
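
One common way to compare two word embeddings is cosine similarity. The sketch below uses only the four values shown for each word on the slides, so the result is illustrative rather than a real model output; a full embedding would have all 1408 dimensions.

```python
import math

# The four components shown on the slides (the remaining dimensions are omitted).
columbus = [0.3, 1.2, -1.0, 0.9]
america = [1.5, 0.3, 0.7, 0.1]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


similarity = cosine_similarity(columbus, america)
print(round(similarity, 3))
```

Words that appear in similar contexts end up close together in the latent space, which is what a higher cosine similarity reflects.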

A prompt will put you in a certain area of the latent space

Note the density of data points and noise ratios

AI generated text is...

Biased

Non-factual

Inaccurate

1+1= 3

Grounding: reducing hallucinations

VectorDB Embeddings

Google Knowledge Graph

Google Search

Fact Checking
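
Grounding with a vector database can be pictured as "retrieve a relevant fact, then put it in the prompt". The sketch below is a toy version under stated assumptions: the `FACTS` list stands in for a vector database, and word counts stand in for learned embeddings. The names `embed`, `cosine` and `grounded_prompt` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical knowledge base; in practice a vector database would hold these.
FACTS = [
    "Christopher Columbus reached the Americas in 1492.",
    "MNIST contains images of handwritten digits.",
    "Imagen 2 generates images from text prompts.",
]


def embed(text):
    """Toy embedding: word counts. A real system would use an embedding model."""
    return Counter(text.lower().replace(".", "").replace("?", "").split())


def cosine(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def grounded_prompt(question):
    """Pick the most similar fact and prepend it to the question as context."""
    query = embed(question)
    best = max(FACTS, key=lambda fact: cosine(query, embed(fact)))
    return f"Answer using only this context.\nContext: {best}\nQuestion: {question}"


print(grounded_prompt("When did Columbus reach the Americas?"))
```

The model then answers from the supplied context instead of relying only on what it memorised during training, which is the idea behind reducing hallucinations.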

From idea to code

AI Studio

AI Playground

Gemini

API

Gemini

Fine-tuning

Digital art for everyone

Imagen 2: unlocking visual creativity

Generative AI for creatives

magazine style 4k photorealistic,

modern red armchair

natural lighting

Portrait of a french bulldog

at the beach,

85mm f/2.8

Assortment

of delicious,

freshly-baked donuts

Prompt

Imagen 2
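
The example prompts above follow a loose pattern: a subject, an optional setting, then style or camera modifiers. A small helper can make that pattern explicit. The function `build_image_prompt` is hypothetical and not part of any Imagen API; it only assembles the text you would send.

```python
def build_image_prompt(subject, setting="", modifiers=()):
    """Join a subject, an optional setting and style modifiers into one prompt."""
    parts = [subject]
    if setting:
        parts.append(setting)
    prompt = " ".join(parts)
    if modifiers:
        prompt += ", " + ", ".join(modifiers)
    return prompt


prompt = build_image_prompt(
    "Portrait of a french bulldog",
    setting="at the beach",
    modifiers=["85mm f/2.8"],
)
print(prompt)
```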

Image inpainting and upscaling

Original + mask

Imagen 2

Automatic image captioning

Input

Caption

Explore images via chat with VQA

Input

Question

AI-driven interior design


Global

First steps in Generative AI


Paid access for Gemini Ultra

Imagine

Image Generation

Google Lens

Google Search

Learn

Listen Response

Share

40+ Languages

Be Creative

C++, Go, Java, JavaScript, Python and TypeScript

Code

Generate

complex graphics

Plot

Gemini extensions!

Access your Gmail.

Do More

Deep integration with YouTube.

Save Time

US-only

Generative AI for Developers


Your sandbox for prompts

Google AI

Vertex AI

Pro 1.0

Pro 1.5

Ultra 1.0

Nano 1.0

1h video

11h audio

30K LOC

800 pages

1 Million tokens
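
Dividing the 1 million token window by each capacity figure on the slide gives a rough feel for how many tokens each unit of content consumes. These are averages implied by the slide's numbers, not official tokenisation rates.

```python
CONTEXT_TOKENS = 1_000_000

# Capacities quoted on the slide, converted to base units.
capacity = {
    "video seconds": 1 * 3600,    # 1h video
    "audio seconds": 11 * 3600,   # 11h audio
    "lines of code": 30_000,      # 30K LOC
    "pages": 800,                 # 800 pages
}

# Implied average token cost per unit of each content type.
tokens_per_unit = {unit: CONTEXT_TOKENS / amount for unit, amount in capacity.items()}

for unit, tokens in tokens_per_unit.items():
    print(f"~{tokens:,.1f} tokens per {unit.rstrip('s')}")
```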

Foundational Models

Embeddings

models/embedding-001

Gemini Pro

gemini-pro

Gemini Pro Vision

gemini-pro-vision
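
The model names above are the identifiers you pass to the API. The sketch below assumes the `google-generativeai` Python SDK as it existed for these models and an API key from AI Studio; the environment variable name `GOOGLE_API_KEY` and the helper names `ask_gemini` and `embed_text` are assumptions for illustration.

```python
import os

TEXT_MODEL = "gemini-pro"
EMBEDDING_MODEL = "models/embedding-001"


def ask_gemini(prompt, api_key):
    """Send a text prompt to Gemini Pro and return the text of the reply."""
    # Imported lazily so this file loads even without the SDK installed.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(TEXT_MODEL)
    return model.generate_content(prompt).text


def embed_text(text, api_key):
    """Return the embedding vector for a piece of text."""
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    result = genai.embed_content(
        model=EMBEDDING_MODEL,
        content=text,
        task_type="retrieval_document",
    )
    return result["embedding"]


if __name__ == "__main__":
    key = os.environ.get("GOOGLE_API_KEY")  # assumed variable name
    if key:
        print(ask_gemini("Who was Christopher Columbus?", key))
```

Nothing is sent unless an API key is present, so the file can be imported and inspected safely before any call is made.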

New multi-modal architecture

Computer Vision tasks

Source: V7 Labs

Visual Training Datasets

A dog running

Digits 0-9

Ant

French cat

MNIST: 60K, 10 classes

COCO: 330K, 80 classes

ImageNet: 14M

Image + caption

LAION (Web): 5B, image + text

Granular Visual Patches (ViT)

Query Patch

Detail

Image
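
A Vision Transformer (ViT) starts by cutting the image into a grid of small patches, each of which is then treated like a token. The sketch below shows only that splitting step on a tiny 4x4 "image" of numbers; it assumes the image dimensions are multiples of the patch size, and the function name is hypothetical.

```python
def split_into_patches(image, patch_size):
    """Cut a 2D image (list of rows) into square patches, row by row."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [row[left:left + patch_size] for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches


# A 4x4 image whose pixel values are simply 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)

for patch in patches:
    print(patch)
```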

Visual Attention Mechanism

"ginger fur"

"standing on a stone in the garden"

"well-groomed"

Image

Features

Attention
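
Attention can be read as "how much should each image patch contribute to this phrase?". The sketch below computes scaled dot-product attention weights for one query over three patch feature vectors. All the numbers are made up for illustration, and the query vector standing for "ginger fur" is hypothetical.

```python
import math


def attention_weights(query, keys):
    """Scaled dot-product attention: softmax of query-key similarity scores."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical feature vectors for three image patches.
patch_features = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query = [1.0, 0.0]  # hypothetical vector for the phrase "ginger fur"

weights = attention_weights(query, patch_features)
print([round(w, 3) for w in weights])
```

The weights sum to 1, and the patches whose features point in the same direction as the query receive the largest share.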

A new generalist Computer Vision

Visual Chat

VQA

Multi-turn

Reasoning

Extract Data

Handwriting

Data entry

OCR

Metadata

Identify

Recognition

Captioning

Categorising

Structure

Elements

Relationships

Hierarchies

Time/Space

Tracking

Activity

Causality

3D/4D

Computer Vision use-cases

Prompt: Extract the text for the opening hours and consolidate them in a single paragraph in English

Output: Monday to Friday from 6:30 to 13:00 from 16:30 to 20:00

Multimodal example

HORARI
DILLUNS A DIVENDRES
DE 6'30H. A

Extract Text

OCR

Image Input

Task

Raw Data

13H
16'30H. A 20'00H

Extract Text

Hand-written OCR

Reason

Mash-up fragments

HORARI
DILLUNS A DIVENDRES
DE 6'30H. A 13H 16'30H. A 20'00H

Translate

Catalan to English

Monday to Friday
from 6:30 to 13:00 from 16:30 to 20:00
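
The model performs the three steps above (extract, mash up, translate) in a single pass. To make the intermediate reasoning concrete, here is a deterministic sketch of the same pipeline applied to the two OCR fragments from the slide. The `DAYS` lookup table and the regular expressions are stand-ins for illustration, not how Gemini works internally.

```python
import re

# OCR fragments as they appear on the slide (Catalan).
fragments = ["HORARI\nDILLUNS A DIVENDRES\nDE 6'30H. A", "13H\n16'30H. A 20'00H"]

# Hypothetical lookup table standing in for the model's translation step.
DAYS = {
    "DILLUNS": "Monday", "DIMARTS": "Tuesday", "DIMECRES": "Wednesday",
    "DIJOUS": "Thursday", "DIVENDRES": "Friday", "DISSABTE": "Saturday",
    "DIUMENGE": "Sunday",
}


def consolidate(fragments):
    """Merge OCR fragments, then rewrite days and times as one English line."""
    # Mash-up: join the fragments and collapse line breaks into single spaces.
    text = " ".join(" ".join(fragments).split())
    day_names = "|".join(DAYS)
    first, last = re.search(rf"({day_names}) A ({day_names})", text).groups()
    # Times look like 6'30H or 13H; missing minutes default to 00.
    found = re.findall(r"(\d{1,2})(?:'(\d{2}))?H", text)
    times = [f"{int(hour)}:{minute or '00'}" for hour, minute in found]
    ranges = [f"from {start} to {end}" for start, end in zip(times[0::2], times[1::2])]
    return f"{DAYS[first]} to {DAYS[last]} " + " ".join(ranges)


summary = consolidate(fragments)
print(summary)
```

A hand-written parser like this breaks as soon as the sign changes layout or language, which is the point of the slide: the multimodal model handles the same task without any of these rules.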

Vision Examples

Multimodal: better understanding

Breaking language barriers

Advanced OCR: complex layouts

Emerging features: mirrored text

Gemini models landscape

Access to Gemini

Mini-Gemini Chatbot Demo