Gerard Sans

Google Developer Expert

Developer Evangelist

International Speaker

Spoken 201 times in 43 countries

Google AI Learning Path

Vertex AI

Complexity

Features

Opening the world

Gemini Ultra benchmarks

Gemini for Open Source: Gemma

Responsible AI

Reduce Biases

Safe

Accountable to people

Designed with Privacy

Scientific Excellence

Follow all Principles

Socially Beneficial

Not for Surveillance

Not Weaponised

Not Unlawful

Not Harmful

Global

First steps in Generative AI

Imagine

Image Generation

Google Lens

Google Search

Learn

Listen Response

+40 Languages

Be Creative

C++, Go, Java, JavaScript, Python and TypeScript

Code

Generate

complex graphics

Plot

Gemini extensions!

Access your Gmail.

Do More

Deep integration with YouTube.

Save Time

Text Generation

Training: guessing the next word

Adjust model predictions using output

Wikipedia

Christopher is

Christopher Columbus was

Input

Output

Christopher Columbus discovered America

Christopher Columbus discovered America in

Christopher Columbus discovered America in 1492

Christopher Columbus discovered America in 1492.
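The idea of training as next-word guessing can be illustrated with a toy bigram counter. This is a minimal sketch, assuming a one-sentence corpus and whitespace tokenisation; real LLMs use neural networks and subword tokens, and the names `counts` and `generate` are illustrative only.

```python
from collections import Counter, defaultdict

# Toy corpus: the single sentence used on the slides.
corpus = "Christopher Columbus discovered America in 1492 ."
tokens = corpus.split()

# "Training": count which word follows which.
counts = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    counts[prev][nxt] += 1


def generate(start, max_tokens=10):
    """Greedily append the most frequent next word until none is known."""
    out = [start]
    while len(out) < max_tokens and out[-1] in counts:
        out.append(counts[out[-1]].most_common(1)[0][0])
    return out


print(" ".join(generate("Christopher")))
```

Each step only looks at the previous word, which is why the sentence grows one token at a time, as in the sequence above.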

Hyper-dimensional Graph

Words: who, America, Columbus, discover

Dimensions: d_{1}, d_{2}, d_{3}, d_{4}, d_{5}, ..., d_{1408}

Latent Space

Word Embedding (dimensions d_{1} to d_{1408})

Columbus: 0.3, 1.2, -1, 0.9

America: 1.5, 0.3, 0.7, 0.1
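One common way to compare two word embeddings is cosine similarity. The sketch below assumes the four values shown for each word stand in for the full 1408-dimensional vectors, so the resulting number is purely illustrative; `cosine_similarity` is a helper written for this example.

```python
import math

# Only four of the 1408 dimensions appear on the slide,
# so these vectors are a truncated illustration.
columbus = [0.3, 1.2, -1.0, 0.9]
america = [1.5, 0.3, 0.7, 0.1]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


similarity = cosine_similarity(columbus, america)
print(round(similarity, 3))
```

Values near 1 mean the vectors point in a similar direction in the latent space; values near 0 mean they are close to orthogonal.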

Latent Space Exploration

AI generated text is...

Biased

Non-factual

Inaccurate

1 + 1 = 3

Grounding: reducing hallucinations

VectorDB Embeddings

Google Knowledge Graph

Google Search

Fact Checking
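Grounding with embeddings can be sketched as: embed the question, retrieve the closest stored fact, and place it in the prompt. This is a minimal sketch under strong assumptions: `embed` is a toy bag-of-words counter standing in for a real embedding model, `FACTS` stands in for a vector database, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lower-cased word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Stand-in for a vector database of trusted statements.
FACTS = [
    "Christopher Columbus discovered America in 1492.",
    "MNIST has 10 classes of handwritten digits.",
]


def retrieve(question, facts):
    """Return the stored fact most similar to the question."""
    q = embed(question)
    return max(facts, key=lambda fact: cosine(q, embed(fact)))


def grounded_prompt(question, facts):
    context = retrieve(question, facts)
    return f"Answer using only this context.\nContext: {context}\nQuestion: {question}"


print(grounded_prompt("When did Columbus discover America?", FACTS))
```

The model is then asked to answer from the retrieved context rather than from memory, which is the mechanism that helps reduce hallucinations.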

Stochastic parrot or AGI?

Common Myths for LLMs today

Anthropomorphic

A program not a human.

Brain analogy is a myth. Very limited reasoning and planning.

Deterministic

Uncertain by design. Highly sensitive to inputs and learned patterns. Answers may change when a single punctuation mark is added.

Prompt-driven

Output is dependent on training data. Prompts can't fix data issues or uneven distributions.

Neutral

Biased by training data. This is a blind spot. Difficult to find small mistakes.

Fact-retrieval

Not a database. Can't store or retrieve facts. Data cut-off. Not a search engine.
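The "Deterministic" myth can be illustrated with temperature sampling, one common way text generators pick the next token. This is a sketch only: the candidate tokens and their scores are hypothetical, and the source does not say which sampling strategy any particular model uses.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(tokens, logits, temperature, rng):
    """Draw one token at random according to the softmax probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical next-token candidates and scores.
tokens = ["1492", "1493", "Spain"]
logits = [2.0, 1.0, 0.5]

rng = random.Random()
print([sample(tokens, logits, 1.0, rng) for _ in range(5)])
```

Running the last line repeatedly can give different sequences, because each token is drawn from a probability distribution rather than looked up.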

Responsible AI: confirmation bias

Perceived accuracy: 100%

Perceived accuracy: 85%

Real accuracy: 20%

Can be dangerous in high-stakes contexts, e.g. health or finance.

Can be used when errors have little to no consequences, e.g. summarising or rephrasing.

Safe to use in general. Perfect accuracy.

Computer Vision

Computer Vision tasks

Source: V7 Labs

Visual Training Datasets

A dog running

Digits 0-9

Ant

French cat

MNIST: 60K, 10 classes

COCO: 330K, 80 classes

ImageNet: 14M

Image + caption

LAION (Web)

Image + text

Embedding high-dimensionality

A prompt will put you in a certain area of the latent space

Note the density of data points and noise ratios

Granular Visual Patches (ViT)

Query Patch

Detail

Image
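A Vision Transformer (ViT) starts by cutting the image into fixed-size patches. The sketch below shows that step on a tiny 4x4 grid of numbers, assuming a single-channel image stored as nested lists; `split_into_patches` is a helper written for this example, not a library function.

```python
def split_into_patches(image, patch_size):
    """Split a 2-D image (list of rows) into flattened square patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# A 4x4 "image" whose pixels are numbered 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = split_into_patches(image, 2)
print(len(patches), patches[0])
```

Each flattened patch then plays the role that a word token plays in a text model.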

Visual Attention Mechanism

"ginger fur"

"standing on a stone in the garden"

"well-groomed"

Image

Features

Attention
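The attention step can be sketched as scaled dot-product attention: a query patch is scored against every patch, the scores become weights, and the output is a weighted mix of patch features. The 2-dimensional features below are hypothetical, and real models use learned projections for queries, keys and values that are omitted here.

```python
import math


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Hypothetical features for three image patches.
patch_features = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query_patch = [1.0, 0.0]

weights, output = attention(query_patch, patch_features, patch_features)
print(weights, output)
```

The patch most similar to the query receives the largest weight, which is how detail in one region can draw on context from others.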

A new generalist Computer Vision

Visual Chat

VQA

Multi-turn

Reasoning

Extract Data

Handwriting

Data entry

OCR

Metadata

Identify

Recognition

Captioning

Categorising

Structure

Elements

Relationships

Hierarchies

Time/Space

Tracking

Activity

Causality

3D/4D

Multimodal: better understanding

Breaking language barriers

Advanced OCR: complex layouts

Emerging features: mirrored text

Computer Vision use-cases

Prompt: Extract the text for the opening hours and consolidate them in a single paragraph in English

Output: Monday to Friday from 6:30 to 13:00 from 16:30 to 20:00

Multimodal example

Image Input

Task: Extract Text (OCR)
Raw Data: HORARI DILLUNS A DIVENDRES DE 6'30H. A

Task: Extract Text (Hand-written OCR)
Raw Data: 13H 16'30H. A 20'00H

Task: Reason (Mash-up fragments)
Raw Data: HORARI DILLUNS A DIVENDRES DE 6'30H. A 13H 16'30H. A 20'00H

Task: Translate (Catalan to English)
Raw Data: Monday to Friday from 6:30 to 13:00 from 16:30 to 20:00
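To make the stages concrete, here is a hand-written sketch of the merge, normalise and translate steps applied to the OCR fragments. It is not how the model works internally: the regular expression, the two-entry `DAYS` dictionary and the function names are assumptions made for this one sign.

```python
import re

# OCR fragments as they appear in the example.
fragments = ["HORARI DILLUNS A DIVENDRES DE 6'30H. A", "13H 16'30H. A 20'00H"]

# Hypothetical mini-dictionary covering only the days on this sign.
DAYS = {"DILLUNS": "Monday", "DIVENDRES": "Friday"}


def extract_times(text):
    """Find times such as 6'30H or 13H and normalise them to H:MM."""
    times = []
    for match in re.finditer(r"(\d{1,2})(?:'(\d{2}))?H", text):
        hours = int(match.group(1))
        minutes = match.group(2) or "00"
        times.append(f"{hours}:{minutes}")
    return times


def consolidate(parts):
    """Merge fragments, translate the days and pair the times into ranges."""
    text = " ".join(parts)
    days = [DAYS[word] for word in text.split() if word in DAYS]
    times = extract_times(text)
    ranges = " ".join(f"from {a} to {b}" for a, b in zip(times[0::2], times[1::2]))
    return f"{days[0]} to {days[1]} {ranges}"


print(consolidate(fragments))
```

A multimodal model performs the equivalent of all these steps from a single prompt, without task-specific rules like the ones above.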

Image Generation

Imagen 2: unlocking image creativity

Generative AI for creatives

Prompt: magazine style 4k photorealistic, modern red armchair natural lighting

Prompt: Portrait of a french bulldog at the beach, 85mm f/2.8

Prompt: Assortment of delicious, freshly-baked donuts

Imagen 2

Image inpainting and upscaling

Original + mask

Imagen 2

Automatic image captioning

Input

Caption

AI-driven interior design

GoogleAI

VertexAI

Pro 1.0

Pro 1.5

Ultra 1.0

Nano 1.0

New multi-modal architecture

1 Million tokens

Gemini 1.0 Pro 128K tokens

(100 pages)

Gemini 1.5 Pro 1M

(800 pages)

video

11h audio

30K LOC

800 pages

1 Million tokens

Text: Apollo 11 script (402 pages)

Video: Buster Keaton movie (44 minutes)

Code: three.js demos (100K LOC)
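A rough way to reason about context windows is to convert pages to tokens. The sketch below assumes the ratio implied by "1M tokens, 800 pages" (1,250 tokens per page); real token counts depend on the tokenizer and the text, so treat the numbers as a back-of-the-envelope estimate. The function names are hypothetical.

```python
# Ratio implied by "1 Million tokens" being roughly 800 pages (assumption).
TOKENS_PER_PAGE = 1_000_000 // 800  # 1250


def estimate_tokens(pages):
    """Rough token estimate for a document of the given page count."""
    return pages * TOKENS_PER_PAGE


def fits(pages, context_window):
    """True if the estimated token count fits within the context window."""
    return estimate_tokens(pages) <= context_window


apollo_pages = 402  # the Apollo 11 script from the example
print(estimate_tokens(apollo_pages))
print(fits(apollo_pages, 128_000), fits(apollo_pages, 1_000_000))
```

Under this estimate the 402-page script is too large for a 128K window but fits comfortably within 1M tokens.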

A new age of AI-driven OSS

RAG vs Massive Context

The Power of Context

Self-contained

Only what you need. Avoids fragmentation and relevance drift

Cost-effective

Lower investment over time, fixed query latency. No fixed costs

User-guided

Unified data for focus, easier to control and improve quality

Interpretable

Improved transparency, easier to debug & iterate

Extreme In-Context-Learning

English to Kalamang

Limited

Generative AI for Developers

Your sandbox for prompts

Try it Today!

Gemini models landscape

Create an API Key in Google AI Studio

Gemini API Quickstart
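A first call to the Gemini API can be sketched with nothing but the standard library. This assumes the public REST `generateContent` endpoint and the request and response shapes as I understand them; the model name and API key are supplied by the caller, and the details should be checked against the current quickstart before use.

```python
import base64
import json
import urllib.request

# Assumed base URL of the public REST API.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta/models"


def build_payload(prompt, image_bytes=None, mime_type="image/jpeg"):
    """Build a generateContent request body with text and an optional image."""
    parts = [{"text": prompt}]
    if image_bytes is not None:
        parts.append(
            {
                "inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }
            }
        )
    return {"contents": [{"parts": parts}]}


def generate_content(api_key, model, payload):
    """Send the request and return the first text part (needs network access)."""
    url = f"{API_ROOT}/{model}:generateContent?key={api_key}"
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["candidates"][0]["content"]["parts"][0]["text"]


payload = build_payload("Extract the text for the opening hours", b"abc")
print(json.dumps(payload, indent=2))
```

Only `build_payload` runs here; calling `generate_content` requires an API key created in Google AI Studio and a model name that is currently available.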

Introducing Google Gemini

By Gerard Sans | Axiom 🇬🇧

In this session, you will learn about Google's Generative AI tools and ecosystem, including Google AI Studio and the Gemini chatbot. Google AI Studio is a tool for developers to build the new wave of Generative AI applications using Gemini foundation models. We will cover a general overview, run some demos using the just-released Gemini 1.5 Pro model and answer your questions about the future of AI!

Founder of Axiom Masterclass, professional trainings // Forging skills for the new era of AI. GDE in AI, Cloud & Angular. Building London's tech & art nexus @nextai_london. Speaker | MC | Trainer.
