# Sets, Bags, and Relational Theory

## with Data

Ryan B. Harvey

∏ME • January 26, 2017

## Ryan B. Harvey

@CodeAndData • rharvey[at]loyno.edu

http://datascientist.guru

Data Engineer

TED Conferences

Data Scientist & Software Architect

Kitchology, Inc.

Research Associate Pro Bono

Loyola University New Orleans

Co-Organizer & Data Wrangler

Code for New Orleans

## A quick review of set theory...

Important: Set members are unique.

"The universe"

X=\lbrace0,1,2,3,4,5...\rbrace=\lbrace x |x\in \mathbb{N}\cup\lbrace0\rbrace\rbrace=\lbrace x|x\in\mathbb{N}\vee x\in\lbrace0\rbrace\rbrace
$X=\lbrace0,1,2,3,4,5...\rbrace=\lbrace x |x\in \mathbb{N}\cup\lbrace0\rbrace\rbrace=\lbrace x|x\in\mathbb{N}\vee x\in\lbrace0\rbrace\rbrace$
Y=\lbrace...-5,-4,-3,-2,-1\rbrace=\lbrace -x |x\in \mathbb{N}\cup\lbrace0\rbrace\rbrace=\lbrace -x|x\in\mathbb{N}\vee x\in\lbrace0\rbrace\rbrace
$Y=\lbrace...-5,-4,-3,-2,-1\rbrace=\lbrace -x |x\in \mathbb{N}\cup\lbrace0\rbrace\rbrace=\lbrace -x|x\in\mathbb{N}\vee x\in\lbrace0\rbrace\rbrace$
X
$X$
Y
$Y$
X-Y
$X-Y$
Y-X
$Y-X$
\lbrace 0\rbrace
$\lbrace 0\rbrace$
\mathbb{N}
$\mathbb{N}$
X\cap Y
$X\cap Y$

Set operations

Special sets

X\cup Y=\mathbb{Z}=\lbrace...-3,-2,-1,0,1,2,3...\rbrace
$X\cup Y=\mathbb{Z}=\lbrace...-3,-2,-1,0,1,2,3...\rbrace$
X\cap Y=\lbrace0\rbrace
$X\cap Y=\lbrace0\rbrace$
X\setminus Y=\mathbb{N}
$X\setminus Y=\mathbb{N}$
Y\setminus X=\lbrace-y|y\in\mathbb{N}\rbrace
$Y\setminus X=\lbrace-y|y\in\mathbb{N}\rbrace$
\text{The null set: }\emptyset=\lbrace\rbrace,\text{ }\#\emptyset=0
$\text{The null set: }\emptyset=\lbrace\rbrace,\text{ }\#\emptyset=0$
\text{Complement of }X\text{: }\overline X=\lbrace x|x\in\mathcal{U}\wedge x\notin X\rbrace=\lbrace x|x\in(\mathcal{U}\setminus X)\rbrace
$\text{Complement of }X\text{: }\overline X=\lbrace x|x\in\mathcal{U}\wedge x\notin X\rbrace=\lbrace x|x\in(\mathcal{U}\setminus X)\rbrace$

Elements of a set come from a (possibly infinite) universe, denoted     .

\lbrace1,2,3,4...\rbrace\subset X
$\lbrace1,2,3,4...\rbrace\subset X$
\lbrace0,1,2,3,4\rbrace\subseteq X
$\lbrace0,1,2,3,4\rbrace\subseteq X$
\mathcal{U}
$\mathcal{U}$
\mathcal{U}
$\mathcal{U}$

## And a quick review of Booleans...

Boolean logic operations

\neg x\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize = \\ \end{matrix}}\;\text{not }x\text{''}
$\neg x\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize = \\ \end{matrix}}\;\text{not }x\text{''}$

Boolean variables and expressions can take two values: True, False

We've been using them implicitly in our definitions of sets, but let's look a bit closer at operations involving Booleans.

x\wedge y\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize = \\ \end{matrix}}\;x\text{ and }y\text{''}
$x\wedge y\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize = \\ \end{matrix}}\;x\text{ and }y\text{''}$
x\vee y\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize = \\ \end{matrix}}\;x\text{ or }y\text{''}
$x\vee y\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize = \\ \end{matrix}}\;x\text{ or }y\text{''}$
x\oplus y\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize = \\ \end{matrix}}\;x\text{ exclusive or }y\text{''}
$x\oplus y\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize = \\ \end{matrix}}\;x\text{ exclusive or }y\text{''}$
Result
x\oplus y
$x\oplus y$
x\vee y
$x\vee y$
x\wedge y
$x\wedge y$
\neg x
$\neg x$

True

False

True

False

True

False

True

False

True

False

False

False

True

False

True

False

True

False

True

False

True

False

False

True

True

True

True

False

# From sets to bags

## From sets to bags (multisets)...

Important: Bag members are not necessarily unique.

X=\lbrack0,1,2,3,4,5...\rbrack=\lbrack x |x\in \mathbb{N}\cup\lbrace0\rbrace\rbrack
$X=\lbrack0,1,2,3,4,5...\rbrack=\lbrack x |x\in \mathbb{N}\cup\lbrace0\rbrace\rbrack$
Y=\lbrack...-5,-4,-3,-2,-1\rbrack=\lbrack -x |x\in \mathbb{N}\cup\lbrace0\rbrace\rbrack
$Y=\lbrack...-5,-4,-3,-2,-1\rbrack=\lbrack -x |x\in \mathbb{N}\cup\lbrace0\rbrace\rbrack$

Bag operations

There is no complement of a bag. (What multiplicity do you assign?)

\text{Membership: }x\text{ is in }X\text{ }\Leftrightarrow x\in X
$\text{Membership: }x\text{ is in }X\text{ }\Leftrightarrow x\in X$

No longer "membership" -- now we need "multiplicity" to describe.

\text{Multiplicity: }X\text{ has }5\text{ copies of }x\Leftrightarrow\#_{\in X}(x)=5
$\text{Multiplicity: }X\text{ has }5\text{ copies of }x\Leftrightarrow\#_{\in X}(x)=5$

Bag containment is defined in terms of multiplicity.

X\subseteq W\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize \Longleftrightarrow \\ \end{matrix}}\;\forall x\in X, \#_{\in X}(x)\leq\#_{\in W}(x)
$X\subseteq W\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize \Longleftrightarrow \\ \end{matrix}}\;\forall x\in X, \#_{\in X}(x)\leq\#_{\in W}(x)$
X\subset W\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize \Longleftrightarrow \\ \end{matrix}}\;X\subseteq W\wedge X\neq W
$X\subset W\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize \Longleftrightarrow \\ \end{matrix}}\;X\subseteq W\wedge X\neq W$

## Bag operations (continued)...

X\cup Y\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize = \\ \end{matrix}}\;\text{ the smallest bag }W\text{ such that }X\subseteq W\wedge Y\subseteq W
$X\cup Y\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize = \\ \end{matrix}}\;\text{ the smallest bag }W\text{ such that }X\subseteq W\wedge Y\subseteq W$
\text{Multiplicity: }X\text{ has }5\text{ copies of }x\Leftrightarrow\#_{\in X}(x)=5
$\text{Multiplicity: }X\text{ has }5\text{ copies of }x\Leftrightarrow\#_{\in X}(x)=5$

Bag containment is defined in terms of multiplicity.

X\subseteq W\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize \Longleftrightarrow \\ \end{matrix}}\;\forall x\in X, \#_{\in X}(x)\leq\#_{\in W}(x)
$X\subseteq W\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize \Longleftrightarrow \\ \end{matrix}}\;\forall x\in X, \#_{\in X}(x)\leq\#_{\in W}(x)$
X\subset W\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize \Longleftrightarrow \\ \end{matrix}}\;X\subseteq W\wedge X\neq W
$X\subset W\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize \Longleftrightarrow \\ \end{matrix}}\;X\subseteq W\wedge X\neq W$
X\cap Y\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize = \\ \end{matrix}}\;\text{ the largest bag }W\text{ such that }W\subseteq X\wedge W\subseteq Y
$X\cap Y\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize = \\ \end{matrix}}\;\text{ the largest bag }W\text{ such that }W\subseteq X\wedge W\subseteq Y$

"smallest" and "largest" are measured relative to containment ordering

X\setminus Y\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize = \\ \end{matrix}}\; \text{ the bag }W\subseteq X\text{ such that } \forall x\in\mathcal{U},
$X\setminus Y\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize = \\ \end{matrix}}\; \text{ the bag }W\subseteq X\text{ such that } \forall x\in\mathcal{U},$
\#_{\in W}(x) = \max((\#_{\in X}(x)-\#_{\in Y}(x)), 0)
$\#_{\in W}(x) = \max((\#_{\in X}(x)-\#_{\in Y}(x)), 0)$

## Bag operations (new ones)...

\text{Multiplicity: }X\text{ has }5\text{ copies of }x\Leftrightarrow\#_{\in X}(x)=5
$\text{Multiplicity: }X\text{ has }5\text{ copies of }x\Leftrightarrow\#_{\in X}(x)=5$
\delta(X)\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize = \\ \end{matrix}}\; \text{ the set }\lbrace x|x\in X\rbrace
$\delta(X)\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize = \\ \end{matrix}}\; \text{ the set }\lbrace x|x\in X\rbrace$

Define the "duplicate elimination function"

X\sqcup Y\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize = \\ \end{matrix}}\; \text{ the bag }W\text{ such that }\forall x\in\mathcal{U},
$X\sqcup Y\;{\tiny\begin{matrix} _{\scriptsize \text{def}} \\ \normalsize = \\ \end{matrix}}\; \text{ the bag }W\text{ such that }\forall x\in\mathcal{U},$

Define the "bag concatenation function"

Note that this function takes a bag and returns a set of the same elements, with each element having multiplicity one.

\#_{\in W}(x)=\#_{\in X}(x)+\#_{\in Y}(x)
$\#_{\in W}(x)=\#_{\in X}(x)+\#_{\in Y}(x)$

Whereas bag union results in a bag with max multiplicity for each element among the two operand bags, bag concatenation results in each element having sum multiplicity

# From bags to relations

## If that wasn't enough... relations!

\lbrace A_1,A_2,...,A_n\rbrace
$\lbrace A_1,A_2,...,A_n\rbrace$

A relation scheme       is a finite set of attribute names

.

Corresponding to each attribute name        is a set        called the

domain of       , which we can denote                               .

Domains are arbitrary, non-empty sets, which may be finite or

countably infinite.

Define                                                    , the superset of all domains of attribute names in     .

A relation      on relation scheme      is a finite set of mappings

such that for each

.  The mappings       are called tuples.

Definitions

R
$R$
A_i
$A_i$
D_i
$D_i$
A_i
$A_i$
D_i=\text{dom}(A_i)
$D_i=\text{dom}(A_i)$
\mathbb{D}=D_1\cup D_2\cup\cdots\cup D_n
$\mathbb{D}=D_1\cup D_2\cup\cdots\cup D_n$
R
$R$
R
$R$
r
$r$
\lbrace t_1, t_2, \cdots, t_p\rbrace, t_k:R\rightarrow \mathbb{D}
$\lbrace t_1, t_2, \cdots, t_p\rbrace, t_k:R\rightarrow \mathbb{D}$
t_k\in r, t_k(A_i)\in D_i,
$t_k\in r, t_k(A_i)\in D_i,$
1\leq i\leq n
$1\leq i\leq n$
t_k
$t_k$

# Um, whut?

## Whut?

13634 Dwyer Closed E Street Light Pole #IBI35
Glouster & Norgate Open E Sidewalk Repair Accessibility ramp cracked
7203 Benjamin Closed A Street Light Pole #BP45
2900 Dublin Closed A Pothole Travel lane
2521 Marais Closed C Sidewalk Repair Cracked
1113 Mandeville Closed C Abandoned Vehicle White Cadillac
1914 Foucher Open B Garbage Pickup Start trash service

Excerpts from the City of New Orleans' 311 service calls data set, available at data.nola.gov

## Whut?

13634 Dwyer Closed E Street Light Pole #IBI35
Glouster & Norgate Open E Sidewalk Repair Accessibility ramp cracked
7203 Benjamin Closed A Street Light Pole #BP45
2900 Dublin Closed A Pothole Travel lane
2521 Marais Closed C Sidewalk Repair Cracked
1113 Mandeville Closed C Abandoned Vehicle White Cadillac
1914 Foucher Open B Garbage Pickup Start trash service
\text{relation scheme }R=\text{CALLS}=\{A_i\}_{i=1}^5
$\text{relation scheme }R=\text{CALLS}=\{A_i\}_{i=1}^5$
(n=5)
$(n=5)$

## Whut?

13634 Dwyer Closed E Street Light Pole #IBI35
Glouster & Norgate Open E Sidewalk Repair Accessibility ramp cracked
7203 Benjamin Closed A Street Light Pole #BP45
2900 Dublin Closed A Pothole Travel lane
2521 Marais Closed C Sidewalk Repair Cracked
1113 Mandeville Closed C Abandoned Vehicle White Cadillac
1914 Foucher Open B Garbage Pickup Start trash service
\text{relation }r=\text{status(CALLS)}=\{t_k\}_{k=1}^7
$\text{relation }r=\text{status(CALLS)}=\{t_k\}_{k=1}^7$
(p=7)
$(p=7)$

## Whut?

13634 Dwyer Closed E Street Light Pole #IBI35
Glouster & Norgate Open E Sidewalk Repair Accessibility ramp cracked
7203 Benjamin Closed A Street Light Pole #BP45
2900 Dublin Closed A Pothole Travel lane
2521 Marais Closed C Sidewalk Repair Cracked
1113 Mandeville Closed C Abandoned Vehicle White Cadillac
1914 Foucher Open B Garbage Pickup Start trash service
\text{tuple }t_5=(\text{2521 Marais"}, \text{Closed"}, \text{C"}, \text{Sidewalk Repair"}, \text{Cracked"})
$\text{tuple }t_5=(\text{2521 Marais"}, \text{Closed"}, \text{C"}, \text{Sidewalk Repair"}, \text{Cracked"})$

## Whut?

13634 Dwyer Closed E Street Light Pole #IBI35
Glouster & Norgate Open E Sidewalk Repair Accessibility ramp cracked
7203 Benjamin Closed A Street Light Pole #BP45
2900 Dublin Closed A Pothole Travel lane
2521 Marais Closed C Sidewalk Repair Cracked
1113 Mandeville Closed C Abandoned Vehicle White Cadillac
1914 Foucher Open B Garbage Pickup Start trash service
A_3=\text{DISTRICT"}, \text{dom}(\text{DISTRICT"})=\{x|x\text{ is a one-letter city council district code}\}
$A_3=\text{DISTRICT"}, \text{dom}(\text{DISTRICT"})=\{x|x\text{ is a one-letter city council district code}\}$

## Whut?

13634 Dwyer Closed E Street Light Pole #IBI35
Glouster & Norgate Open E Sidewalk Repair Accessibility ramp cracked
7203 Benjamin Closed A Street Light Pole #BP45
2900 Dublin Closed A Pothole Travel lane
2521 Marais Closed C Sidewalk Repair Cracked
1113 Mandeville Closed C Abandoned Vehicle White Cadillac
1914 Foucher Open B Garbage Pickup Start trash service
A_2=\text{STATUS"}, \text{dom}(\text{STATUS"})=\{\text{Open"},\text{Closed"}\}
$A_2=\text{STATUS"}, \text{dom}(\text{STATUS"})=\{\text{Open"},\text{Closed"}\}$

# Relational algebra

## An algebra on relations

For a relation scheme    , if            and           are both relations on    ,

consider    and    as sets over the same universe    .  Then we can

apply the boolean operations as follows.

Boolean Operators

A_i
$A_i$
R
$R$
R
$R$
\text{Let dom}(R)=\{\text{all tuples }t\text{ over all }A_i\in R\text{ and their domains}\}
$\text{Let dom}(R)=\{\text{all tuples }t\text{ over all }A_i\in R\text{ and their domains}\}$
r\cap s = \text{relation on }R\text{ containing tuples }t\text{ such that }t\in r\text{ and }t\in s
$r\cap s = \text{relation on }R\text{ containing tuples }t\text{ such that }t\in r\text{ and }t\in s$
r(R)
$r(R)$
s(R)
$s(R)$
s
$s$
r
$r$
\mathcal{U}
$\mathcal{U}$
r\cup s = \text{relation on }R\text{ containing tuples }t\text{ such that }t\in r\text{ or }t\in s
$r\cup s = \text{relation on }R\text{ containing tuples }t\text{ such that }t\in r\text{ or }t\in s$
r\setminus s = \text{relation on }R\text{ containing tuples }t\text{ such that }t\in r\text{ and }t\notin s
$r\setminus s = \text{relation on }R\text{ containing tuples }t\text{ such that }t\in r\text{ and }t\notin s$
r\cap s=r\setminus(r\setminus s)
$r\cap s=r\setminus(r\setminus s)$

Active Domain

Note that:

Define active domain of       relative to relation    as

r
$r$
\text{adom}(A_i,r)=\{d\in D_i|\exists t\in r\text{ with }t(A_i)=d\}
$\text{adom}(A_i,r)=\{d\in D_i|\exists t\in r\text{ with }t(A_i)=d\}$

## An algebra on relations

Using the domain of scheme      to compute the complement

of            is not closed on relations.  If any attribute

has       such that                       then                   , and    is not a relation.

Complement of a Relation

R
$R$
\#\bar r=\infty
$\#\bar r=\infty$
r(R)
$r(R)$
A_i\in R
$A_i\in R$
\bar r=\text{dom}(R)\setminus r
$\bar r=\text{dom}(R)\setminus r$
\bar r
$\bar r$
\#D_i=\infty
$\#D_i=\infty$
D_i
$D_i$

Thus, using the active domain, we define the active complement

which is finite, and always results in a relation on    .

R
$R$
$\tilde r=\text{adom}(R,r)\setminus r$

The set of all relations on a given scheme     is closed under

union, intersection, set difference, and active complement.

R
$R$

## An algebra on relations

Select is a unary operator on relations yeilding another relation that

is the subset of tuples with a given value on a specified attribute.

The "select" operator

\sigma_{A=a}(r)=r'(R)=\{t\in r|t(A)=a\}
$\sigma_{A=a}(r)=r'(R)=\{t\in r|t(A)=a\}$
\sigma_{A=a}(r)
$\sigma_{A=a}(r)$
a
$a$

With           a relation on scheme    ,     an attribute name in    , and

,

is read "select     equal to     on   ."

a\in\text{dom}(A)
$a\in\text{dom}(A)$
\sigma
$\sigma$
A
$A$
r
$r$

Select filters a relation's tuples leaving only those with the given

attribute value.  Recalling the data table example, this is filtering

rows of the table by the value in a particular column.

A
$A$
r(R)
$r(R)$
R
$R$
R
$R$

## An algebra on relations

The "select" operator       (properties)

is also distributive over Boolean operators              .  For example,

for relations            and          , and assuming uniqueness of tuples,

\sigma
$\sigma$
\sigma
$\sigma$
\sigma_{A=a}(r\cap s)=\sigma_{A=a}(\{t|t\in r\wedge t\in s\})
$\sigma_{A=a}(r\cap s)=\sigma_{A=a}(\{t|t\in r\wedge t\in s\})$
\cup,\cap,\setminus
$\cup,\cap,\setminus$

is commutative under composition, so that

and we can instead write                         , with order of the attribute

equivalences unimportant.

\sigma
$\sigma$
\sigma_{A=a}(\sigma_{B=b}(r))=\sigma_{A=a}\circ\sigma_{B=b}=\sigma_{B=b}\circ\sigma_{A=a}=\sigma_{B=b}(\sigma_{A=a}(r))
$\sigma_{A=a}(\sigma_{B=b}(r))=\sigma_{A=a}\circ\sigma_{B=b}=\sigma_{B=b}\circ\sigma_{A=a}=\sigma_{B=b}(\sigma_{A=a}(r))$
\sigma_{A=a,B=b}(r)
$\sigma_{A=a,B=b}(r)$
r(R)
$r(R)$
s(R)
$s(R)$
=\{t'\in\{t|t\in r\wedge t\in s\}|t'(A)=a\}
$=\{t'\in\{t|t\in r\wedge t\in s\}|t'(A)=a\}$
=\{t|t\in r\wedge t(A)=a\}\cap\{t|t\in s\wedge t(A)=a\}
$=\{t|t\in r\wedge t(A)=a\}\cap\{t|t\in s\wedge t(A)=a\}$
=\sigma_{A=a}(\{t|t\in r\})\cap\sigma_{A=a}(\{t|t\in s\})
$=\sigma_{A=a}(\{t|t\in r\})\cap\sigma_{A=a}(\{t|t\in s\})$
=\sigma_{A=a}(r)\cap\sigma_{A=a}(s)
$=\sigma_{A=a}(r)\cap\sigma_{A=a}(s)$

## An algebra on relations

Project is a unary operator on relations yeilding another relation

having a relational scheme with the specified attributes, and

containing tuples corresponding to each tuple in the operand.

The "project" operator

\pi_X(r)=r'(X)=\{t(X)|t\in r(R)\}
$\pi_X(r)=r'(X)=\{t(X)|t\in r(R)\}$
\pi_X(r)
$\pi_X(r)$

With           a relation on scheme    , and               ,

is read "the projection of     onto     ."

X\in R
$X\in R$
\pi
$\pi$
X
$X$
r
$r$

Project filter's a relational scheme's attribute list.  Recalling the data table example, this is taking specified columns of the table only.  If there is an assumption of uniqueness of tuples, we would then apply the duplicate removal function.

r(R)
$r(R)$
R
$R$

## An algebra on relations

The "project" operator       (properties)

\pi
$\pi$

Under composition, order does matter -- latter applications of

subsume former, if the latter application is proper.

If                          , then                                                                     .

Similarly, if                                                                   , then

,

so that only outermost applications of the project operator need

be considered for evaluation.

Project commutes with select when the attribute(s) for selection

are among those in the set onto which the projection is taking place.

\pi
$\pi$
X\subseteq Y\subseteq R
$X\subseteq Y\subseteq R$
\pi_X\circ\pi_Y(r)=\pi_X(\pi_Y(r))=\pi_X(r)
$\pi_X\circ\pi_Y(r)=\pi_X(\pi_Y(r))=\pi_X(r)$
X_1\subseteq X_2\subseteq X_3\subseteq\cdots\subseteq X_n\subseteq R
$X_1\subseteq X_2\subseteq X_3\subseteq\cdots\subseteq X_n\subseteq R$
\pi_{X_1}\circ\pi_{X_2}(r)\circ\pi_{X_2}(r)\cdots\pi_{X_n}(r)=\pi_{X_1}(r)
$\pi_{X_1}\circ\pi_{X_2}(r)\circ\pi_{X_2}(r)\cdots\pi_{X_n}(r)=\pi_{X_1}(r)$
\text{For }A\in X\subseteq R\text{ and }r(R)\text{ a relation, }\pi_X(\sigma_{A=a}(r))=\sigma_{A=a}(\pi_X(r))
$\text{For }A\in X\subseteq R\text{ and }r(R)\text{ a relation, }\pi_X(\sigma_{A=a}(r))=\sigma_{A=a}(\pi_X(r))$

## An algebra on relations

Join is a binary operator on relations yielding another relation

with a schema having a combination of attribute names, and

containing tuples corresponding to the tuples in each operand.

The "join" operator

r\bowtie s
$r\bowtie s$
q(RS)
$q(RS)$

Given relations            and          ,              ("the join of    and    ") is the

relation               of all tuples    over        such that there are tuples

with                                         .

RS
$RS$
\bowtie
$\bowtie$
t
$t$
r
$r$

If                      , then              is the Cartesian product of    and   .

r(R)
$r(R)$
t_r=t(R), t_s=t(S)
$t_r=t(R), t_s=t(S)$
s(S)
$s(S)$
s
$s$
t_r\in r, t_s\in s
$t_r\in r, t_s\in s$
t_r(R\cap S)=t_s(R\cap S)
$t_r(R\cap S)=t_s(R\cap S)$

Since               is a  subset of both      and     , then

R
$R$
r
$r$
R\cap S
$R\cap S$
s
$s$
S
$S$
r\bowtie s
$r\bowtie s$
R\cap S=\emptyset
$R\cap S=\emptyset$

## An algebra on relations

Let's review...

Operation Operator Function
Select filter rows/tuples
Project filter columns/attributes
Join combine relations
Booleans create filter conditions
Rename (new!) re-label an attribute
\sigma
$\sigma$
\pi
$\pi$
\bowtie
$\bowtie$
\cap,\cup,\setminus
$\cap,\cup,\setminus$
\rho
$\rho$

There's much more, and there's relational calculus too.

Maier, David.  The Theory of Relational Databases.  Computer Science Press, 1983.

Elmari, Ramez and Shamkant B. Navathe. Fundamentals of Database Systems. The Benjamin/Cummings Publishing Company, 1989.

Albert, Joseph. Algebraic Properties of Bag Data Types. Proceedings of the 17th International Conference on Very Large Data Bases. Barcelona, 1991.

Pagh, Ramus.  Lecture notes on Relational algebra and SQL. Introduction to Databases, IT University of Copenhagen, 2005. http://www.itu.dk/people/pagh/IDB05/Algebra.pdf

# To the database!

## Now for some SQL

Let's augment our relational operations table...

Operation Operator Function SQL clause
Select filter rows WHERE, HAVING
Project filter columns SELECT col list
Join combine relations JOIN
Booleans filter conditions AND, OR, NOT IN
Rename (new!) re-label attributes as
Dup removal remove duplicates DISTINCT
\sigma
$\sigma$
\pi
$\pi$
\bowtie
$\bowtie$
\cap,\cup,\setminus
$\cap,\cup,\setminus$
\rho
$\rho$
\delta
$\delta$

## Now for some SQL

Some practical things to help us out...

We can add constraints to columns in tables to specify the domain of those attributes.

If we don't want to deal with the troubles duplicates cause, we can ensure uniqueness by adding a "key" constraint to one or more attributes/columns.

We can also add constraints to specify that each tuple in the relation must have a non-empty value for a particular attribute or set of attributes.

## Now for some SQL

Let's try writing an expression

Say we want to find all the tuples in the status(CALLS) relation with STATUS "Open" and DISTRICT "A", and find only the CATEGORY.

The relational algebra expression for this looks like

If we were to use SQL to rewrite this operation on the database table, it looks like

SELECT CATEGORY

FROM CALLS

WHERE STATUS='Open' AND DISTRICT='A'

\pi_{\text{CATEGORY}}\circ\sigma_{\text{STATUS}=\text{Open"},\text{DISTRICT}=\text{A"}}(\text{status}(\text{CALLS}))
$\pi_{\text{CATEGORY}}\circ\sigma_{\text{STATUS}=\text{Open"},\text{DISTRICT}=\text{A"}}(\text{status}(\text{CALLS}))$