Loading
Ryan B. Harvey
This is a live streamed presentation. You will automatically follow the presenter and see the slide they're currently on.
Ryan B. Harvey
∏ME • January 26, 2017
Coder, Datahead, PM, Wonk, Dad.
@CodeAndData • rharvey[at]loyno.edu
Data Engineer
Data Scientist & Software Architect
Research Associate Pro Bono
Co-Organizer & Data Wrangler
Important: Set members are unique.
"The universe"
Set operations
Special sets
Elements of a set come from a (possibly infinite) universe, denoted .
Boolean logic operations
Boolean variables and expressions can take two values: True, False
We've been using them implicitly in our definitions of sets, but let's look a bit closer at operations involving Booleans.
Result | |
---|---|
True
False
True
False
True
False
True
False
True
False
False
False
True
False
True
False
True
False
True
False
True
False
False
True
True
True
True
False
Important: Bag members are not necessarily unique.
Bag operations
There is no complement of a bag. (What multiplicity do you assign?)
No longer "membership" -- now we need "multiplicity" to describe.
Bag containment is defined in terms of multiplicity.
Bag containment is defined in terms of multiplicity.
"smallest" and "largest" are measured relative to containment ordering
Define the "duplicate elimination function"
Define the "bag concatenation function"
Note that this function takes a bag and returns a set of the same elements, with each element having multiplicity one.
Whereas bag union results in a bag with max multiplicity for each element among the two operand bags, bag concatenation results in each element having sum multiplicity
A relation scheme is a finite set of attribute names
.
Corresponding to each attribute name is a set called the
domain of , which we can denote .
Domains are arbitrary, non-empty sets, which may be finite or
countably infinite.
Define , the superset of all domains of attribute names in .
A relation on relation scheme is a finite set of mappings
such that for each
. The mappings are called tuples.
Definitions
ADDRESS | STATUS | DISTRICT | CATEGORY | DESCRIPTION |
---|---|---|---|---|
13634 Dwyer | Closed | E | Street Light | Pole #IBI35 |
Glouster & Norgate | Open | E | Sidewalk Repair | Accessibility ramp cracked |
7203 Benjamin | Closed | A | Street Light | Pole #BP45 |
2900 Dublin | Closed | A | Pothole | Travel lane |
2521 Marais | Closed | C | Sidewalk Repair | Cracked |
1113 Mandeville | Closed | C | Abandoned Vehicle | White Cadillac |
1914 Foucher | Open | B | Garbage Pickup | Start trash service |
Excerpts from the City of New Orleans' 311 service calls data set, available at data.nola.gov
ADDRESS | STATUS | DISTRICT | CATEGORY | DESCRIPTION |
---|---|---|---|---|
13634 Dwyer | Closed | E | Street Light | Pole #IBI35 |
Glouster & Norgate | Open | E | Sidewalk Repair | Accessibility ramp cracked |
7203 Benjamin | Closed | A | Street Light | Pole #BP45 |
2900 Dublin | Closed | A | Pothole | Travel lane |
2521 Marais | Closed | C | Sidewalk Repair | Cracked |
1113 Mandeville | Closed | C | Abandoned Vehicle | White Cadillac |
1914 Foucher | Open | B | Garbage Pickup | Start trash service |
ADDRESS | STATUS | DISTRICT | CATEGORY | DESCRIPTION |
---|---|---|---|---|
13634 Dwyer | Closed | E | Street Light | Pole #IBI35 |
Glouster & Norgate | Open | E | Sidewalk Repair | Accessibility ramp cracked |
7203 Benjamin | Closed | A | Street Light | Pole #BP45 |
2900 Dublin | Closed | A | Pothole | Travel lane |
2521 Marais | Closed | C | Sidewalk Repair | Cracked |
1113 Mandeville | Closed | C | Abandoned Vehicle | White Cadillac |
1914 Foucher | Open | B | Garbage Pickup | Start trash service |
ADDRESS | STATUS | DISTRICT | CATEGORY | DESCRIPTION |
---|---|---|---|---|
13634 Dwyer | Closed | E | Street Light | Pole #IBI35 |
Glouster & Norgate | Open | E | Sidewalk Repair | Accessibility ramp cracked |
7203 Benjamin | Closed | A | Street Light | Pole #BP45 |
2900 Dublin | Closed | A | Pothole | Travel lane |
2521 Marais | Closed | C | Sidewalk Repair | Cracked |
1113 Mandeville | Closed | C | Abandoned Vehicle | White Cadillac |
1914 Foucher | Open | B | Garbage Pickup | Start trash service |
ADDRESS | STATUS | DISTRICT | CATEGORY | DESCRIPTION |
---|---|---|---|---|
13634 Dwyer | Closed | E | Street Light | Pole #IBI35 |
Glouster & Norgate | Open | E | Sidewalk Repair | Accessibility ramp cracked |
7203 Benjamin | Closed | A | Street Light | Pole #BP45 |
2900 Dublin | Closed | A | Pothole | Travel lane |
2521 Marais | Closed | C | Sidewalk Repair | Cracked |
1113 Mandeville | Closed | C | Abandoned Vehicle | White Cadillac |
1914 Foucher | Open | B | Garbage Pickup | Start trash service |
ADDRESS | STATUS | DISTRICT | CATEGORY | DESCRIPTION |
---|---|---|---|---|
13634 Dwyer | Closed | E | Street Light | Pole #IBI35 |
Glouster & Norgate | Open | E | Sidewalk Repair | Accessibility ramp cracked |
7203 Benjamin | Closed | A | Street Light | Pole #BP45 |
2900 Dublin | Closed | A | Pothole | Travel lane |
2521 Marais | Closed | C | Sidewalk Repair | Cracked |
1113 Mandeville | Closed | C | Abandoned Vehicle | White Cadillac |
1914 Foucher | Open | B | Garbage Pickup | Start trash service |
For a relation scheme , if and are both relations on ,
consider and as sets over the same universe . Then we can
apply the boolean operations as follows.
Boolean Operators
Active Domain
Note that:
Define active domain of relative to relation as
Using the domain of scheme to compute the complement
of is not closed on relations. If any attribute
has such that then , and is not a relation.
Complement of a Relation
Thus, using the active domain, we define the active complement
which is finite, and always results in a relation on .
The set of all relations on a given scheme is closed under
union, intersection, set difference, and active complement.
Select is a unary operator on relations yeilding another relation that
is the subset of tuples with a given value on a specified attribute.
The "select" operator
With a relation on scheme , an attribute name in , and
,
is read "select equal to on ."
Select filters a relation's tuples leaving only those with the given
attribute value. Recalling the data table example, this is filtering
rows of the table by the value in a particular column.
The "select" operator (properties)
is also distributive over Boolean operators . For example,
for relations and , and assuming uniqueness of tuples,
is commutative under composition, so that
and we can instead write , with order of the attribute
equivalences unimportant.
Project is a unary operator on relations yeilding another relation
having a relational scheme with the specified attributes, and
containing tuples corresponding to each tuple in the operand.
The "project" operator
With a relation on scheme , and ,
is read "the projection of onto ."
Project filter's a relational scheme's attribute list. Recalling the data table example, this is taking specified columns of the table only. If there is an assumption of uniqueness of tuples, we would then apply the duplicate removal function.
The "project" operator (properties)
Under composition, order does matter -- latter applications of
subsume former, if the latter application is proper.
If , then .
Similarly, if , then
,
so that only outermost applications of the project operator need
be considered for evaluation.
Project commutes with select when the attribute(s) for selection
are among those in the set onto which the projection is taking place.
Join is a binary operator on relations yielding another relation
with a schema having a combination of attribute names, and
containing tuples corresponding to the tuples in each operand.
The "join" operator
Given relations and , ("the join of and ") is the
relation of all tuples over such that there are tuples
with .
If , then is the Cartesian product of and .
Since is a subset of both and , then
Let's review...
Operation | Operator | Function |
---|---|---|
Select | filter rows/tuples | |
Project | filter columns/attributes | |
Join | combine relations | |
Booleans | create filter conditions | |
Rename (new!) | re-label an attribute |
There's much more, and there's relational calculus too.
Maier, David. The Theory of Relational Databases. Computer Science Press, 1983.
Elmari, Ramez and Shamkant B. Navathe. Fundamentals of Database Systems. The Benjamin/Cummings Publishing Company, 1989.
Albert, Joseph. Algebraic Properties of Bag Data Types. Proceedings of the 17th International Conference on Very Large Data Bases. Barcelona, 1991.
Pagh, Ramus. Lecture notes on Relational algebra and SQL. Introduction to Databases, IT University of Copenhagen, 2005. http://www.itu.dk/people/pagh/IDB05/Algebra.pdf
Let's augment our relational operations table...
Operation | Operator | Function | SQL clause |
---|---|---|---|
Select | filter rows | WHERE, HAVING | |
Project | filter columns | SELECT col list | |
Join | combine relations | JOIN | |
Booleans | filter conditions | AND, OR, NOT IN | |
Rename (new!) | re-label attributes | as | |
Dup removal | remove duplicates | DISTINCT |
Some practical things to help us out...
We can add constraints to columns in tables to specify the domain of those attributes.
If we don't want to deal with the troubles duplicates cause, we can ensure uniqueness by adding a "key" constraint to one or more attributes/columns.
We can also add constraints to specify that each tuple in the relation must have a non-empty value for a particular attribute or set of attributes.
Let's try writing an expression
Say we want to find all the tuples in the status(CALLS) relation with STATUS "Open" and DISTRICT "A", and find only the CATEGORY.
The relational algebra expression for this looks like
If we were to use SQL to rewrite this operation on the database table, it looks like
SELECT CATEGORY
FROM CALLS
WHERE STATUS='Open' AND DISTRICT='A'