Big Data friends
Scala and FP
a.k.a. Noootsab
Proud husband and father
Lidjeu po l'bon (Walloon: a Liégeois for good)
Have had to wear glasses since my Maths graduation in '03
Learned to dress well since my CS graduation in '05
Lost myself since gaining expertise in Geomatics and GIS
Risking myself in NextLab (GIS, Big Data and Scala)
Public interest work: co-founded Wajug
Helper and organizer of Devoxx4Kids
Scala trainer
WHY I mean it
and others do...
Scala has a reputation for being accessible
It eases the maths (mostly matrix)
The CS world is changing (fast)
It shifts from the cloud to analysis
That is, from IT needs to Market opportunities
Reused knowledge
Fact: syntax close to Java, C#, Ruby, ...
Cause: Object Oriented
case class Person( name:String,
                   // ... (the remaining fields are cut off in the source; the code below
                   //      relies on age, gender, father, mother and children)
) {
  def incAge(n:Int):Person = copy(age = age+n)
  def newSon(child:Person):(Person, Person) = {
    // the parent's gender decides which link is set on the child
    val newChild = this.gender match {
      case Male   => child.copy(father = Some(this))
      case Female => child.copy(mother = Some(this))
    }
    (newChild, this.copy(children = newChild :: children))
  }
}
val _Noah = Person("Petrella", "Noah", age=4, Male, mother=Some(Sandrine), father=None)
val boringNoootsab = Person("Petrella", "Andy", 32, Male, father=Some(Arcangelo), mother=Some(Nadine))
val (Noah, happyNoootsab) = boringNoootsab.newSon(_Noah)
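A runnable sketch of the same idea, since the slide's field list is cut off. Everything except `name`, `incAge` and `newSon` is an assumption here: the `Gender` type, the `firstName` field and the default values are hypothetical and only serve to make the example compile.

```scala
object FamilySketch {
  sealed trait Gender
  case object Male extends Gender
  case object Female extends Gender

  // Hypothetical field list: only `name` is visible on the slide.
  case class Person(
    name: String,
    firstName: String,
    age: Int,
    gender: Gender,
    father: Option[Person] = None,
    mother: Option[Person] = None,
    children: List[Person] = Nil
  ) {
    // copy returns a new Person; the receiver is never modified
    def incAge(n: Int): Person = copy(age = age + n)

    def newSon(child: Person): (Person, Person) = {
      val newChild = gender match {
        case Male   => child.copy(father = Some(this))
        case Female => child.copy(mother = Some(this))
      }
      (newChild, copy(children = newChild :: children))
    }
  }

  val andy = Person("Petrella", "Andy", 32, Male)
  val (noah, happyAndy) = andy.newSon(Person("Petrella", "Noah", 4, Male))
}
```

Note that `andy` itself keeps an empty `children` list: `newSon` hands back an updated copy instead of mutating the parent.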
Following the wave
Fact: Functional Programming ftw
Cause: Scalable Language
Please, bear with me...
mainly data fans
10⁶ online students
PHP → Scala
Concurrency primitives
Type safety
case classes
productivity gains
concise code
Scala school
Tens of open source libs
Billion devices
Historical events
Real-time analytics
Proper API (Option)
Async (Try)
Scalatra + ScalaTest
And more
Snips (smart cities, ...)
Tuplejump (analytic platform)
eBay (analytics)
BBC (Future Media project)
Virdata (IoT analytic platform)
Ooyala (video analytic platform)
Functional Programming
in a nutshell
source wikipedia:
Input x
can be a function...
Defines a general process
that could behave differently
listOfNames map { name => DB.getByName(name) }
listOfPersons flatMap { person => person.friends }
listOfFriends filter { (f:Friend) => f.met moreThan (10 years) }
listOfOldFriends.count(_.person.gender != me.gender)
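The four one-liners above depend on types that are not on the slide. A self-contained sketch of the same chain, assuming a hypothetical in-memory `db` and simplified `Person` and `Friend` classes, could look like this:

```scala
object InputFunctions {
  // Hypothetical, simplified domain types
  case class Friend(name: String, yearsKnown: Int)
  case class Person(name: String, friends: List[Friend])

  // Hypothetical in-memory "database"
  val db: Map[String, Person] = Map(
    "ann" -> Person("ann", List(Friend("bob", 12), Friend("cid", 3))),
    "dan" -> Person("dan", List(Friend("eve", 15)))
  )

  // map: the lookup function is the input that decides what happens per element
  val persons: List[Person] = List("ann", "dan").map(name => db(name))
  // flatMap: one person yields many friends, flattened into a single list
  val friends: List[Friend] = persons.flatMap(person => person.friends)
  // filter: the predicate is passed in as a function
  val oldFriends: List[Friend] = friends.filter(f => f.yearsKnown > 10)
  // count: the placeholder syntax builds the predicate
  val howMany: Int = oldFriends.count(_.name.startsWith("b"))
}
```

The general process (traverse a list) stays the same each time; only the function passed in changes the behaviour.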
Output x
bah... can be a function as well...
Prepares a process
that will be available for later usage
def authentication(manager:SecurityManager): User=>Authentication
def source(url:String): Authentication=>DataRepo=>Data
val authenticate = authentication(FakeSecurityManager)
val settings = source("/settings")
def request = {
  val user = //...
  val auth = authenticate(user)
  val settingsFetcher = settings(auth)
  // and so on
}
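The signatures above have no bodies on the slide. As a minimal sketch with made-up implementations (the `Set` of allowed names, the `Map`-based `DataRepo` and the `Option[String]` result are all assumptions), functions that return functions can be prepared once and used later:

```scala
object OutputFunctions {
  case class User(name: String)
  case class Authentication(user: User, granted: Boolean)
  type DataRepo = Map[String, String] // hypothetical stand-in for a repository

  // Returns a function: the "security manager" is fixed now, the user comes later
  def authentication(allowed: Set[String]): User => Authentication =
    user => Authentication(user, allowed.contains(user.name))

  // Returns a function that returns a function: url now, auth and repo later
  def source(url: String): Authentication => DataRepo => Option[String] =
    auth => repo => if (auth.granted) repo.get(url) else None

  // Prepared once...
  val authenticate = authentication(Set("andy"))
  val settings = source("/settings")

  // ...and applied per request
  def request(name: String, repo: DataRepo): Option[String] = {
    val auth = authenticate(User(name))
    val settingsFetcher = settings(auth)
    settingsFetcher(repo)
  }
}
```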
Show me
// ∙-, ∙*, ∙^, ∙+: and ∙*: are element-wise list operators that are not defined on this slide
def lm(x:List[Double], y:List[Double]):((Double, Double), Double=>Double) = {
  val n = x.size
  val ẍ = x.sum.toDouble / n                    // mean of x
  val ÿ = y.sum.toDouble / n                    // mean of y
  val Sp = ((x ∙- ẍ) ∙* (y ∙- ÿ) sum) / (n-1)   // sample covariance of x and y
  val Sx2 = ((x ∙- ẍ) ∙^ 2 sum) / (n-1)         // sample variance of x
  val ß1 = Sp / Sx2                             // slope
  val ß0 = ÿ - ß1 * ẍ                           // intercept
  val coefs = (ß0, ß1)
  val predict = (d:Double) => ß0 + ß1 * d
  (coefs, predict)
}
def test(ß0:Double = 18.1d, ß1:Double = 6d, error:Int=>List[Double]) = {
  val n = 10000
  val x:List[Double] = -n.toDouble to n by 1 toList
  val e = error(2*n+1)
  val y:List[Double] = ß0 ∙+: (ß1 ∙*: x) ∙+: e
  lm(x, y)
}
val error = rnorm(mean=0, sigma=5) // gen gaussian nbs
val model = test(103, 7, error)
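For readers without the custom operators at hand, here is a sketch of the same least-squares fit using only the standard library. The names `xMean`, `sp` and `sx2` are mine, and the noise-free test data is an assumption chosen so the result is easy to check.

```scala
object SimpleRegression {
  def lm(x: List[Double], y: List[Double]): ((Double, Double), Double => Double) = {
    require(x.size == y.size && x.size > 1, "need two samples of equal size")
    val n = x.size
    val xMean = x.sum / n
    val yMean = y.sum / n
    // sample covariance of x and y
    val sp = x.zip(y).map { case (a, b) => (a - xMean) * (b - yMean) }.sum / (n - 1)
    // sample variance of x
    val sx2 = x.map(a => (a - xMean) * (a - xMean)).sum / (n - 1)
    val b1 = sp / sx2           // slope
    val b0 = yMean - b1 * xMean // intercept
    ((b0, b1), (d: Double) => b0 + b1 * d)
  }

  // Noise-free data on the line y = 1 + 2x
  val xs = List(0.0, 1.0, 2.0, 3.0)
  val ys = xs.map(v => 1.0 + 2.0 * v)
  val ((intercept, slope), predict) = lm(xs, ys)
}
```

As on the slide, the function returns both the coefficients and a ready-made `predict` function, so the caller never has to rebuild the model by hand.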
on github
yeah yeah... I'll do it
lazy val app:App = initializeApp()
def logDebug(m: => String) = if (LOG.debugEnabled) LOG.debug(m) else ()
Avoid computations
Delayed initialization
Sooo laaazy
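A small sketch that makes both effects observable. The counters and the `debugEnabled` flag are hypothetical and exist only so the laziness can be checked:

```scala
object LazySketch {
  var initCount = 0     // how often the expensive initializer ran
  var messagesBuilt = 0 // how often a log message was actually built

  // Delayed initialization: the block runs on first access, and only once
  lazy val app: String = { initCount += 1; "app" }

  val debugEnabled = false

  // By-name parameter: `m` is only evaluated if it is used in the body
  def logDebug(m: => String): Unit = if (debugEnabled) println(m) else ()

  def expensiveMessage(): String = { messagesBuilt += 1; "state dump" }
}
```

Calling `LazySketch.logDebug(LazySketch.expensiveMessage())` with debugging off never builds the message, and reading `LazySketch.app` twice runs the initializer once.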
Come back... in a potential future
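val app:Future[App] = initializeApp()
val http:Future[HttpClient] = app.map( _.http.client )
def isOk(url:String):Future[Boolean] =
  http.flatMap(client => client.get(url) )
      .map( _.code )
      .filter( _ == 200 )    // fails the future when the code is not 200
      .map( _ => true )      // needed so the result is a Future[Boolean]
      .recoverWith {
        case x:CommunicationException => isOk(url)   // retry on communication errors
      }.recover {
        case e: Throwable => false
      }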
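A self-contained sketch of the same composition, assuming a fake `HttpClient` whose behaviour depends only on the URL text (it performs no network access and is not a real library):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureSketch {
  // Hypothetical stand-ins for an HTTP client and its response
  case class Response(code: Int)
  class HttpClient {
    def get(url: String): Future[Response] =
      if (url.startsWith("http"))
        Future.successful(Response(if (url.endsWith("/ok")) 200 else 404))
      else
        Future.failed(new IllegalArgumentException("bad url: " + url))
  }

  val http: Future[HttpClient] = Future.successful(new HttpClient)

  // Each step describes what to do once the previous value is available
  def isOk(url: String): Future[Boolean] =
    http.flatMap(client => client.get(url))
      .map(_.code == 200)
      .recover { case _: IllegalArgumentException => false }

  // Blocking is only for the demo; real code would keep composing futures
  def check(url: String): Boolean = Await.result(isOk(url), 5.seconds)
}
```

Unlike the slide, this version does not retry: retrying on every failure would loop forever against a fake client that always fails for the same URL.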
Code... now
(I promised)
class LazyCons[+A](a:A, t: => Lazy[A]) extends Lazy[A] {
  val head = Some(a)
  lazy val tail = t   // the by-name tail is evaluated on first access only
}
def fetch(file:String):Lazy[Future[String]] = {
val texts = io.Source.fromFile(new
def readLine(texts:Iterator[String]):Lazy[Future[String]] = //...
for the fun → val fibs:Stream[Int] = 0 #:: 1 #:: ((fibs zip fibs.drop(1)) map ((_:Int) + (_:Int)).tupled)
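The `Lazy` trait that `LazyCons` extends is not shown on the slide. A minimal sketch of what it might look like follows; the `Lazy` members, the `Empty` case, `take`, `from` and the `evaluated` counter are all assumptions added for illustration.

```scala
object LazyListSketch {
  sealed trait Lazy[+A] {
    def head: Option[A]
    def tail: Lazy[A]
    // Materialize at most n elements; the tail is not forced past the last one
    def take(n: Int): List[A] = head match {
      case Some(a) if n > 0 => a :: (if (n == 1) Nil else tail.take(n - 1))
      case _                => Nil
    }
  }

  case object Empty extends Lazy[Nothing] {
    val head: Option[Nothing] = None
    def tail: Lazy[Nothing] = Empty
  }

  class LazyCons[+A](a: A, t: => Lazy[A]) extends Lazy[A] {
    val head: Option[A] = Some(a)
    lazy val tail: Lazy[A] = t // by-name argument, evaluated on first access
  }

  var evaluated = 0 // counts how many cells were actually built

  // An infinite sequence: safe because each tail is built only on demand
  def from(n: Int): Lazy[Int] = { evaluated += 1; new LazyCons(n, from(n + 1)) }
}
```

`LazyListSketch.from(0).take(3)` builds three cells and stops, even though `from` describes an endless sequence.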
on github
A function could either
→ be called on data (method, sync)
→ be sent to the data (message, async)
A function composes
A function is a delayed computation
What if I compose all the computations
Then I send the whole shebang to where the data are?
Map/Reduce : degenerate case
Spark : generalized case (Back to Gerard's talk)
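A toy sketch of that idea in plain Scala, with no Spark involved. The nested lists are a hypothetical stand-in for data partitions living on different machines, and the three steps are arbitrary examples.

```scala
object ComposeSketch {
  // Three small steps, each a plain function
  val parse: String => Int = _.trim.toInt
  val square: Int => Int = n => n * n
  val isBig: Int => Boolean = _ > 10

  // Nothing runs here: composing only builds a bigger function
  val pipeline: String => Boolean = parse andThen square andThen isBig

  // Hypothetical "partitions" standing in for data held on different nodes
  val partitions: List[List[String]] = List(List("1", " 4"), List("5 ", "2"))

  // The whole pipeline is handed to each partition, which returns a small result
  val bigPerPartition: List[Int] = partitions.map(_.count(pipeline))
  val total: Int = bigPerPartition.sum
}
```

Only the composed function travels to the data and only the per-partition counts travel back, which is the point of sending the computation rather than the data.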
Funky code
trait Data {
  def dependent:List[Double]
  def observed:Matrix
  def bootstrap(proportion:Double):Future[Data]
}
trait Model {
  type Coefs
  def apply(data:Data):Future[(Coefs, List[Double]=>Future[Double])]
}
def bagging(model:Model)(agg:Aggregation[model.Coefs], n:Int)(data:Data):Future[model.Coefs] = {
  // one run: draw a bootstrap sample, fit the model, keep the coefficients
  def exec:Future[model.Coefs] = for {
    sample <- data.bootstrap(0.6)
    (coefs, _) <- model(sample)
  } yield coefs
  val execs:List[Future[model.Coefs]] = List.fill(n)(exec)
  val coefsList:Future[List[model.Coefs]] = Future.sequence(execs)
  val result:Future[model.Coefs] = coefsList map agg
  result
}
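A runnable, much simplified sketch of the same bagging shape. The assumptions are loud here: the "model" is just the sample mean, the data is a `Vector[Double]` instead of the `Data` trait, and the aggregation is a plain function. Only the structure (n bootstrap fits run as futures, then aggregated) mirrors the slide.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Random

object BaggingSketch {
  // Hypothetical model: the only "coefficient" is the sample mean
  def fit(sample: Vector[Double]): Future[Double] = Future(sample.sum / sample.size)

  // Draw a bootstrap sample (with replacement) of the given proportion
  def bootstrap(data: Vector[Double], proportion: Double, rnd: Random): Vector[Double] = {
    val size = math.max(1, (data.size * proportion).toInt)
    Vector.fill(size)(data(rnd.nextInt(data.size)))
  }

  def bagging(n: Int, agg: List[Double] => Double)(data: Vector[Double]): Future[Double] = {
    val rnd = new Random(42)
    // Samples are drawn up front so the shared Random is not used concurrently
    val samples = List.fill(n)(bootstrap(data, 0.6, rnd))
    // Fit all samples as futures, gather them, then aggregate the coefficients
    Future.sequence(samples.map(fit)).map(agg)
  }

  def average(xs: List[Double]): Double = xs.sum / xs.size

  // Blocking is only for the demo
  def run(data: Vector[Double]): Double =
    Await.result(bagging(20, average)(data), 10.seconds)
}
```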
on github
Thanks ^_^
Poke me:
→ for Scala training
→ for fun with Data
→ with book ideas
Scala and FP in Big Data
By Andy Petrella
Talk given at the meetup in July 2014. It introduces Scala and FP ahead of the talks about Spark that followed.