Big Data friends
Scala and FP

a.k.a. Noootsab

Proud husband and father

A true Liégeois (Walloon: "Lidjeu po l'bon")

Have worn glasses since my maths graduation in '03
Learning to dress well since my CS graduation in '05

Lost myself since becoming an expert in Geomatics and GIS
Taking risks at NextLab (GIS, Big Data and Scala)
Public interest work: co-founded Wajug
Helper and organizer of Devoxx4Kids
Scala trainer

WHY I mean it

and others do...
Scala has a reputation for being accessible
It eases the maths (mostly [matrix] algebra)
The CS world is changing (fast)
It shifts from the cloud to analysis
That is, from IT needs to Market opportunities

Reused knowledge

Fact: syntax close to Java, C#, Ruby, ...
Cause: Object Oriented
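sealed trait Gender
case object Male extends Gender
case object Female extends Gender
case class Person(  name:String,
                    firstName:String,   // this and the fields below are reconstructed from how they are used
                    age:Int,
                    gender:Gender,
                    father:Option[Person] = None,
                    mother:Option[Person] = None,
                    children:List[Person] = Nil
) {
  def incAge(n:Int):Person = copy(age = age+n)
  def newSon(child:Person):(Person, Person) = {
    val newChild = this.gender match {
        case Male => child.copy(father = Some(this))
        case Female => child.copy(mother = Some(this))
    }
    (newChild, this.copy(children = newChild :: children))
  }
}
val _Noah = Person("Petrella", "Noah", age=4, gender=Male, father=None)
val boringNoootsab = Person("Petrella", "Andy", 32, Male, father=Some(Arcangelo), mother=Some(Nadine)) // Arcangelo, Nadine: Person values assumed defined elsewhere
val (noah, happyNoootsab) = boringNoootsab.newSon(_Noah) // lower-case: an upper-case name in a pattern is matched as a constant, not bound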

Following the wave

Fact: Functional Programming ftw
Cause: Scalable Language

Please, bear with me...



mainly data fans


10⁶ online students
PHP → Scala
Concurrency primitives
Type safety


case classes
productivity gains
concise code

Scala school
Tens of open source libs


Billion devices
Historical events
Real-time analytics
Proper API (Option)
Async (Try)
Scalatra + ScalaTest

And more

Snips (smart cities, ...)
Tuplejump (analytic platform)
eBay (analytics)
BBC (Future Media project)
Virdata (IoT analytic platform)
Ooyala (video analytic platform)

Functional Programming

in a nutshell

Source: Wikipedia

Input x

can be a function...

Defines a general process
that could behave differently
listOfNames map { name => DB.getByName(name) }

listOfPersons flatMap { person => person.friends }

listOfFriends filter { (f:Friend) => f.met moreThan (10 years) }

listOfOldFriends.count(_.person.gender != me.gender)
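A runnable sketch of the four combinators above. The `Person` and `Friend` types, the in-memory `db` map and the `yearsKnown` field are hypothetical stand-ins for the slide's `DB.getByName` and `moreThan (10 years)` DSL, which the talk does not define:

```scala
object HigherOrder {
  // Hypothetical domain types standing in for the slide's DB and Friend
  case class Person(name: String, gender: String, friends: List[Friend])
  case class Friend(person: Person, yearsKnown: Int)

  val alice = Person("Alice", "F", Nil)
  val bob   = Person("Bob", "M", Nil)
  val carol = Person("Carol", "F", Nil)
  val me    = Person("Me", "M", List(Friend(alice, 12), Friend(bob, 3), Friend(carol, 15)))

  // map: one general process (a lookup) applied to every input
  val db: Map[String, Person] = List(alice, bob, carol).map(p => p.name -> p).toMap
  val found: List[Person] = List("Alice", "Carol") map { name => db(name) }

  // flatMap: each person yields a list of friends, and the lists are flattened
  val allFriends: List[Friend] = List(me) flatMap { person => person.friends }

  // filter: keep only friends met more than 10 years ago
  val oldFriends: List[Friend] = allFriends filter { f => f.yearsKnown > 10 }

  // count: old friends whose gender differs from mine
  val otherGender: Int = oldFriends.count(_.person.gender != me.gender)
}
```

Each combinator receives a function as input: the traversal is fixed, only the behavior passed in changes.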

Output x

bah... can be a function as well...

Prepares a process
that will be available for later use
def authentication(manager:SecurityManager): User=>Authentication

def source(url:String): Authentication=>DataRepo=>Data


val authenticate = authentication(FakeSecurityManager)
val settings = source("/settings")
def request = {
  val user = //...
  val auth = authenticate(user)
  val settingsFetcher = settings(auth)
  // and so on
}
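A self-contained sketch of the same staging, assuming hypothetical `User`, `Authentication`, `SecurityManager` and `DataRepo` types (the slide leaves them abstract). Each call fixes one argument and hands back a function for the next stage:

```scala
object Staged {
  // Hypothetical stand-ins for the slide's types
  case class User(name: String)
  case class Authentication(user: User, granted: Boolean)
  type DataRepo = Map[String, String]
  type Data = Option[String]
  trait SecurityManager { def allows(u: User): Boolean }
  object FakeSecurityManager extends SecurityManager {
    def allows(u: User): Boolean = u.name.nonEmpty
  }

  // Configured once with a manager; returns a function to apply per user later
  def authentication(manager: SecurityManager): User => Authentication =
    user => Authentication(user, manager.allows(user))

  // Curried: the url now, the authentication later, the repository last
  def source(url: String): Authentication => DataRepo => Data =
    auth => repo => if (auth.granted) repo.get(url) else None

  val authenticate = authentication(FakeSecurityManager)
  val settings = source("/settings")

  def request(user: User, repo: DataRepo): Data = {
    val auth = authenticate(user)
    val settingsFetcher = settings(auth) // still a function: DataRepo => Data
    settingsFetcher(repo)
  }
}
```

For example, `Staged.request(Staged.User("andy"), Map("/settings" -> "dark-mode"))` yields `Some("dark-mode")`, while an empty user name is refused and yields `None`.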

Show me

def lm(x:List[Double], y:List[Double]):((Double, Double), Double=>Double) = {
  // ∙-, ∙*, ∙^, ∙+: and ∙*: are element-wise List operators, assumed to come from the talk's code on github
  val n = x.size
  val ẍ = x.sum / n
  val ÿ = y.sum / n
  val Sp = ((x ∙- ẍ) ∙* (y ∙- ÿ)).sum / (n-1)
  val Sx2 = ((x ∙- ẍ) ∙^ 2).sum / (n-1)
  val ß1 = Sp / Sx2
  val ß0 = ÿ - ß1 * ẍ
  val coefs = (ß0, ß1)
  val predict = (d:Double) => ß0 + ß1 * d
  (coefs, predict)
}
def test(ß0:Double = 18.1d, ß1:Double = 6d, error:Int=>List[Double]) = {
  val n = 10000
  val x:List[Double] = (-n to n).map(_.toDouble).toList
  val e = error(2*n+1)
  val y:List[Double] = ß0 ∙+: (ß1 ∙*: x) ∙+: e
  lm(x, y)
}
val error = rnorm(mean=0, sigma=5) // generates Gaussian noise
val model = test(103, 7, error)
on github
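The same least-squares fit can be sketched without the custom element-wise operators, using `zip` and `map`. As an assumption, a seeded `scala.util.Random.nextGaussian` replaces the talk's `rnorm`, whose definition is not shown:

```scala
object SimpleLm {
  // Ordinary least squares for y = b0 + b1 * x, with the same formulas as lm above
  def lm(x: List[Double], y: List[Double]): ((Double, Double), Double => Double) = {
    require(x.size == y.size && x.size > 1, "need two samples of equal size")
    val n = x.size
    val xMean = x.sum / n
    val yMean = y.sum / n
    val sp  = x.zip(y).map { case (xi, yi) => (xi - xMean) * (yi - yMean) }.sum / (n - 1)
    val sx2 = x.map(xi => math.pow(xi - xMean, 2)).sum / (n - 1)
    val b1 = sp / sx2
    val b0 = yMean - b1 * xMean
    ((b0, b1), (d: Double) => b0 + b1 * d)
  }

  // Data around y = b0 + b1 * x with Gaussian noise from a seeded generator
  def test(b0: Double, b1: Double, sigma: Double, seed: Long = 42L) = {
    val rnd = new scala.util.Random(seed)
    val x = (-10000 to 10000).map(_.toDouble).toList
    val y = x.map(xi => b0 + b1 * xi + rnd.nextGaussian() * sigma)
    lm(x, y)
  }
}
```

On noiseless data such as `SimpleLm.lm(List(1.0, 2.0, 3.0), List(3.0, 5.0, 7.0))` the coefficients come out as `(1.0, 2.0)`; with `SimpleLm.test(103, 7, 5)` they land close to 103 and 7.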


yeah yeah... I'll do it
lazy val app:App = initializeApp()

def logDebug(m: => String): Unit = if (LOG.debugEnabled) LOG.debug(m)
Avoid computations
Delayed initialization
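A small sketch of both mechanisms, with counters to make the laziness observable. `println` stands in for the unspecified `LOG`, and `expensiveMessage` is a hypothetical costly computation:

```scala
object Laziness {
  var initCount = 0 // how many times the expensive initializer ran

  // lazy val: evaluated on first access only, then cached
  lazy val app: String = { initCount += 1; "app initialized" }

  var debugEnabled = false
  var built = 0 // how many log messages were actually built
  def expensiveMessage(): String = { built += 1; "state dump" }

  // By-name parameter (m: => String): the argument is evaluated only if it is used
  def logDebug(m: => String): Unit =
    if (debugEnabled) println(s"[debug] $m")
}
```

Reading `Laziness.app` twice runs the initializer once, and calling `logDebug(expensiveMessage())` while debugging is off never builds the message.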

Sooo laaazy

Come back... in a potential future

val app:Future[App] = initializeApp()

val http:Future[HttpClient] = app.map( _.http.client )

def isOk(url:String):Future[Boolean] = 
    http.flatMap(client => client.get(url) )
        .map( _.code )
        .filter( _ == 200 )   // a non-200 code fails the Future with NoSuchElementException
        .map( _ => true )
        .recoverWith {
            case x:CommunicationException => isOk(url)
        }.recover {
            case e: Throwable => false
        }
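A runnable sketch of the same chain on `scala.concurrent.Future`. `HttpClient`, `Response` and `CommunicationException` are hypothetical stand-ins (the slide does not name a library), and, as a design change, the retry is bounded so an unreachable URL cannot loop forever:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object FutureDemo {
  // Hypothetical stand-ins for the slide's HTTP client and exception
  class CommunicationException(msg: String) extends Exception(msg)
  case class Response(code: Int)
  class HttpClient(codes: Map[String, Int]) {
    def get(url: String): Future[Response] = Future {
      codes.get(url) match {
        case Some(c) => Response(c)
        case None    => throw new CommunicationException(s"unreachable: $url")
      }
    }
  }

  val http: Future[HttpClient] =
    Future(new HttpClient(Map("/ok" -> 200, "/missing" -> 404)))

  def isOk(url: String, retries: Int = 2): Future[Boolean] =
    http.flatMap(client => client.get(url))
      .map(_.code)
      .filter(_ == 200)   // non-200 fails with NoSuchElementException
      .map(_ => true)
      .recoverWith {      // retry communication failures a bounded number of times
        case _: CommunicationException if retries > 0 => isOk(url, retries - 1)
      }
      .recover { case _: Throwable => false } // any remaining failure means "not ok"
}
```

`Await.result(FutureDemo.isOk("/ok"), 5.seconds)` gives `true`; a 404 or an unreachable URL gives `false`.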

Code... now

(I promised)
class LazyCons[+A](a:A, t: => Lazy[A]) extends Lazy[A] {
  val head = Some(a)
  lazy val tail = t
}
def fetch(file:String):Lazy[Future[String]] = {
  val texts = io.Source.fromFile(new
  def readLine(texts:Iterator[String]):Lazy[Future[String]] = //...
For the fun:
val fibs:Stream[Int] = 0 #:: 1 #:: ((fibs zip fibs.drop(1)) map ((_:Int) + (_:Int)).tupled)
on github
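A self-contained sketch of a lazy list built from `LazyCons`. The `Lazy` trait, `LazyNil`, `take` and `from` are assumptions filled in for illustration, since the talk's full version is only on github:

```scala
object LazyDemo {
  // Hypothetical minimal lazy list
  sealed trait Lazy[+A] {
    def head: Option[A]
    def tail: Lazy[A]
    def take(n: Int): List[A] =
      if (n <= 0) Nil
      else head match {
        case Some(a) => a :: tail.take(n - 1)
        case None    => Nil
      }
  }
  case object LazyNil extends Lazy[Nothing] {
    val head: Option[Nothing] = None
    def tail: Lazy[Nothing] = LazyNil
  }
  class LazyCons[+A](a: A, t: => Lazy[A]) extends Lazy[A] {
    val head: Option[A] = Some(a)
    lazy val tail: Lazy[A] = t // the by-name tail is evaluated on first access only
  }

  var evaluated = 0 // how many cells were actually built
  // An infinite list of integers: safe because each tail is delayed
  def from(n: Int): Lazy[Int] = { evaluated += 1; new LazyCons(n, from(n + 1)) }

  // The slide's Fibonacci Stream, for comparison
  val fibs: Stream[Int] = 0 #:: 1 #:: (fibs zip fibs.drop(1)).map { case (a, b) => a + b }
}
```

`LazyDemo.from(0).take(5)` returns `List(0, 1, 2, 3, 4)` even though `from(0)` is infinite, and `LazyDemo.fibs.take(10).toList` gives the first ten Fibonacci numbers.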


A function could either 
→ be called on data (method, sync)
→ be sent to the data (message, async)

A function composes

A function is a delayed computation
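Both properties fit in a few lines. The pipeline and its stage names below are purely illustrative:

```scala
object Compose {
  val parse: String => Int = _.trim.toInt
  val double: Int => Int = _ * 2
  val show: Int => String = n => s"result=$n"

  // A function composes: the pipeline is built but nothing runs yet
  val pipeline: String => String = parse andThen double andThen show

  // A function is a delayed computation: the work happens only when the thunk is called
  var runs = 0
  val delayed: () => String = () => { runs += 1; pipeline(" 21 ") }
}
```

`Compose.delayed()` returns `"result=42"`, and `runs` stays at 0 until that call. `compose` is the mirror of `andThen`: `(show compose double)(5)` is `show(double(5))`.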



What if I compose all the computations...

...and then send the whole shebang to where the data are?

Map/Reduce: degenerate case
Spark: generalized case (back to Gerard's talk)
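A local simulation of "sending the computation to the data", written with plain Futures. It is not the Hadoop or Spark API: the `Node` class and the word-count job are hypothetical. Map/Reduce is the special case where the shipped computation is always one map step followed by one reduce step:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ShipFunctions {
  // Each hypothetical node owns a partition; functions travel to it, data stay put
  class Node[A](partition: List[A]) {
    def run[B](f: List[A] => B): Future[B] = Future(f(partition))
  }

  val nodes = List(
    new Node(List("to be or", "not to be")),
    new Node(List("to do is to be"))
  )

  // Map phase: a composed function (split words, then count them) shipped to each node
  val countWords: List[String] => Map[String, Int] =
    lines => lines.flatMap(_.split(" ")).groupBy(identity).map { case (w, ws) => w -> ws.size }

  // Reduce phase: merge the partial counts coming back from the nodes
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (w, c)) => acc.updated(w, acc.getOrElse(w, 0) + c) }

  val wordCount: Future[Map[String, Int]] =
    Future.sequence(nodes.map(_.run(countWords))).map(_.foldLeft(Map.empty[String, Int])(merge))
}
```

Only the small `countWords` function and the partial maps move between "nodes"; the lines themselves never leave their partition.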

Funky code

// Matrix and Aggregation are assumed to be defined in the talk's code on github
trait Data {
  def dependent:List[Double]
  def observed:Matrix
  def bootstrap(proportion:Double):Future[Data]
}
trait Model {
  type Coefs
  def apply(data:Data):Future[(Coefs, List[Double]=>Future[Double])]
}
def bagging(model:Model)(agg:Aggregation[model.Coefs], n:Int)(data:Data):Future[model.Coefs] = {
  def exec:Future[model.Coefs] =  for {
                                    sample     <- data.bootstrap(0.6)
                                    (coefs, _) <- model(sample)
                                  } yield coefs
  val execs:List[Future[model.Coefs]] = List.fill(n)(exec)
  val coefsList:Future[List[model.Coefs]] = Future.sequence(execs)

  val result:Future[model.Coefs] = coefsList map agg
  result
}
on github
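A runnable, deliberately simplified version of the bagging above. As assumptions: `Data` is just a vector of numbers, `Aggregation[C]` is taken to be `List[C] => C`, the model returns only its coefficients (no predictor), and the toy `MeanModel`'s single coefficient is the sample mean. The dependent type `model.Coefs` is kept as on the slide:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Random

object Bagging {
  // Hypothetical, simplified stand-ins for the slide's Data, Model and Aggregation
  type Aggregation[C] = List[C] => C

  case class Data(values: Vector[Double]) {
    // Resample proportion * size values with replacement
    def bootstrap(proportion: Double, rnd: Random): Future[Data] = Future {
      val k = math.max(1, (values.size * proportion).toInt)
      Data(Vector.fill(k)(values(rnd.nextInt(values.size))))
    }
  }

  trait Model {
    type Coefs
    def apply(data: Data): Future[Coefs]
  }

  // Toy model: its only "coefficient" is the sample mean
  object MeanModel extends Model {
    type Coefs = Double
    def apply(data: Data): Future[Double] = Future(data.values.sum / data.values.size)
  }

  def bagging(model: Model)(agg: Aggregation[model.Coefs], n: Int)(data: Data, seed: Long): Future[model.Coefs] = {
    // Fit the model on one bootstrap sample
    def exec(rnd: Random): Future[model.Coefs] = for {
      sample <- data.bootstrap(0.6, rnd)
      coefs  <- model(sample)
    } yield coefs
    // One seeded Random per run, so results do not depend on thread scheduling
    val execs: List[Future[model.Coefs]] = List.tabulate(n)(i => exec(new Random(seed + i)))
    Future.sequence(execs).map(agg)
  }
}
```

For instance, `Bagging.bagging(Bagging.MeanModel)((cs: List[Double]) => cs.sum / cs.size, 20)(Bagging.Data(Vector.fill(10)(2.5)), 7L)` completes with `2.5`; the `n` fits run concurrently and are only combined once `Future.sequence` has collected them.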


Thanks ^_^

Poke me:
→ for Scala training
→ for fun with Data
→ with book ideas

Scala and FP in Big Data

By andy petrella

Talk given at the meetup in July 2014, introducing Scala and FP ahead of the following talks about Spark.
