better-files

Towards simple, safe, sane I/O in Scala

Problems

  • Read a string from a file?
  • Append lines to a file?
  • Rename a file?
  • Move a directory?
  • Delete a directory?
  • Get the md5 of a file?
  • Glob a directory?
  • Check file permissions?
  • Follow symlinks?
  • chmod/chown?
  • zip/unzip?
  • Size of a directory?
  • Slurp/write bytes?

Fail

Appending to a file

import java.io.{File, BufferedWriter, FileOutputStream, OutputStreamWriter}
import scala.io.Codec

def append(file: File, text: String)(implicit codec: Codec): Unit = {
  val fout = new FileOutputStream(file, true) // true = append mode
  val writer = new BufferedWriter(new OutputStreamWriter(fout, codec.charSet))
  try {
    writer.append(text)
  } finally {
    // Closing the writer flushes its buffer and closes fout as well;
    // closing fout first would make that flush fail.
    writer.close()
  }
}
  • What if file does not exist?
  • What if it is a directory?
  • What if I lack write permissions?
  • What if it has a write lock?
  • What if it is a symlink?
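
One way to take some of these questions off the table is to let NIO open and close the file itself. The sketch below is only an illustration: the name `appendText` and the UTF-8 default are assumptions, not part of the talk. A missing file is created, while a directory target or missing permissions should surface as an `IOException` subclass rather than being silently ignored.

```scala
import java.nio.charset.{Charset, StandardCharsets}
import java.nio.file.{Files, Path, StandardOpenOption}

// Hypothetical helper: appends text to a file, creating the file if it is missing.
// Files.write opens the file, writes the bytes, and closes it again.
def appendText(file: Path, text: String, charset: Charset = StandardCharsets.UTF_8): Path =
  Files.write(
    file,
    text.getBytes(charset),
    StandardOpenOption.CREATE, // create the file when it does not exist
    StandardOpenOption.APPEND  // otherwise write at the end
  )
```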

Copying Files

import java.nio.file.{Files, Path}

def copy(source: Path, destination: Path): Unit = {
  Files.copy(source, destination)
}

"Directories can be copied. However, files inside the directory are not copied, so the new directory is empty even when the original directory contains files."

https://docs.oracle.com/javase/tutorial/essential/io/copy.html

Copying Directories

import java.nio.file.attribute.BasicFileAttributes
import java.nio.file.{Files, Path, SimpleFileVisitor}

/**
 * Scala port of https://docs.oracle.com/javase/tutorial/essential/io/examples/Copy.java
 */
def copy(source: Path, destination: Path): Unit = {
  if(Files.isDirectory(source)) {
    Files.walkFileTree(source, new SimpleFileVisitor[Path] {
      def newPath(subPath: Path): Path = destination resolve (source relativize subPath)

      override def preVisitDirectory(dir: Path, attrs: BasicFileAttributes) = {
        Files.createDirectories(newPath(dir))
        super.preVisitDirectory(dir, attrs)
      }

      override def visitFile(file: Path, attrs: BasicFileAttributes) = {
        Files.copy(file, newPath(file))
        super.visitFile(file, attrs)
      }
    })
  } else {
    Files.copy(source, destination)
  }
}
  • What if destination exists?
  • Symlink handling? Follow vs no-follow?
  • What if source and destination on different filesystems?
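
A minimal sketch of how the first question could be made an explicit choice. The name `copyTree` and the `overwrite` flag are assumptions for illustration. Symlink handling is deliberately left at the NIO defaults here (the walk does not descend into linked directories), so this does not settle the follow vs. no-follow question.

```scala
import java.nio.file.{CopyOption, Files, Path, StandardCopyOption}

// Hypothetical helper: recursive copy where overwriting is opt-in.
// Without overwrite, an existing destination file raises FileAlreadyExistsException.
def copyTree(source: Path, destination: Path, overwrite: Boolean = false): Unit = {
  val options: Seq[CopyOption] =
    if (overwrite) Seq(StandardCopyOption.REPLACE_EXISTING) else Seq.empty
  val paths = Files.walk(source) // parents are visited before their children
  try {
    val it = paths.iterator()
    while (it.hasNext) {
      val from = it.next()
      val to = destination.resolve(source.relativize(from))
      if (Files.isDirectory(from)) Files.createDirectories(to)
      else Files.copy(from, to, options: _*)
    }
  } finally paths.close() // the stream holds open directory handles
}
```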

Java Solutions

  • Apache Commons IO / Google Guava
    • Works! But, it's built for Java:
    • FileUtil.read(file) vs file.read()
    • scanner.nextInt() vs scanner.next[Int]
  • Plain Java NIO
    • java.nio.file.Files.lines(myFile).count
import java.nio.file.{Files, Path}
  
def chown(file: Path, owner: String): Unit = {  
  Files.setOwner(file, 
    file.getFileSystem
        .getUserPrincipalLookupService
        .lookupPrincipalByName(owner)
  )
}

Scala Solutions

Scala - A Melting Pot of Ideas

"A drunken Martin Odersky sees a Reese's Peanut Butter Cup ad featuring somebody's peanut butter getting on somebody else's chocolate and has an idea. He creates Scala, a language that unifies constructs from both object oriented and functional languages. This pisses off both groups and each promptly declares jihad."

  ~James Iry (2009)

(A Brief, Incomplete, and Mostly Wrong History of Programming Languages)

 Example: github.com/scala/slip/issues/19

Completely Opinionated Design Goals

  1. One library to rule them all - all utils you will ever need:
    • Google Guava + Apache Commons IO + Jodd + Java NIO utils
  2. No external dependencies
  3. Thin wrapper around Java NIO
  4. No complex class hierarchy - just one File class
    • "Pimp my library" to add capabilities to File class
  5. Principle of Least Surprise - "Obvious" APIs
  6. Prevent unsupported operations at compile time
  7. ARM-ed (and not dangerous) by default
  8. Configurable; but sane defaults
  9. 100% test coverage
  10. Good docs
  11. Performant - as fast as plain Java*
  12. Principle of Least Power
    • Core is not reactive, not monadic, not effect-based nor pure
    • Upstream integrations (e.g., Akka watcher, Shapeless-scanner)

better-files

Scala I/O ... for humans

Signalling Intent via Singleton Types

trait File {
  def append(text: String)(implicit codec: scala.io.Codec): File

  def moveTo(destination: File, overwrite: Boolean = false): File
}
trait File {self =>
  def append(text: String)(implicit codec: scala.io.Codec): self.type

  def moveTo(destination: File, overwrite: Boolean = false): destination.type
}
 (root/"tmp"/"diary.txt")
  .createIfNotExists()  
  .appendLine()
  .appendLines("My name is", "Inigo Montoya")
  .moveTo(home/"Documents")
  .renameTo("princess_diary.txt")
  .changeExtensionTo(".md")
  .lines
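
To see what the `self.type` return type buys, here is a small self-contained sketch. `Note` and `TimestampedNote` are hypothetical stand-ins, not the better-files API. Because `append` returns `this.type` (the same type as `self.type` in the trait above) instead of `Note`, a call chain keeps the most specific type and subclass methods stay available.

```scala
// Hypothetical stand-in for a file handle; not the better-files API.
class Note(val name: String) {
  private val buffer = new StringBuilder

  // this.type promises the caller gets back this very instance.
  def append(text: String): this.type = { buffer.append(text); this }

  def contents: String = buffer.toString
}

class TimestampedNote(name: String) extends Note(name) {
  def stamp(): this.type = append("[stamped] ")
}

val note = new TimestampedNote("diary")
// append returns note.type, so stamp() from the subclass can still be chained.
val result: String = note.append("hello ").stamp().append("world").contents
```

Had `append` been declared to return `Note`, the `.stamp()` call in the chain would not compile.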

Automatic Resource Management

type Closeable = {
  def close(): Unit
}

type ManagedResource[A <: Closeable] = Traversable[A]

implicit class CloseableOps[A <: Closeable](resource: A) {
  def autoClosed: ManagedResource[A] = new Traversable[A] {
    override def foreach[U](f: A => U) = try {
      f(resource)
    } finally {
      resource.close()
    }
  }
}
for {
  in <- file1.newInputStream.autoClosed
  out <- file2.newOutputStream.autoClosed
} in.pipeTo(out)
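
The same idea can be tried out without any library. The sketch below is an assumption-laden variant, not the better-files implementation: it uses `java.lang.AutoCloseable` instead of a structural type, a plain `managed` function instead of the `autoClosed` extension, and a hand-written `pipe` helper. A for-comprehension without `yield` only needs `foreach`, so that is all `Managed` defines.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}

// Runs the body once with the resource, then always closes it.
final class Managed[A <: AutoCloseable](resource: A) {
  def foreach[U](f: A => U): Unit =
    try { f(resource); () } finally resource.close()
}

def managed[A <: AutoCloseable](resource: A): Managed[A] = new Managed(resource)

// Hypothetical helper: copies all bytes from in to out.
def pipe(in: InputStream, out: OutputStream): Unit = {
  val buffer = new Array[Byte](8192)
  var n = in.read(buffer)
  while (n != -1) {
    out.write(buffer, 0, n)
    n = in.read(buffer)
  }
}

// In-memory sink that records whether close() was called.
final class TrackedSink extends ByteArrayOutputStream {
  var closed = false
  override def close(): Unit = { closed = true; super.close() }
}

def demo(): TrackedSink = {
  val sink = new TrackedSink
  for {
    in  <- managed(new ByteArrayInputStream("hello".getBytes("UTF-8")))
    out <- managed(sink)
  } pipe(in, out)
  sink // both streams have been closed by the time we get here
}
```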

TODO

  • Transactional I/O
    • Reusable file-handlers
 (root/"tmp"/"diary.txt")
  .createIfNotExists()  
  .appendLine()
  .appendLines("My name is", "Inigo Montoya")
  .moveTo(home/"Documents")
  .renameTo("princess_diary.txt")
  .changeExtensionTo(".md")
  .lines
  • Virtual files: Files vs. Paths
    • File.hasExtension
  • Typed Paths
    • absolute vs. relative
  • Scala 2.12 release

Scala Platform

  • "The standard library is where code goes to die"
    • Backwards compatibility
    • Graveyard of code frozen in time
  • "Batteries included"
    • Don't raise barrier to entry
    • Fragmentation vs "marketplace of ideas"
  • Split stdlib into two:
    1. Minimal compatible core needed to bootstrap scalac
    2. Decoupled "standard universe" (e.g., net, json, io, collections)
  • Not hard! Others have gotten this right before (e.g., node.js, Go)

Thank You

We are hiring (SF + NYC)!

  • Who are we:
  • What we do:
    • Quant Trading
    • Machine Learning
    • NLP
  • What we love:
    • Algorithms
    • Stats
    • Data Science

pbhowmick@coatue.com

$25,000 referral bonus

  •  Tech stack:
    • Scala
    • Spark @ Databricks
    • AWS Lambda
    • PostgreSQL
    • AWS Redshift
    • Apache Zeppelin
    • Apache Flink
    • Tableau
    • Python, R, Julia
    • Docker

better-files

By Pathikrit Bhowmick