better-files
Towards simple, safe, sane I/O in Scala
Problems
- Read a string from a file?
- Append lines to a file?
- Rename a file?
- Move a directory?
- Delete a directory?
- Get the md5 of a file?
- Glob a directory?
- Check file permissions?
- Follow symlinks?
- chmod/chown?
- zip/unzip?
- Size of a directory?
- Slurp/write bytes?
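To make two of these questions concrete, here is a minimal sketch answering "Get the md5 of a file?" and "Size of a directory?" with nothing but the JDK (`java.security.MessageDigest`, `java.nio.file.Files`). The names `md5` and `directorySize` are illustrative, not from any library. `md5` reads the whole file into memory.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest

// Hex-encoded MD5 digest of a file (the whole file is read into memory).
def md5(file: Path): String =
  MessageDigest
    .getInstance("MD5")
    .digest(Files.readAllBytes(file))
    .map(b => f"${b & 0xff}%02x")
    .mkString

// Sums Files.size over every regular file found under `dir`.
// Files.walk keeps directory handles open, so the stream must be closed.
def directorySize(dir: Path): Long = {
  val paths = Files.walk(dir)
  try {
    val it = paths.iterator()
    var total = 0L
    while (it.hasNext) {
      val p = it.next()
      if (Files.isRegularFile(p)) total += Files.size(p)
    }
    total
  } finally paths.close()
}
```

Even this small version has to remember to close the `Files.walk` stream, which is exactly the kind of detail the questions above are about.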
Fail
Appending to a file
import java.io.{File, BufferedWriter, FileOutputStream, OutputStreamWriter}
import scala.io.Codec

def append(file: File, text: String)(implicit codec: Codec): Unit = {
  val fout = new FileOutputStream(file, true) // true = open in append mode
  val writer = new BufferedWriter(new OutputStreamWriter(fout, codec.charSet))
  try {
    writer.append(text)
  } finally {
    // Closing the writer flushes its buffer and then closes fout.
    // Closing fout before the writer would make that flush fail.
    writer.close()
  }
}
- What if file does not exist?
- What if it is a directory?
- What if I lack write permissions?
- What if it has a write lock?
- What if it is a symlink?
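One way to answer the first of these questions with plain Java NIO, offered as a hedged sketch: `Files.write` with `CREATE` and `APPEND` creates a missing file and otherwise appends to it, and a single call opens and closes the stream itself. It still throws an `IOException` in the directory and permission cases, so those remain the caller's problem. `appendText` is a hypothetical name, and UTF-8 is assumed in place of an implicit `Codec`.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}

// Creates `file` if it is missing, otherwise appends to it.
// Throws an IOException if, e.g., `file` is a directory or is not writable.
def appendText(file: Path, text: String): Path =
  Files.write(
    file,
    text.getBytes(StandardCharsets.UTF_8),
    StandardOpenOption.CREATE,
    StandardOpenOption.APPEND
  )
```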
Copying Files
import java.nio.file.{Files, Path}
def copy(source: Path, destination: Path): Unit = {
Files.copy(source, destination)
}
"Directories can be copied. However, files inside the directory are not copied, so the new directory is empty even when the original directory contains files."
https://docs.oracle.com/javase/tutorial/essential/io/copy.html
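The quoted behavior is easy to observe. The sketch below (the helper name `copyDirectoryNaively` is illustrative) copies a directory with one `Files.copy` call and counts the entries in the copy.

```scala
import java.nio.file.{Files, Path}

// Copies `source` with a single Files.copy call and returns how many entries
// the copy contains. For a directory, only the directory itself is created;
// its contents are not copied.
def copyDirectoryNaively(source: Path, destination: Path): Long = {
  Files.copy(source, destination)
  val entries = Files.list(destination)
  try entries.count() finally entries.close()
}
```

Called on a directory that holds one file, it returns 0.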
Copying Directories
import java.nio.file.attribute.BasicFileAttributes
import java.nio.file.{Files, Path, SimpleFileVisitor}

/**
 * Scala port of https://docs.oracle.com/javase/tutorial/essential/io/examples/Copy.java
 */
def copy(source: Path, destination: Path): Unit = {
  if (Files.isDirectory(source)) {
    Files.walkFileTree(source, new SimpleFileVisitor[Path] {
      def newPath(subPath: Path): Path = destination resolve (source relativize subPath)

      override def preVisitDirectory(dir: Path, attrs: BasicFileAttributes) = {
        Files.createDirectories(newPath(dir))
        super.preVisitDirectory(dir, attrs)
      }

      override def visitFile(file: Path, attrs: BasicFileAttributes) = {
        Files.copy(file, newPath(file))
        super.visitFile(file, attrs)
      }
    })
  } else {
    Files.copy(source, destination)
  }
}
- What if destination exists?
- Symlink handling? Follow vs no-follow?
- What if source and destination are on different filesystems?
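Some of these questions map onto `CopyOption` flags that `Files.copy` already accepts. A hedged sketch for a single file (the helper `copyReplacing` is hypothetical): `REPLACE_EXISTING` answers "what if destination exists?", and `LinkOption.NOFOLLOW_LINKS` copies a symlink itself rather than its target.

```scala
import java.nio.file.{Files, LinkOption, Path, StandardCopyOption}

// Copies a single file or symlink:
// - REPLACE_EXISTING overwrites `destination` if it already exists
//   (without it, Files.copy throws FileAlreadyExistsException)
// - COPY_ATTRIBUTES also copies attributes such as the last-modified time
// - NOFOLLOW_LINKS copies a symbolic link itself, not the file it points to
def copyReplacing(source: Path, destination: Path): Path =
  Files.copy(
    source,
    destination,
    StandardCopyOption.REPLACE_EXISTING,
    StandardCopyOption.COPY_ATTRIBUTES,
    LinkOption.NOFOLLOW_LINKS
  )
```

Each caller still has to pick these flags consistently at every call site, which is the gap a wrapper library can close with defaults.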
Java Solutions
- Apache Commons IO / Google Guava
  - Works! But it's built for Java:
    - FileUtil.read(file) vs file.read()
    - scanner.nextInt() vs scanner.next[Int]
- Plain Java NIO
  - java.nio.file.Files.lines(myFile).count

import java.nio.file.{Files, Path}

def chown(file: Path, owner: String): Unit = {
  Files.setOwner(file,
    file.getFileSystem
      .getUserPrincipalLookupService
      .lookupPrincipalByName(owner)
  )
}
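A small illustration of the detail plain NIO leaves to the caller: the stream returned by `Files.lines` is backed by an open file handle, so the `Files.lines(myFile).count` one-liner leaks that handle unless the stream is closed. A hedged sketch of a closing variant (`lineCount` is a hypothetical name; `Files.lines(Path)` decodes as UTF-8):

```scala
import java.nio.file.{Files, Path}

// Counts lines while guaranteeing the underlying file handle is released.
def lineCount(file: Path): Long = {
  val lines = Files.lines(file) // lazily reads the file as UTF-8
  try lines.count() finally lines.close()
}
```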
Scala Solutions
- Scala standard library (scala.io.Source)
  - Doesn't solve most problems
- scala-io
  - Does too much and yet too little
  - Last commit in May 2012
- ammonite-ops
  - rm! cwd/'target/'folder/'thing
  - scala.io.Codec, java.nio.file.LinkOption
- Roll your own
  - google.com/search?q=IOUtil.scala
  - DRY < DROP (Don't Repeat Other People)
- better-files
  - A terrible name for a library
Scala - A Melting Pot of Ideas
"A drunken Martin Odersky sees a Reese's Peanut Butter Cup ad featuring somebody's peanut butter getting on somebody else's chocolate and has an idea. He creates Scala, a language that unifies constructs from both object oriented and functional languages. This pisses off both groups and each promptly declares jihad."
~James Iry (2009)
(A Brief, Incomplete, and Mostly Wrong History of Programming Languages)
Example: github.com/scala/slip/issues/19
Completely Opinionated Design Goals
- One library to rule them all - all utils you will ever need:
  - Google Guava + Apache Commons IO + Jodd + Java NIO utils
  - No external dependencies
  - Thin wrapper around Java NIO
- No complex class hierarchy - just one File class
  - "Pimp my library" to add capabilities to File class
- Principle of Least Surprise - "Obvious" APIs
- Prevent unsupported operations at compile time
- ARM-ed (and not dangerous) by default
- Configurable, but with sane defaults
- 100% test coverage
- Good docs
- Performant - as fast as plain Java*
- Principle of Least Power
  - Core is not reactive, not monadic, not effect-based, and not pure
  - Upstream integrations (e.g., Akka watcher, Shapeless-scanner)
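The "Pimp my library" goal refers to Scala's implicit-class pattern for adding methods to an existing type without subclassing it. A minimal sketch over `java.nio.file.Path`; the object `PathSyntax` and the method names (`/`, `write`, `contentAsString`) are chosen for illustration, and this is not better-files' actual implementation.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object PathSyntax {
  // Implicit class: once PathSyntax._ is imported, any Path gains these methods.
  implicit class PathOps(val path: Path) {
    def /(child: String): Path = path.resolve(child)

    def write(text: String): Path =
      Files.write(path, text.getBytes(StandardCharsets.UTF_8))

    def contentAsString: String =
      new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
  }
}
```

With `import PathSyntax._` in scope, `(dir / "hello.txt").write("hi")` reads like a method on the path itself, which is the "file.read() vs FileUtil.read(file)" contrast from the Java Solutions slide.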
better-files
Scala I/O ... for humans
Signalling Intent via Singleton Types
trait File {
def append(text: String)(implicit codec: scala.io.Codec): File
def moveTo(destination: File, overwrite: Boolean = false): File
}
trait File {self =>
def append(text: String)(implicit codec: scala.io.Codec): self.type
def moveTo(destination: File, overwrite: Boolean = false): destination.type
}
(root/"tmp"/"diary.txt")
.createIfNotExists()
.appendLine()
.appendLines("My name is", "Inigo Montoya")
.moveTo(home/"Documents")
.renameTo("princess_diary.txt")
.changeExtensionTo(".md")
.lines
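To see why the return types above matter, here is a toy sketch with hypothetical classes `Doc` and `Markdown` (not part of better-files). Because `append` returns `this.type` and `moveTo` returns `destination.type`, a call chain keeps the most specific static type, so subclass-only methods stay callable after them.

```scala
class Doc {
  val buf = new StringBuilder

  // this.type: callers get back the exact (sub)type they called on
  def append(text: String): this.type = { buf.append(text); this }

  // destination.type: the result has the static type of the argument
  def moveTo(destination: Doc): destination.type = {
    destination.buf.append(buf.toString)
    buf.clear()
    destination
  }
}

class Markdown extends Doc {
  // Only exists on Markdown, yet stays reachable after append/moveTo
  def heading(title: String): this.type = append(s"# $title\n")
}
```

With a plain `Doc` return type, `doc.moveTo(target).heading("Title")` would not compile, because `heading` is not a member of `Doc`.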
Automatic Resource Management
import scala.language.reflectiveCalls

// Structural type: anything with a close() method (called via reflection)
type Closeable = {
  def close(): Unit
}

// Scala 2.11/2.12: Traversable was reworked in 2.13
type ManagedResource[A <: Closeable] = Traversable[A]

implicit class CloseableOps[A <: Closeable](resource: A) {
  def autoClosed: ManagedResource[A] = new Traversable[A] {
    override def foreach[U](f: A => U): Unit = try {
      f(resource)
    } finally {
      resource.close()
    }
  }
}

for {
  in <- file1.newInputStream.autoClosed
  out <- file2.newOutputStream.autoClosed
} in.pipeTo(out)
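The same for-comprehension style also works without structural types (which need reflection) or `Traversable`. A hedged, version-neutral sketch: `Managed`, `managed` and `pipe` are hypothetical names, and resources are assumed to implement `java.lang.AutoCloseable`. A `for` with two generators and no `yield` desugars to nested `foreach` calls, so `foreach` is the only method needed, and the inner resource is closed before the outer one.

```scala
import java.io.{InputStream, OutputStream}

// Wraps a resource so that a `for` loop over it closes the resource afterwards.
final class Managed[A <: AutoCloseable](acquire: => A) {
  def foreach[U](f: A => U): Unit = {
    val resource = acquire // acquired only when the loop body runs
    try {
      f(resource)
      ()
    } finally resource.close()
  }
}

def managed[A <: AutoCloseable](acquire: => A): Managed[A] = new Managed(acquire)

// Copies every byte from `in` to `out` through a fixed-size buffer.
def pipe(in: InputStream, out: OutputStream): Unit = {
  val buffer = new Array[Byte](8192)
  var n = in.read(buffer)
  while (n != -1) {
    out.write(buffer, 0, n)
    n = in.read(buffer)
  }
}
```

Usage mirrors the slide: `for { in <- managed(...); out <- managed(...) } pipe(in, out)`.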
TODO
- Transactional I/O
- Reusable file-handlers
(root/"tmp"/"diary.txt")
.createIfNotExists()
.appendLine()
.appendLines("My name is", "Inigo Montoya")
.moveTo(home/"Documents")
.renameTo("princess_diary.txt")
.changeExtensionTo(".md")
.lines
- Virtual files: Files vs. Paths
- File.hasExtension
- Typed Paths
  - absolute vs. relative
- Scala 2.12 release
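One possible approach to the "Typed Paths" item, offered only as a hypothetical sketch (not a planned better-files API): a phantom type parameter records whether a path is absolute or relative. Appending an absolute path to another path is then rejected at compile time, in the spirit of the "prevent unsupported operations at compile time" goal. `TypedPath`, `Abs` and `Rel` are illustrative names.

```scala
import java.nio.file.{Path, Paths}

sealed trait Abs // marker: absolute path
sealed trait Rel // marker: relative path

// K is a phantom type: it exists only at compile time and carries no data.
final class TypedPath[K] private (val underlying: Path) {
  // Only a relative path can be appended; `base / someAbsolutePath` does not compile.
  def /(child: TypedPath[Rel]): TypedPath[K] =
    new TypedPath[K](underlying.resolve(child.underlying))

  override def toString: String = underlying.toString
}

object TypedPath {
  def absolute(s: String): Option[TypedPath[Abs]] = {
    val p = Paths.get(s)
    if (p.isAbsolute) Some(new TypedPath[Abs](p)) else None
  }

  def relative(s: String): Option[TypedPath[Rel]] = {
    val p = Paths.get(s)
    if (p.isAbsolute) None else Some(new TypedPath[Rel](p))
  }
}
```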
Scala Platform
- "The standard library is where code goes to die"
  - Backwards compatibility
  - Graveyard of code frozen in time
- "Batteries included"
  - Don't raise barrier to entry
- Fragmentation vs "marketplace of ideas"
- Split stdlib into two:
  - Minimal compatible core needed to bootstrap scalac
  - Decoupled "standard universe" (e.g., net, json, io, collections)
  - Not hard! Others have gotten this right before (e.g., node.js, Go)
Thank You
- GitHub: github.com/pathikrit/better-files
- Gitter: gitter.im/pathikrit/better-files
- ScalaDoc: pathikrit.github.io/better-files/latest/api/
- MIT License
- 18 open feature requests
- 2 open bugs
- ~450 LOC
- Tests/Benchmarks: 2000+ LOC
- 1000+ Maven downloads per month
- PRs Welcome!
We are hiring (SF + NYC)!
- Who we are:
- What we do:
- Quant Trading
- Machine Learning
- NLP
- What we love:
- Algorithms
- Stats
- Data Science
$25,000 referral bonus
- Tech stack:
- Scala
- Spark @ Databricks
- AWS Lambda
- PostgreSQL
- AWS Redshift
- Apache Zeppelin
- Apache Flink
- Tableau
- Python, R, Julia
- Docker
better-files
By Pathikrit Bhowmick