Working with files
The term file is used to label the minimal digital entity that can be ‘persisted’. A file is a label for a bundle of data, that is to say, but what that bundle consists in (i.e., what the binary data represent) doesn’t really matter nor is it implied in the use of the term.
File definition
In this view, a ‘file’ is not the stuff stored, it is a way of bundling that stuff into identifiable materials that in turn can be managed, stored, retrieved, duplicated, preserved and so on, by the computer system and its component applications.
File definition
- placeless (PARC)
- hierarchical
File systems
- BSD
- NTFS
- ZFS
- FAT32
- ...
A file is a smallest unit in which the information is stored. Unix file system has several important features. All data in Unix is organized into files. All files are organized into directories. These directories are organized into a tree-like structure called the file system.
File definition
UNIX directory tree
- Text I/O
- Binary I/O
- Raw I/O
File objects
Even though IOBase does not declare read(), readinto(), or write() because their signatures will vary, implementations and clients should consider those methods part of the interface.
File descriptor must be closed
-
Vulnerability - process does not close sensitive file descriptors before invoking a child process, which allows the child to perform unauthorized I/O operations using those descriptors
- OS can run out of file descriptor
When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.
Line separators
When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
Line separators
XML
eXtensible Markup Language
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<properties>
<java.version>1.8</java.version>
<jdk.version>8</jdk.version>
<kotlin.version>1.2.51</kotlin.version>
</properties>
<build>
<plugins>
<plugin>
<artifactId>kotlin-maven-plugin</artifactId>
<configuration>
<experimentalCoroutines>enable</experimentalCoroutines>
</configuration>
<groupId>org.jetbrains.kotlin</groupId>
<version>${kotlin.version}</version>
<executions>
<execution>
<id>compile</id>
<phase>compile</phase>
<goals>
<goal>compile</goal>
</goals>
<configuration>
<sourceDirs>
<sourceDir>${project.basedir}/src/main/kotlin</sourceDir>
</sourceDirs>
<jvmTarget>${java.version}</jvmTarget>
</configuration>
</execution>
<execution>
<id>test-compile</id>
<phase>test-compile</phase>
<goals>
<goal>test-compile</goal>
</goals>
<configuration>
<sourceDirs>
<sourceDir>${project.basedir}/src/test/kotlin</sourceDir>
</sourceDirs>
</configuration>
</execution>
</executions>
</plugin>
<!-- Generate Javadoc and Source -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>${maven-source-plugin.version}</version>
<executions>
<execution>
<id>attach-sources</id>
<goals>
<goal>jar-no-fork</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.jetbrains.dokka</groupId>
<artifactId>dokka-maven-plugin</artifactId>
<version>${dokka.version}</version>
<executions>
<execution>
<phase>pre-site</phase>
<goals>
<goal>javadocJar</goal>
</goals>
</execution>
</executions>
<configuration>
<sourceDirectories>
<dir>src/main/kotlin</dir>
</sourceDirectories>
<jdkVersion>${jdk.version}</jdkVersion>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>${maven-surefire-plugin.version}</version>
<dependencies>
<dependency>
<groupId>org.junit.platform</groupId>
<artifactId>junit-platform-surefire-provider</artifactId>
<version>${junit-platform.version}</version>
</dependency>
</dependencies>
</plugin>
</plugins>
</build>
</project>
JSON
JavaScript Object Notation (spec)
{
"glossary": {
"title": "example glossary",
"GlossDiv": {
"title": "S",
"GlossList": {
"GlossEntry": {
"ID": "SGML",
"SortAs": "SGML",
"GlossTerm": "Standard Generalized Markup Language",
"Acronym": "SGML",
"Abbrev": "ISO 8879:1986",
"GlossDef": {
"para": "A meta-markup language, used to create markup languages such as DocBook.",
"GlossSeeAlso": ["GML", "XML"]
},
"GlossSee": "markup"
}
}
}
}
}
YAML
Yet Another Markup Language (spec)
name: Mark McGwire
accomplishment: >
Mark set a major league
home run record in 1998.
stats: |
65 Home Runs
0.278 Batting Average
Working with files
By Aleksandr Slepchenkov
Working with files
- 167