File Identification and Why Should I Care
Presented By: Michael Mann
What is Horatio Thinking?
That .doc file he uploaded was really a .cs file. We need better protections in place so our servers can't be compromised and our users are safe.
What is File Format Identification?
The process of figuring out the format of a sequence of bytes.
Operating systems typically do this by file extension or by embedded MIME information.
Forensic applications need to identify file types by content.
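As a rough illustration of identifying a file by content rather than by name, the sketch below compares the leading bytes of a file against a few well-known magic numbers. The table and the function name `identify_by_content` are assumptions made for this example; a real signature set is far larger.

```python
from typing import Optional

# A handful of well-known magic numbers, for illustration only.
MAGIC_NUMBERS = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "zip",  # also the outer container of .docx, .xlsx, .jar
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2",  # legacy .doc, .xls, .ppt
}


def identify_by_content(data: bytes) -> Optional[str]:
    """Return a format label if the leading bytes match a known magic number."""
    for magic, label in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return label
    return None  # unknown content, e.g. plain source code


# A C# source file renamed to .doc does not start with the OLE2 magic number.
print(identify_by_content(b"using System;\nclass Program {}"))  # None
print(identify_by_content(b"%PDF-1.7\n"))  # pdf
```

The extension never enters the decision here, which is the point: the bytes decide, not the name.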
What Does Beeline Do?
Beeline relies on a combination of file extension checks, content length, MIME type verification, and a virus scan.
This is not enough to protect ourselves and our clients from malicious attackers.
Gentlemen, it's come to my attention that a breakaway Russian republic, Kreplachistan, is about to transfer a nuclear warhead to the United Nations in a few days. Here's the plan. We get the warhead, and we hold the world ransom for... $1,000,000.
Compromise Techniques
An attacker can upload a file that changes the security configuration of the server, e.g. .htaccess or web.config.
The MIME type can be forged by an attacker in a POST request.
An attacker can use the double extension technique to get around blacklist protection.
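To make the double extension technique concrete, here is a minimal sketch of a blacklist check that inspects only the final extension, next to a stricter variant that inspects every dot-separated segment. The blacklist contents and both function names are hypothetical and chosen only for this example.

```python
# Hypothetical blacklist, for illustration only.
BLOCKED_EXTENSIONS = {"php", "aspx", "exe", "cs"}


def naive_is_blocked(filename: str) -> bool:
    """Flawed check: looks only at the text after the last dot."""
    last = filename.rsplit(".", 1)[-1].lower()
    return last in BLOCKED_EXTENSIONS


def strict_is_blocked(filename: str) -> bool:
    """Checks every dot-separated segment after the base name."""
    segments = filename.lower().split(".")[1:]
    return any(segment in BLOCKED_EXTENSIONS for segment in segments)


print(naive_is_blocked("shell.php.jpg"))   # False: the blacklist is bypassed
print(strict_is_blocked("shell.php.jpg"))  # True: the hidden .php is caught
```

Even the stricter check is still a blacklist, so anything not listed passes through.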
So how can we combat this?
Whitelist
MIME type validation
File content verification
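The first two defenses can be sketched together: accept an upload only when its extension is on a whitelist and the declared MIME type is one of the types expected for that extension. The table contents and the function name `is_upload_allowed` are assumptions made for this example, not Beeline's actual rules.

```python
# Hypothetical whitelist: extension -> MIME types accepted for it.
ALLOWED_TYPES = {
    "pdf": {"application/pdf"},
    "png": {"image/png"},
    "doc": {"application/msword"},
}


def is_upload_allowed(filename: str, declared_mime: str) -> bool:
    """Accept only whitelisted extensions with a matching declared MIME type."""
    name, dot, ext = filename.rpartition(".")
    if not dot or not name:
        return False  # no extension, or a dotfile such as .htaccess
    allowed = ALLOWED_TYPES.get(ext.lower())
    if allowed is None:
        return False  # extension is not on the whitelist
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = declared_mime.split(";")[0].strip().lower()
    return mime in allowed


print(is_upload_allowed("report.pdf", "application/pdf"))  # True
print(is_upload_allowed(".htaccess", "text/plain"))        # False
```

Both inputs are supplied by the client, so this check narrows what gets through but cannot replace verifying the file content itself.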
- Open source Java project
- Leverages PRONOM
So What Did We Do?
- Wrote a C# library
- No dependencies on CWS
- In its own repository
- Covered in Unit Tests
- Consumes PRONOM Signature Files
- Integrating library into CWS
- Integrating library into API
Read Signature Files
Build object model
Read stream/ext
Find signature for ext
Compare byte stream to signature
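The flow above can be sketched in a few lines. This is not the C# library itself: the `Signature` class, the hard-coded list standing in for parsed signature files, and the function names are all assumptions made for illustration.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Dict, List, Tuple


@dataclass(frozen=True)
class Signature:
    """Hypothetical object model for one byte-sequence signature."""
    format_name: str
    extensions: Tuple[str, ...]
    offset: int      # where the pattern must appear, from start of file
    pattern: bytes


# Stand-in for "Read Signature Files"; a real run would parse them from disk.
SIGNATURES = [
    Signature("PDF", ("pdf",), 0, b"%PDF-"),
    Signature("OLE2 document", ("doc", "xls"), 0,
              b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"),
]


def build_index(signatures: List[Signature]) -> Dict[str, List[Signature]]:
    """Build the object model: extension -> candidate signatures."""
    index: Dict[str, List[Signature]] = {}
    for sig in signatures:
        for ext in sig.extensions:
            index.setdefault(ext, []).append(sig)
    return index


def matches_extension(stream: BinaryIO, ext: str,
                      index: Dict[str, List[Signature]]) -> bool:
    """Find the signatures for ext and compare the byte stream to each."""
    for sig in index.get(ext.lower().lstrip("."), []):
        stream.seek(sig.offset)
        if stream.read(len(sig.pattern)) == sig.pattern:
            return True
    return False  # unknown extension, or content does not match


index = build_index(SIGNATURES)
print(matches_extension(io.BytesIO(b"%PDF-1.4 ..."), ".pdf", index))  # True
print(matches_extension(io.BytesIO(b"using System;"), ".doc", index))  # False
```

Looking signatures up by extension first keeps the comparison cheap: only the patterns that the claimed extension could legitimately match are tested.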
If a match fails, we will show a modal that communicates this to the end user, and we will send out a notification. This is the same approach we use for virus scan failures.
Anatomy of a signature file
Signature files are XML files. There are two kinds of signature file: non-container format and container format.
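As a rough sketch of consuming a non-container signature file, the example below parses a small XML document into records holding a byte pattern. The element and attribute names are a simplified, hypothetical layout invented for this example and are not the actual PRONOM schema; container formats, which need their inner files inspected, are not covered.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Simplified, hypothetical layout. NOT the real PRONOM signature schema.
SAMPLE_XML = """
<SignatureFile>
  <Signature format="PDF" extension="pdf" offset="0" hex="255044462D"/>
  <Signature format="PNG" extension="png" offset="0" hex="89504E470D0A1A0A"/>
</SignatureFile>
"""


def load_signatures(xml_text: str) -> List[Dict[str, object]]:
    """Parse signature elements into plain dictionaries."""
    root = ET.fromstring(xml_text.strip())
    signatures = []
    for node in root.iter("Signature"):
        signatures.append({
            "format": node.get("format"),
            "extension": node.get("extension"),
            "offset": int(node.get("offset", "0")),
            # Patterns are stored as hex text and decoded to raw bytes.
            "pattern": bytes.fromhex(node.get("hex", "")),
        })
    return signatures


for sig in load_signatures(SAMPLE_XML):
    print(sig["format"], sig["pattern"])
```

Storing the pattern as hex text is what lets a signature file describe arbitrary binary sequences inside an ordinary XML document.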
Excited for a DEMO