File Identification and Why Should I Care

Presented By: Michael Mann

What is Horatio Thinking?

That .doc file he uploaded was really a .cs file.  We need better protections in place so our servers can't be compromised and our users are safe.

What is File Format Identification?

The process of figuring out the format of a sequence of bytes.

Operating systems typically do this by file extension or by embedded MIME information.

Forensic applications need to identify file types by content.
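Identification by content usually starts with "magic numbers", the fixed byte sequences many formats place at the start of a file. The sketch below is a minimal illustration in Python, not the library discussed later; the table and the function name `sniff_format` are assumptions made for this example, and a real signature set is far larger.

```python
from typing import Optional

# Illustrative magic numbers only (assumed subset, not a complete registry).
MAGIC_NUMBERS = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "zip",  # also the outer container of .docx/.xlsx
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2",  # container of legacy .doc/.xls
}


def sniff_format(data: bytes) -> Optional[str]:
    """Return a format label if the data starts with a known magic number."""
    for magic, label in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return label
    return None
```

A C# source file renamed to .doc would not begin with the OLE2 header, so a check like this returns no match for it regardless of the extension.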

What Does Beeline Do?

Beeline relies on a combination of file extension, content length, MIME type verification, and virus scanning.

This is not enough to protect ourselves and our clients from malicious attackers.

Gentlemen, it's come to my attention that a breakaway Russian republic, Kreplachistan, is about to transfer a nuclear warhead to the United Nations in a few days. Here's the plan. We get the warhead, and we hold the world ransom for... $1,000,000.

Compromise Techniques

An attacker can upload a file that changes the security configuration of the server, e.g. .htaccess or web.config

The MIME type can be forged by an attacker in a POST request

An attacker can use the double extension technique to evade blacklist protection
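To make the filename-based techniques concrete, here is a minimal Python sketch that flags server configuration file names and double extensions such as `shell.php.jpg`. The two sets and the function name `is_suspicious_name` are hypothetical and chosen only for illustration; they are not an exhaustive rule set.

```python
import os

# Hypothetical lists for illustration; tune them to the server in question.
DANGEROUS_EXTENSIONS = {".php", ".asp", ".aspx", ".cs", ".exe", ".js"}
SERVER_CONFIG_NAMES = {".htaccess", "web.config"}


def is_suspicious_name(filename: str) -> bool:
    """Flag config-file names and names hiding a dangerous extension."""
    name = os.path.basename(filename).lower()
    if name in SERVER_CONFIG_NAMES:
        return True
    # Inspect every dot-separated segment after the first, not only the last,
    # so "shell.php.jpg" is caught even though it ends in ".jpg".
    segments = name.split(".")[1:]
    return any("." + segment in DANGEROUS_EXTENSIONS for segment in segments)
```

A check that looks only at the final extension would accept `shell.php.jpg`, which is exactly the gap the double extension technique exploits.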

So how can we combat this?

White list

MIME type validation

File content verification
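The first two defenses can be sketched as a single lookup: an extension must be on the white list, and the declared MIME type must be one accepted for that extension. This is a minimal Python sketch under assumed names; `ALLOWED_TYPES` and `passes_allowlist` are hypothetical, and the entries are examples rather than a recommended policy.

```python
import os

# Hypothetical white list: extension -> MIME types accepted for it.
ALLOWED_TYPES = {
    ".pdf": {"application/pdf"},
    ".doc": {"application/msword"},
    ".png": {"image/png"},
}


def passes_allowlist(filename: str, declared_mime: str) -> bool:
    """Accept only listed extensions whose declared MIME type also matches."""
    ext = os.path.splitext(filename)[1].lower()
    allowed = ALLOWED_TYPES.get(ext)
    return allowed is not None and declared_mime.lower() in allowed
```

Both inputs here are supplied by the client, so passing this check says nothing about the actual bytes; that is why file content verification is listed as a separate step.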

So What Did We Do?

  • Wrote a C# library
    • No dependencies on CWS
    • In its own repository
    • Covered in Unit Tests
    • Consumes PRONOM Signature Files
  • Integrating library into CWS
  • Integrating library into API

Read Signature Files

Build object model

Read stream/ext

Find signature for ext

Compare byte stream to signature
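The flow above can be sketched roughly as follows. This is an illustration in Python, not the C# library itself: the `Signature` class, `build_index`, and `matches_extension` are hypothetical names, and real signatures can be more complex than a fixed pattern at a fixed offset.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Signature:
    """Simplified object model: a byte pattern expected at a fixed offset."""
    format_name: str
    offset: int
    pattern: bytes


def build_index(entries: Iterable[Tuple[str, Signature]]) -> Dict[str, List[Signature]]:
    """Group signatures by (lower-cased) file extension."""
    index: Dict[str, List[Signature]] = {}
    for ext, signature in entries:
        index.setdefault(ext.lower(), []).append(signature)
    return index


def matches_extension(stream: BinaryIO, ext: str, index: Dict[str, List[Signature]]) -> bool:
    """Find the signatures for ext and compare them to the byte stream."""
    for signature in index.get(ext.lower(), []):
        stream.seek(signature.offset)
        if stream.read(len(signature.pattern)) == signature.pattern:
            return True
    return False  # unknown extension or no signature matched


# Usage: a C# source file uploaded with a .doc extension fails the match.
OLE2_HEADER = bytes.fromhex("d0cf11e0a1b11ae1")
index = build_index([("doc", Signature("OLE2 document", 0, OLE2_HEADER))])
upload = io.BytesIO(b"using System;\n")
is_valid = matches_extension(upload, "doc", index)
```

In this sketch an extension with no known signature is treated as a failed match, which keeps the check aligned with a white list approach; whether that is the right default is a design decision.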

If a match fails, we show a modal that communicates this to the end user and send out a notification. This is the same approach we use for virus-scan failures.

Anatomy of a signature file

Signature files are XML files. There are two kinds: non-container format and container format.
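As a rough idea of how a non-container signature file could be read, the Python sketch below parses a deliberately simplified XML document into a lookup keyed by extension. The element and attribute names (`SignatureFile`, `Format`, `Sequence`, `offset`) are invented for this example and are not the real PRONOM schema; the point is only the shape of the data: a format, its extension, and a hex byte sequence at an offset.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Simplified, made-up XML for illustration; NOT the actual PRONOM layout.
SAMPLE_XML = """
<SignatureFile>
  <Format name="PDF" extension="pdf">
    <Sequence offset="0">255044462D</Sequence>
  </Format>
  <Format name="PNG" extension="png">
    <Sequence offset="0">89504E470D0A1A0A</Sequence>
  </Format>
</SignatureFile>
"""


def load_signatures(xml_text: str) -> Dict[str, List[Tuple[int, bytes]]]:
    """Map each extension to a list of (offset, byte pattern) pairs."""
    root = ET.fromstring(xml_text)
    signatures: Dict[str, List[Tuple[int, bytes]]] = {}
    for fmt in root.findall("Format"):
        ext = fmt.get("extension", "").lower()
        for seq in fmt.findall("Sequence"):
            offset = int(seq.get("offset", "0"))
            pattern = bytes.fromhex((seq.text or "").strip())
            signatures.setdefault(ext, []).append((offset, pattern))
    return signatures
```

Container formats would need an extra step that this sketch does not cover, since identifying them depends on what is inside the container rather than on leading bytes alone.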

Excited for a DEMO

By mmann2943

File Identification and why you should care

  • 128