Wednesday, November 24, 2010

Application Programming Interface for VirusTotal

VirusTotal has become a very useful reference point for determining whether a file is malicious or not. Seen from the front end, VirusTotal brings together well-known antivirus products as engines that report the status of the file being scanned. Its effectiveness shows in the robustness of each result: the cross-reference between all of the antivirus engines is visible in VirusTotal, which in turn reduces the error rate in the detection process.

To extend the scope of its capabilities and functions, VirusTotal has added an application programming interface, the VirusTotal API. Using the VirusTotal API, we can upload and scan files and URLs, or access the reports of files uploaded earlier, without going through the main VirusTotal website. This is done by sending HTTP POST requests to specific URLs at VirusTotal. For more information on how the VirusTotal API works and how to use it, see the VirusTotal API documentation.

I have also made use of the VirusTotal API in several of my own projects. The following is a brief implementation in the Ruby programming language that retrieves a report already present in the VirusTotal database, based on an MD5 hash that we supply:


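#!/usr/bin/ruby
require 'net/https'
require 'uri'
require 'digest/md5'
require 'rubygems'
require 'json'

# Look up an existing VirusTotal report for a file by its MD5 hash.
def virustotal(file)
  # Hash the raw bytes; binread avoids newline and encoding translation.
  md5 = Digest::MD5.hexdigest(File.binread(file))
  uri = URI.parse("https://www.virustotal.com/api/get_file_report.json")
  key = 'PUT-YOUR-VIRUSTOTAL-API-KEY-HERE'

  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true

  # Send the hash and the API key as form fields in an HTTP POST.
  request = Net::HTTP::Post.new(uri.request_uri)
  request.set_form_data({'resource' => md5, 'key' => key})
  response = http.request(request)

  get_file_report = JSON.parse(response.body)
  result = get_file_report['report']

  # Without this guard, a response with no 'report' entry fails on nil below.
  if result.nil?
    puts "No report available for #{md5}"
    return
  end

  puts "Date submitted: " + result[0]

  # result[1] maps each antivirus name to its verdict; skip empty verdicts.
  result[1].each do |av, res|
    puts "#{av.rjust(14)}: #{res}" unless res.empty?
  end
end

if ARGV.length == 1
  virustotal(ARGV[0])
else
  puts "Usage: #{__FILE__} file"
end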
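
The script above loops over a hash that maps each antivirus name to its verdict, so the cross-reference between engines can also be condensed into a single ratio. The following is only a minimal sketch: `detection_summary` is a hypothetical helper, not part of the VirusTotal API, and the engine names and verdicts in `sample` are made up for illustration.

```ruby
# Hypothetical helper: summarise a scanner => verdict hash shaped like
# result[1] in the script above. Empty or nil verdicts count as "clean".
def detection_summary(scans)
  flagged = scans.reject { |_av, verdict| verdict.nil? || verdict.empty? }
  {
    flagged: flagged.size,
    total: scans.size,
    names: flagged.values.uniq.sort
  }
end

# Made-up sample data, for illustration only.
sample = {
  'EngineA' => 'Trojan.Generic',
  'EngineB' => '',
  'EngineC' => 'Trojan.Generic',
  'EngineD' => 'Worm.Example'
}

summary = detection_summary(sample)
puts "#{summary[:flagged]}/#{summary[:total]} engines flagged the file"
puts "Names reported: #{summary[:names].join(', ')}"
```

A ratio like this is easier to act on in a batch job than the full per-engine listing, although the listing is still what shows how the engines disagree.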

Wednesday, November 3, 2010

No endstream, no endobj, no worries

When analyzing malicious PDF documents, it is definitely useful to understand the format of the PDF object structure. To look for malicious content inside a file, we may need to go through several steps, one of which is interpreting that object structure.

A PDF object is enclosed by “obj” and “endobj”. Between them there are usually two components: an object dictionary and a stream. The object dictionary is represented by keys and values enclosed in “<<” and “>>”, while the stream is a sequence of bytes. A stream consists of zero or more bytes bracketed between the keywords stream (followed by a newline) and endstream.

The snippet below shows the normal PDF object structure:

obj 1 0
<< /Length 12 >>
stream
HELLO WORLD!
endstream
endobj

The obj 1 0 contains a dictionary (between << and >>) with the key /Length and the value 12. Below the dictionary, the stream holds the string “HELLO WORLD!” just before endstream. Finally, the object structure is closed with the endobj keyword, which marks the end of object 1 0's portion.
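
For a well-formed object like this one, the /Length value is enough to locate the stream bytes without looking for endstream at all. The sketch below assumes a single, uncompressed object held in a string; `read_stream` is a hypothetical helper name, and the sample uses this post's notation for the object header (real files write it as `1 0 obj`, which the helper does not depend on).

```ruby
# Sample object, written in the notation used in this post.
OBJECT = "obj 1 0\n<< /Length 12 >>\nstream\nHELLO WORLD!\nendstream\nendobj\n"

# Hypothetical helper: read the declared /Length, then take exactly that
# many bytes after the "stream" keyword and its newline.
def read_stream(object)
  bytes = object.b # work on raw bytes so offsets are byte offsets
  length = bytes[%r{/Length\s+(\d+)}, 1]
  marker = bytes.match(/stream\r?\n/)
  return nil if length.nil? || marker.nil?

  data = bytes.byteslice(marker.end(0), length.to_i)
  { declared_length: length.to_i, data: data }
end

info = read_stream(OBJECT)
puts "declared length: #{info[:declared_length]}"
puts "stream bytes:    #{info[:data].inspect}"
```

This only works while /Length can be trusted. When the declared value does not match the data, it cannot locate the end of the stream.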

Although the PDF object structure is fairly easy to understand, it can also be manipulated in many ways with malicious intent. The main purpose of the manipulation is to break the analysis process, particularly in PDF analysis tools. How can the PDF object structure be manipulated? Usually attackers omit some of the syntax or keywords required within the object. A PDF reader such as Adobe Reader, however, appears to treat the result as a valid structure. For example:

Object without “endobj” 
obj 1 0
<< /Length 1337 >>
stream
HELLO WORLD!
endstream

Object without “endstream” 
obj 1 0
<< /Length 1337 >>
stream
HELLO WORLD!
endobj

So-called bluff trick 
obj 1 0
<< /Length 1337 >>
stream
HELLO WORLD!endstream\n
endstream
endobj

In the three examples above, we can see that even when some components are dropped from (or added to) the structure, the PDF reader can still render the text without generating any error.

In the last snippet, the bluff trick is used to confuse security tools about which portion is the stream. When a pattern matching technique is used, the script or tool might not get the complete stream content because it confuses the first endstream with the second. These manipulations need careful handling in order to get a reliable extraction.

Generalizing security tools so that they work under any conditions they encounter seems to be a crucial task. Pattern matching alone will not work. Understanding the format of the PDF object helps a lot in generalizing the analysis tools.

For example, in a normal manipulation, attackers cannot get rid of both the “endstream” and the “endobj” keywords simultaneously. Either “endstream” or “endobj”, or both, will exist. As a rough solution, a regular expression like />>.*?stream(.*?)(endstream|endobj)/m can be implemented reliably with the aid of other filtering mechanisms.
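
The confusion can be reproduced with two simple patterns. This is only a sketch: it assumes the `\n` in the bluff snippet stands for the line break itself, and both patterns are illustrative, not taken from any particular tool.

```ruby
# The bluff-trick object from above as a Ruby string.
BLUFF = "obj 1 0\n<< /Length 1337 >>\nstream\n" \
        "HELLO WORLD!endstream\nendstream\nendobj\n"

# Non-greedy: stops at the FIRST "endstream", which here is part of the data.
first_match = BLUFF[/stream\r?\n(.*?)endstream/m, 1]

# Greedy: runs to the LAST "endstream". This is only meaningful when the
# input is one isolated object, not a whole file with many streams.
last_match = BLUFF[/stream\r?\n(.*)endstream/m, 1]

puts "up to first endstream: #{first_match.inspect}"
puts "up to last endstream:  #{last_match.inspect}"
```

Neither pattern is right in general. The first one truncates the stream at the decoy, and the second one would swallow several objects if it were run over a full document.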
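
A minimal sketch of that regular expression applied to the three manipulated objects shown earlier. `extract_stream` and `SAMPLES` are hypothetical names, and calling `strip` on the captured text is a simplification that suits these text samples but would damage binary stream data.

```ruby
# The regular expression from the paragraph above.
STREAM_RE = />>.*?stream(.*?)(endstream|endobj)/m

# The three manipulated objects from this post as Ruby strings.
SAMPLES = {
  'no endobj' =>
    "obj 1 0\n<< /Length 1337 >>\nstream\nHELLO WORLD!\nendstream\n",
  'no endstream' =>
    "obj 1 0\n<< /Length 1337 >>\nstream\nHELLO WORLD!\nendobj\n",
  'bluff trick' =>
    "obj 1 0\n<< /Length 1337 >>\nstream\nHELLO WORLD!endstream\nendstream\nendobj\n"
}

# Hypothetical helper: return the captured stream text and the keyword
# that ended it, or nil when the pattern does not match.
def extract_stream(object)
  m = STREAM_RE.match(object)
  return nil unless m

  { data: m[1].strip, terminator: m[2] }
end

SAMPLES.each do |label, object|
  found = extract_stream(object)
  puts "#{label.ljust(12)} -> #{found[:data].inspect} (closed by #{found[:terminator]})"
end
```

The pattern copes with a missing endobj or a missing endstream, but in the bluff sample it still ends at the decoy endstream. That is the kind of case the additional filtering has to deal with.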