On Sunday I thought I would try to extract images from a TCP Flow with scapy. Initial searches showed how to do this with wireshark, but the idea is to do this programmatically and try to learn something about scapy. At the hackerspace tonight, not wanting to work on anything I said I would, I thought I would have a play with scapy.
A little bit on TCP
A TCP Flow is how we refer to the stream of packets that make up what you might call a TCP Connection. The connection bit is just the start. A Flow is defined in IP by 5 numbers:
* protocol (TCP or UDP for the most part)
* source
  - address (IP address of the initiator of the connection)
  - port (normally a randomly chosen ephemeral port number)
* destination
  - address (IP address of the host)
  - port (normally a well-known service number, HTTP is 80)
A lot of the time we call this the 5-tuple, or the 4-tuple if we already know the protocol. At any one point in time a given 5-tuple identifies exactly one connection (Flow).
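As a rough illustration of the 5-tuple idea, here is a minimal sketch in plain Python (no scapy) that groups packets into flows. The `FiveTuple`, `flow_key` and `group_by_flow` names are made up for this example, the packets are hand-written tuples rather than real captures, and the payloads are invented. It assumes you want both directions of a connection to land in the same flow, so the key sorts the two endpoints and no longer records who the initiator was.

```python
from collections import defaultdict, namedtuple

# Hypothetical container for the five numbers that identify a flow.
FiveTuple = namedtuple("FiveTuple", "proto addr_a port_a addr_b port_b")


def flow_key(proto, src_addr, src_port, dst_addr, dst_port):
    """Return a key that is the same for both directions of a connection.

    The two endpoints are sorted, so client->server and server->client
    packets map to one key. This deliberately drops the information about
    which side initiated the connection.
    """
    first, second = sorted([(src_addr, src_port), (dst_addr, dst_port)])
    return FiveTuple(proto, first[0], first[1], second[0], second[1])


def group_by_flow(packets):
    """Group (proto, src, sport, dst, dport, payload) tuples by flow."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(*pkt[:5])].append(pkt)
    return flows


# Hand-written example packets; the payloads are invented for illustration.
packets = [
    ("TCP", "172.31.5.168", 58914, "93.95.228.91", 80, b"GET / HTTP/1.1\r\n"),
    ("TCP", "93.95.228.91", 80, "172.31.5.168", 58914, b"HTTP/1.1 200 OK\r\n"),
    ("TCP", "172.31.5.168", 60028, "93.95.228.91", 80, b"GET /a.css HTTP/1.1\r\n"),
]
flows = group_by_flow(packets)
for key, pkts in flows.items():
    print(key, len(pkts))
```

The first two packets share a key because they are the two directions of one connection; the third uses a different ephemeral port, so it is a separate flow.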
To a programmer TCP presents a reliable, byte-oriented stream interface. This means any bytes we write into our TCP socket will come out of the other end in order, and they are guaranteed to arrive (or an error is generated).
Data written into a TCP socket is broken into chunks the network can support (normally without fragmentation); we call these chunks of data segments. Each segment has a sequence number, which tells the remote end where it is in the stream. There can be a large number of these segments in the air at a time; this is the flight size.
Segments can get lost in the network (well, dropped by routers), reordered or delayed.
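To make the sequence number idea concrete, here is a small sketch of putting segments back in order. It is a simplification under stated assumptions, not how a real TCP stack does it: the `reassemble` function is hypothetical, segments are plain `(seq, payload)` pairs, the 32-bit sequence number wraparound is ignored, and reassembly simply stops at the first gap left by a lost segment.

```python
def reassemble(segments, initial_seq):
    """Rebuild the byte stream from (seq, payload) pairs.

    Handles reordering, duplicates and overlapping retransmissions.
    Stops at the first gap, since bytes after a hole cannot be placed
    in the stream until the missing segment turns up.
    Simplification: sequence number wraparound is not handled.
    """
    stream = bytearray()
    next_seq = initial_seq  # sequence number of the next byte we need
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        end = seq + len(payload)
        if end <= next_seq:
            continue  # nothing new here: duplicate or retransmission
        if seq > next_seq:
            break  # gap: a segment was lost or has not arrived yet
        # Keep only the bytes we have not already seen.
        stream += payload[next_seq - seq:]
        next_seq = end
    return bytes(stream)


# Segments arrive out of order, with one duplicate and one overlap.
segments = [
    (1006, b"world"),
    (1000, b"hello "),
    (1000, b"hello "),
    (1003, b"lo wor"),
]
print(reassemble(segments, 1000))
```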
To extract images from a network capture we need to: separate the packets out into flows; reassemble each TCP flow's segments into a coherent byte stream, taking into account loss and reordering; search the byte stream for image headers and try to extract them.
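The last step, searching a byte stream for image headers, can be sketched like this for JPEG. A JPEG file starts with the bytes FF D8 FF and ends with FF D9, so a naive carver looks for that pair. The `carve_jpegs` name is made up, and the approach assumes the image bytes are sitting in the stream untouched; it will get things wrong if the HTTP body is compressed or chunked, or if the image embeds a thumbnail with its own end marker.

```python
JPEG_START = b"\xff\xd8\xff"  # start-of-image marker plus first marker byte
JPEG_END = b"\xff\xd9"        # end-of-image marker


def carve_jpegs(stream):
    """Return every byte range that looks like a JPEG in the stream.

    Naive: pairs each start marker with the next end marker after it.
    """
    images = []
    pos = 0
    while True:
        start = stream.find(JPEG_START, pos)
        if start == -1:
            break
        end = stream.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break  # truncated image, give up
        end += len(JPEG_END)
        images.append(stream[start:end])
        pos = end
    return images


# A fake "image": real markers around invented filler bytes.
fake_jpeg = JPEG_START + b"\xe0" + b"not really image data" + JPEG_END
stream = b"HTTP/1.1 200 OK\r\n\r\n" + fake_jpeg + b"trailing bytes"
print(len(carve_jpegs(stream)))
```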
This is a non-trivial amount of work for a Tuesday night.
Before writing any code I did some searching, in case scapy had support for flow
reconstruction (nope). I came across some references to a tool called
tcpflow. tcpflow claims to be a tool for extracting TCP Flows from either a live
capture interface or a pcap file.
That looked great to me: I would grab a pcap with
tcpdump, process out the flows with
tcpflow and then drop that into scapy and start looking for some
images. Reading the tcpflow man page I instead found a single option that would do
all the work for me.
It is really easy to extract images from a http TCP Flow using
tcpflow. It can do this live, but I used a pcap file.
# tcpdump -w webimage.pcap host adventurist.me and port 80
I started up the dump then visited http://adventurist.me/posts/0129 in Firefox's porn mode.
tcpflow will read in a pcap file with the -r flag, and the -e flag (here -e http) will apply
magic to the flow and find you fun stuff.
$ tcpflow -r webimage.pcap -e http
tcpflow will spit out a file for each direction of each flow and, to boot, it will throw in
extracted data for everything it understands.
$ ls
093.095.228.091.00080-172.031.005.168.58914
093.095.228.091.00080-172.031.005.168.58914-HTTPBODY-001.jpg
093.095.228.091.00080-172.031.005.168.58914-HTTPBODY-002.ico
093.095.228.091.00080-172.031.005.168.60028
093.095.228.091.00080-172.031.005.168.60028-HTTPBODY-001.html
093.095.228.091.00080-172.031.005.168.60028-HTTPBODY-002.css
172.031.005.168.58914-093.095.228.091.00080
172.031.005.168.60028-093.095.228.091.00080
report.xml
webimage.pcap
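The file names in that listing encode the two endpoints of each direction of a flow, with zero-padded address octets and ports, source first. If you want to script against the output, a small parser along these lines might help. This is only a sketch based on the names shown above: `parse_flow_name` is a made-up helper, it only covers IPv4 names of this shape, and anything after the second endpoint is treated as an opaque suffix.

```python
import re

# Matches names like 093.095.228.091.00080-172.031.005.168.58914[-SUFFIX]
FLOW_NAME = re.compile(
    r"^(\d{3}\.\d{3}\.\d{3}\.\d{3})\.(\d{5})"
    r"-(\d{3}\.\d{3}\.\d{3}\.\d{3})\.(\d{5})"
    r"(?:-(.+))?$"
)


def _unpad(addr):
    """Turn 093.095.228.091 into 93.95.228.91."""
    return ".".join(str(int(octet)) for octet in addr.split("."))


def parse_flow_name(name):
    """Return (src, sport, dst, dport, suffix), or None if not a flow file."""
    match = FLOW_NAME.match(name)
    if match is None:
        return None
    src, sport, dst, dport, suffix = match.groups()
    return (_unpad(src), int(sport), _unpad(dst), int(dport), suffix)


for name in [
    "093.095.228.091.00080-172.031.005.168.58914-HTTPBODY-001.jpg",
    "172.031.005.168.58914-093.095.228.091.00080",
    "report.xml",
]:
    print(name, "->", parse_flow_name(name))
```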
tcpflow also seems to be spitting out a report.xml, which seems to describe
what it has just done. I imagine that is super useful when running
against a live capture.
I haven't managed to get very far using scapy to pull
images out of flows, and I am starting to wonder if there is really any point when
all these tools are available.