Extract Images from a TCP Flow

On Sunday I thought I would try to extract images from a TCP Flow with scapy. Initial searches showed how to do this with Wireshark, but the idea is to do it programmatically and learn something about scapy along the way. At the hackerspace tonight, not wanting to work on anything I said I would, I had a play with scapy.

A little bit on TCP

A TCP Flow is how we refer to the stream of packets that make up what you might call a TCP connection. Setting up the connection is just the start. A Flow is identified in IP by five values:

* protocol (TCP or UDP for the most part)
* source
    - address (IP address of the host that initiates the connection)
    - port (normally a randomly chosen ephemeral port number)
* destination
    - address (IP address of the host being connected to)
    - port (normally a well-known service port; HTTP is 80)

A lot of the time we call this the 5-tuple, or the 4-tuple if we already know the protocol. At any one point in time a given 5-tuple identifies exactly one connection (Flow).
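To make the 5-tuple concrete, here is a minimal Python sketch of using it as a dictionary key to group packets into flows. It assumes packets have already been decoded into (FlowKey, payload) pairs; with scapy those fields would come from pkt[IP].src, pkt[TCP].sport and friends. FlowKey, canonical and group_by_flow are hypothetical names made up for illustration, and the addresses come from the documentation ranges.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple type: protocol, source and destination."""
    protocol: str
    src: str
    sport: int
    dst: str
    dport: int


def canonical(key: FlowKey) -> FlowKey:
    """Map both directions of a connection to the same key.

    Replies have source and destination swapped, so we always put the
    "smaller" endpoint first. Plain string/int ordering is fine here:
    all we need is an order that is consistent.
    """
    if (key.src, key.sport) <= (key.dst, key.dport):
        return key
    return FlowKey(key.protocol, key.dst, key.dport, key.src, key.sport)


def group_by_flow(packets):
    """Group (FlowKey, payload) pairs into flows, keeping capture order."""
    flows = defaultdict(list)
    for key, payload in packets:
        flows[canonical(key)].append(payload)
    return dict(flows)


packets = [
    (FlowKey("tcp", "192.0.2.10", 51000, "198.51.100.7", 80), b"GET / HTTP/1.1\r\n"),
    (FlowKey("tcp", "198.51.100.7", 80, "192.0.2.10", 51000), b"HTTP/1.1 200 OK\r\n"),
    (FlowKey("tcp", "192.0.2.10", 51001, "198.51.100.7", 80), b"GET /a.png HTTP/1.1\r\n"),
]
flows = group_by_flow(packets)
print(len(flows))  # 2: a new client port means a new flow
```

Folding both directions into one key is a design choice; tcpflow, for example, keeps each direction separate, which is just as valid.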

To a programmer, TCP presents a reliable, byte-oriented stream interface. This means any bytes we write into our TCP socket will come out of the other end in order, and they are guaranteed to arrive (or an error is generated).
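A quick way to see the "byte stream" part for yourself is to push a few separate writes through a TCP connection on the loopback interface. This is a plain standard-library sketch, nothing scapy-specific: the receiver gets the bytes back in order, but nothing tells it where one write ended and the next began, which is why images later have to be found by searching the stream.

```python
import socket

# A listening TCP socket on loopback; port 0 lets the OS pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.create_connection(server.getsockname())
conn, _ = server.accept()

# Three separate writes on the sending side...
for chunk in (b"hello, ", b"byte ", b"stream"):
    client.sendall(chunk)
client.close()  # sends FIN, so the reader sees end-of-stream

# ...come out as one ordered run of bytes with no write boundaries.
received = b""
while True:
    data = conn.recv(4096)
    if not data:  # empty read means the other end closed
        break
    received += data
conn.close()
server.close()

print(received)  # b'hello, byte stream'
```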

Data written into a TCP socket is broken into chunks the network can carry (normally without fragmentation); we call these chunks segments. Each segment has a sequence number, the position of its first byte in the stream, which tells the remote end where the data belongs. There can be a large number of these segments in the air at once; the amount of data sent but not yet acknowledged is called the flight size.
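Here is a small sketch of the segmenting step, assuming a fixed maximum segment size of 1460 bytes (typical for Ethernet) and ignoring everything else a real stack does (windows, congestion control, retransmission timers). The point is the numbering: each segment carries the sequence number of its first byte, counted from a randomly chosen initial sequence number (ISN), and the numbers wrap at 2**32. The segment function is a hypothetical illustration.

```python
import random

MSS = 1460  # typical on Ethernet: 1500-byte MTU minus 20-byte IPv4 and 20-byte TCP headers


def segment(data: bytes, isn: int, mss: int = MSS):
    """Split a byte stream into (sequence number, payload) segments.

    The SYN consumes one sequence number, so the first data byte is
    numbered isn + 1. Sequence numbers are 32 bits and wrap around.
    """
    segments = []
    for offset in range(0, len(data), mss):
        seq = (isn + 1 + offset) % 2**32
        segments.append((seq, data[offset:offset + mss]))
    return segments


isn = random.getrandbits(32)  # real stacks choose the ISN so it is hard to guess
segs = segment(b"x" * 4000, isn)
print([len(payload) for _, payload in segs])  # [1460, 1460, 1080]
```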

Segments can get lost in the network (well, dropped by routers), reordered, or delayed.
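Reassembly is the reverse of splitting a stream into segments, and this is where loss and reordering bite. A minimal sketch, assuming we already have one direction of one flow as (sequence number, payload) pairs and know its ISN (from the SYN): sort by position in the stream, skip bytes we already have (retransmissions), and stop at the first hole left by a lost segment. reassemble is a hypothetical helper, not a scapy function, and it ignores TCP flags and streams over 4 GiB.

```python
def reassemble(segments, isn):
    """Rebuild one direction of a flow from (seq, payload) segments.

    Returns (data, gap): gap is None if the stream is complete, or the
    stream offset of the first missing byte if a segment was lost.
    Sketch only: ignores SYN/FIN/RST flags and streams over 4 GiB.
    """
    # Turn 32-bit sequence numbers into offsets from the first data byte
    # (isn + 1); the modulo handles wrap-around. Sorting fixes reordering.
    pieces = sorted(((seq - isn - 1) % 2**32, payload) for seq, payload in segments)
    stream = bytearray()
    for offset, payload in pieces:
        if offset > len(stream):
            # Hole: the segment starting at len(stream) was never captured.
            return bytes(stream), len(stream)
        end = offset + len(payload)
        if end > len(stream):
            # Append only the bytes we don't have yet; fully duplicated
            # retransmissions add nothing.
            stream += payload[len(stream) - offset:]
    return bytes(stream), None


isn = 1000
captured = [            # as seen on the wire
    (1007, b"world"),   # arrived early
    (1001, b"hello "),
    (1007, b"world"),   # retransmitted duplicate
    (1012, b"!"),
]
print(reassemble(captured, isn))  # (b'hello world!', None)
```

Real reassemblers also have to decide what to do when overlapping retransmissions disagree; this sketch simply keeps the first copy it sees.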

Extracting Images

To extract images from a network capture we need to: separate the packets into flows; reassemble each TCP flow's segments into a coherent byte stream, taking loss and reordering into account; then search the byte stream for image headers and try to extract them.
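The last step can be sketched with naive header/footer carving, a technique I am swapping in here rather than anything scapy provides. It looks for the standard PNG signature and the JPEG start-of-image marker, then cuts up to the PNG IEND chunk (plus its CRC) or the JPEG end-of-image marker. carve_images is a hypothetical helper and the sample "images" are fake byte strings, not real pictures.

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # 8-byte PNG signature
JPEG_START = b"\xff\xd8\xff"       # SOI marker followed by the next marker's 0xFF
JPEG_END = b"\xff\xd9"             # EOI marker


def carve_images(stream: bytes):
    """Yield (kind, data) for each PNG or JPEG found in a byte stream.

    Naive header/footer carving: find a start signature, then the first
    end marker after it. JPEGs with an embedded EXIF thumbnail contain an
    earlier EOI, so those can come out truncated.
    """
    pos = 0
    while True:
        found = [(i, kind) for i, kind in ((stream.find(PNG_MAGIC, pos), "png"),
                                           (stream.find(JPEG_START, pos), "jpeg")) if i != -1]
        if not found:
            return
        start, kind = min(found)  # earliest signature wins
        if kind == "png":
            # The image ends after the IEND chunk type and its 4-byte CRC.
            end = stream.find(b"IEND", start)
            end = -1 if end == -1 else end + 8
        else:
            end = stream.find(JPEG_END, start + len(JPEG_START))
            end = -1 if end == -1 else end + 2
        if end == -1 or end > len(stream):
            pos = start + 1  # end marker never arrived; skip this start
            continue
        yield kind, stream[start:end]
        pos = end


fake_jpeg = JPEG_START + b"\xe0not-really-a-jpeg" + JPEG_END
fake_png = PNG_MAGIC + b"\x00\x00\x00\x00IEND\xaeB`\x82"
stream = b"HTTP/1.1 200 OK\r\n\r\n" + fake_jpeg + b"junk" + fake_png
for kind, data in carve_images(stream):
    print(kind, len(data))  # jpeg 23, then png 20
```

For HTTP the response headers usually give the exact body length, so carving is mostly a fallback for protocols you don't parse.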

This is a non-trivial amount of work for a Tuesday night.

Before writing any code I did some searching to see whether scapy had support for flow reconstruction (nope). I came across some references to a tool called tcpflow, which claims to extract TCP Flows from either a live capture interface or a pcap file.

That looked great to me: I would grab a pcap with tcpdump, split out the flows with tcpflow, and then load them into scapy and start looking for images.

Reading the tcpflow man page I instead found a single option that would do all the work for me.

Images with tcpflow

It is really easy to extract images from an HTTP TCP Flow using tcpflow. You can do this live, but I used a pcap file.

# tcpdump -w webimage.pcap host adventurist.me and port 80

I started the capture, then visited http://adventurist.me/posts/0129 in Firefox's porn mode (private browsing).

tcpflow reads a pcap file with the -r flag, and the -e flag enables a named scanner (here http) that applies magic to the flow and finds you fun stuff.

$ tcpflow -r webimage.pcap -e http
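I don't know exactly how tcpflow's http scanner works internally, but the core job of any HTTP-aware extractor looks roughly like this: walk the server-to-client byte stream, read each response's headers, and use Content-Length to cut out the body. This is my own simplified sketch, not tcpflow's code; http_bodies is a hypothetical helper, and chunked or compressed responses (common in practice) would need more work.

```python
def http_bodies(stream: bytes):
    """Yield (content_type, body) for each response in a server-to-client
    HTTP/1.1 byte stream.

    Only Content-Length framing is handled: a response without it (e.g.
    chunked transfer encoding) ends the walk, and compressed bodies are
    returned still compressed.
    """
    pos = 0
    while pos < len(stream):
        head_end = stream.find(b"\r\n\r\n", pos)
        if head_end == -1:
            return  # headers never completed (e.g. capture cut short)
        _status_line, *header_lines = stream[pos:head_end].split(b"\r\n")
        headers = {}
        for line in header_lines:
            name, _, value = line.partition(b":")
            headers[name.strip().lower()] = value.strip()
        if b"content-length" not in headers:
            return
        body_start = head_end + 4
        body_end = body_start + int(headers[b"content-length"])
        if body_end > len(stream):
            return  # body truncated
        yield headers.get(b"content-type", b"").decode("latin-1"), stream[body_start:body_end]
        pos = body_end  # next response on a keep-alive connection


stream = (
    b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nhello"
    b"HTTP/1.1 200 OK\r\nContent-Type: image/png\r\nContent-Length: 4\r\n\r\n\x89PNG"
)
for ctype, body in http_bodies(stream):
    if ctype.startswith("image/"):
        print(ctype, len(body))  # image/png 4
```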

tcpflow spits out a file for each direction of each flow and, to boot, throws in extracted data for everything it understands.

$ ls
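If I'm reading the tcpflow man page right, each flow file is named after the connection it came from, something like 192.000.002.010.51000-198.051.100.007.00080, and holds the data sent from the first endpoint to the second. Assuming that default naming, a hypothetical helper can turn a name back into the 5-tuple fields, which would be a natural bridge into scapy or Python scripts:

```python
import re

# Assumed default tcpflow naming: SRCIP.SRCPORT-DSTIP.DSTPORT, with each
# IP octet zero-padded to 3 digits and each port to 5 digits.
FLOW_NAME = re.compile(
    r"(\d{3})\.(\d{3})\.(\d{3})\.(\d{3})\.(\d{5})-(\d{3})\.(\d{3})\.(\d{3})\.(\d{3})\.(\d{5})"
)


def parse_flow_name(name: str):
    """Hypothetical helper: turn a tcpflow file name into
    (src, sport, dst, dport), or None for files such as report.xml.

    re.match only anchors at the start, so any suffix a scanner adds
    after the destination port is ignored.
    """
    m = FLOW_NAME.match(name)
    if m is None:
        return None
    n = [int(g) for g in m.groups()]  # int() drops the zero padding
    src = ".".join(str(x) for x in n[0:4])
    dst = ".".join(str(x) for x in n[5:9])
    return src, n[4], dst, n[9]


print(parse_flow_name("192.000.002.010.51000-198.051.100.007.00080"))
# ('192.0.2.10', 51000, '198.51.100.7', 80)
```

Run over os.listdir() of the output directory, this would let you pair the two directions of each flow back up by swapping source and destination.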

tcpflow also writes out a report.xml, which seems to describe what it has just done. I imagine that is super useful when running tcpflow against a live capture. I haven't managed to get very far using scapy to pull images out of flows, and I am starting to wonder if there is really any point when all these tools are available.