On Sunday I thought I would try to extract images from a TCP Flow with scapy.
Initial searches showed how to do this with wireshark, but the idea is to do
this programmatically and try to learn something about scapy. At the
hackerspace tonight, not wanting to work on anything I said I would, I thought
I would have a play with scapy.
A little bit on TCP
A TCP Flow is how we refer to the stream of packets that make up what you might
call a TCP Connection. The connection bit is just the start. A Flow is defined
in IP by 5 numbers:
* protocol (TCP or UDP for the most part)
* source
- address (ip address of the initiator of the connection)
- port (normally a randomly chosen ephemeral port number)
* destination
- address (ip address of the host)
- port (normally a well known service number, http is 80)
A lot of the time we call this the 5-tuple, or 4-tuple if we know the protocol.
At any one point in time a given 5-tuple defines the connection (Flow).
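As a rough sketch of the idea, here is how packets could be bucketed by their 5-tuple in plain Python. The FiveTuple type, the group_by_flow helper and the sample packets are all made up for illustration; with scapy the same fields would have to be read out of each packet's IP and TCP layers first.

```python
from collections import defaultdict, namedtuple

# The five numbers that identify a flow (hypothetical helper type).
FiveTuple = namedtuple("FiveTuple", "proto src_addr src_port dst_addr dst_port")


def group_by_flow(packets):
    """Group (five_tuple, payload) pairs into per-flow lists, in arrival order."""
    flows = defaultdict(list)
    for key, payload in packets:
        flows[key].append(payload)
    return dict(flows)


# Illustrative packets only: one 5-tuple per direction of the conversation.
request = FiveTuple("TCP", "172.31.5.168", 58914, "93.95.228.91", 80)
reply = FiveTuple("TCP", "93.95.228.91", 80, "172.31.5.168", 58914)
packets = [
    (request, b"GET / HTTP/1.1\r\n"),
    (reply, b"HTTP/1.1 200 OK\r\n"),
    (request, b"\r\n"),
]
flows = group_by_flow(packets)
```

Note that each direction gets its own key here, so one TCP connection shows up as two flows.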
To a programmer TCP presents a reliable byte orientated stream interface. This
means any bytes we write into our TCP socket, will come out of the other end in
order and they are guaranteed to arrive (or an error is generated).
Data written into a TCP socket is broken into chunks the network can support
(normally, without fragmentation); we call these chunks of data segments. Each
segment has a sequence number, which tells the remote end where it is in the
stream. There can be a large number of these segments in the air at a time;
this is the flight size.
Segments can get lost in the network (well dropped by routers), reordered or
delayed.
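To make the role of sequence numbers concrete, here is a minimal sketch of putting segments back in order. It assumes each segment is a (sequence number, payload) pair, and it ignores things a real implementation has to handle, such as sequence number wraparound and the SYN and FIN flags using up a sequence number. The reassemble function is a made-up name, not something scapy provides.

```python
def reassemble(segments, initial_seq):
    """Rebuild the byte stream from (sequence_number, payload) pairs.

    Stops at the first gap: bytes after a lost segment cannot be placed
    in the stream until the missing data turns up.
    """
    stream = bytearray()
    next_seq = initial_seq
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        if seq > next_seq:
            break  # gap, a segment is missing
        skip = next_seq - seq  # bytes we already have (retransmit or overlap)
        if skip < len(payload):
            stream += payload[skip:]
            next_seq = seq + len(payload)
    return bytes(stream)


# Reordered segments with one duplicate, starting at sequence number 1000.
segments = [(1005, b" worl"), (1000, b"hello"), (1005, b" worl"), (1010, b"d")]
print(reassemble(segments, 1000))
```

With a segment missing from the middle, the function returns only the bytes before the hole.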
Extracting Images
To extract images from a network capture we need to separate out the packets
into flows; reassemble each TCP flow into a coherent byte stream, taking into
account loss and reordering; then search the byte stream for image headers and
try to extract them.
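The last step, searching for image headers, could look something like the sketch below. It only handles JPEG, using the start-of-image bytes FF D8 FF and the end-of-image bytes FF D9, and it is deliberately naive: a JPEG with an embedded thumbnail contains an early end marker and would be cut short, and a compressed or chunked HTTP body would not match at all. The carve_jpegs name is made up for this example.

```python
JPEG_START = b"\xff\xd8\xff"  # start-of-image marker plus first byte of the next marker
JPEG_END = b"\xff\xd9"        # end-of-image marker


def carve_jpegs(stream):
    """Return every byte range in stream that starts and ends like a JPEG."""
    images = []
    pos = 0
    while True:
        start = stream.find(JPEG_START, pos)
        if start == -1:
            break
        end = stream.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break  # header without an end marker, probably truncated
        end += len(JPEG_END)
        images.append(stream[start:end])
        pos = end
    return images


# Fake byte stream: two tiny JPEG-shaped blobs surrounded by other data.
first = b"\xff\xd8\xff\xe0DATA\xff\xd9"
second = b"\xff\xd8\xff\xe1X\xff\xd9"
stream = b"HTTP/1.1 200 OK\r\n\r\n" + first + b"padding" + second
images = carve_jpegs(stream)
```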
This is a non-trivial amount of work for a Tuesday night.
Before writing any code I did some searching, in case scapy had support for
flow reconstruction (nope). I came across some references to a tool called
tcpflow. tcpflow claims to be a tool for extracting TCP Flows from either a
live capture interface or a pcap file.
That looked great to me: I would grab a pcap with tcpdump, process out the
flows with tcpflow and then drop that into scapy and start looking for some
images.
Reading the tcpflow man page I instead found a single option that would do all
the work for me.
Images with tcpflow
It is really easy to extract images from an HTTP TCP Flow using tcpflow. You
can do this live, but I used a pcap file.
# tcpdump -w webimage.pcap host adventurist.me and port 80
I started up the dump then visited http://adventurist.me/posts/0129 in
Firefox's porn mode.
tcpflow will read in a pcap file with the -r flag; the -e flag will apply
magic to the flow and find you fun stuff.
$ tcpflow -r webimage.pcap -e http
tcpflow will spit out a file for each flow; to boot, it will throw in
extracted data for everything it understands.
$ ls
093.095.228.091.00080-172.031.005.168.58914
093.095.228.091.00080-172.031.005.168.58914-HTTPBODY-001.jpg
093.095.228.091.00080-172.031.005.168.58914-HTTPBODY-002.ico
093.095.228.091.00080-172.031.005.168.60028
093.095.228.091.00080-172.031.005.168.60028-HTTPBODY-001.html
093.095.228.091.00080-172.031.005.168.60028-HTTPBODY-002.css
172.031.005.168.58914-093.095.228.091.00080
172.031.005.168.60028-093.095.228.091.00080
report.xml
webimage.pcap
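The file names carry the flow's addresses and ports, so they can be picked apart if you want to script over the output. The sketch below is based only on the names in the listing above: zero-padded IPv4 octets, a five-digit port, and an optional -HTTPBODY- suffix. It reads the first address as the sender, which fits the HTTP bodies appearing under the server's address. The parse_flow_name function is a made-up helper, not part of tcpflow.

```python
import re

# Pattern inferred from the listing above, not from tcpflow's documentation.
ADDR = r"(\d{3}\.\d{3}\.\d{3}\.\d{3})\.(\d{5})"
FLOW_NAME = re.compile(r"^" + ADDR + r"-" + ADDR + r"(?:-HTTPBODY-(\d+)\.(\w+))?$")


def unpad(addr):
    """Turn 093.095.228.091 into 93.95.228.91."""
    return ".".join(str(int(octet)) for octet in addr.split("."))


def parse_flow_name(name):
    """Split a flow file name into endpoints, or return None if it does not match."""
    match = FLOW_NAME.match(name)
    if match is None:
        return None
    src, sport, dst, dport, body_index, ext = match.groups()
    return {
        "src": (unpad(src), int(sport)),
        "dst": (unpad(dst), int(dport)),
        "body": None if body_index is None else (int(body_index), ext),
    }


info = parse_flow_name("093.095.228.091.00080-172.031.005.168.58914-HTTPBODY-001.jpg")
```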
tcpflow also seems to be spitting out a report.xml, which seems to describe
what it has just done. I imagine that is super useful when running tcpflow
against a live capture.

I haven't managed to get very far using scapy to pull images out of flows. I
am starting to wonder if there is really any point when all these tools are
available.