Extract Images from a TCP Flow

On Sunday I thought I would try to extract images from a TCP Flow with scapy. Initial searches showed how to do this with wireshark, but the idea is to do this programmatically and try to learn something about scapy. At the hackerspace tonight, not wanting to work on anything I said I would, I thought I would have a play with scapy.

A little bit on TCP

A TCP Flow is how we refer to the stream of packets that make up what you might call a TCP connection. The connection bit is just the start. A Flow is defined in IP by 5 numbers:

* protocol (TCP or UDP for the most part)
* source
    - address (ip address of the initiator of the connection)
    - port (normally a randomly chosen ephemeral port number)
* destination
    - address (ip address of the host)
    - port (normally a well known service number, http is 80)

A lot of the time we call this the 5-tuple, or the 4-tuple if we know the protocol. At any one point in time a given 5-tuple defines the connection (Flow).
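As a minimal sketch of using the 5-tuple as a lookup key, the snippet below groups packet payloads by flow in plain Python. `FlowKey` and `group_by_flow` are names made up for illustration (they are not scapy classes), and the packets are invented. Note that each direction of a connection has its own 5-tuple, with source and destination swapped.

```python
from collections import defaultdict, namedtuple

# Hypothetical record type for this sketch; not part of scapy.
FlowKey = namedtuple("FlowKey", "proto src_addr src_port dst_addr dst_port")


def group_by_flow(packets):
    """Group (key, payload) pairs into a list of payloads per 5-tuple."""
    flows = defaultdict(list)
    for key, payload in packets:
        flows[key].append(payload)
    return dict(flows)


# Invented packets: two in the request direction, one in the reply direction.
request = FlowKey("tcp", "172.31.5.168", 58914, "93.95.228.91", 80)
reply = FlowKey("tcp", "93.95.228.91", 80, "172.31.5.168", 58914)
packets = [
    (request, b"GET / "),
    (reply, b"HTTP/1.1 200 OK"),
    (request, b"HTTP/1.1\r\n"),
]

flows = group_by_flow(packets)
for key, payloads in flows.items():
    print(key, len(payloads))
```

Because a namedtuple is hashable, it can be used directly as a dictionary key, which is all a flow table really needs.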

To a programmer TCP presents a reliable byte orientated stream interface. This means any bytes we write into our TCP socket, will come out of the other end in order and they are guaranteed to arrive (or an error is generated).

Data written into a TCP socket is broken into chunks the network can support (normally, without fragmentation); we call these chunks of data segments. Each segment has a sequence number, which tells the remote end where it is in the stream. There can be a large number of these segments in the air at a time; this is the flight size.

Segments can get lost in the network (well, dropped by routers), reordered or delayed.
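To make the role of sequence numbers concrete, here is a rough sketch of putting segments back in order. It assumes segments arrive as `(seq, data)` pairs, that we know the sequence number of the first data byte, and it ignores 32-bit sequence number wraparound. `reassemble` is a hypothetical helper, not something scapy provides.

```python
def reassemble(segments, initial_seq):
    """Rebuild a byte stream from (seq, data) pairs.

    Copes with reordering and with retransmitted or overlapping segments.
    Stops at the first gap, because the bytes after a lost segment cannot
    be placed in the stream yet.
    """
    stream = bytearray()
    next_seq = initial_seq
    for seq, data in sorted(segments, key=lambda s: s[0]):
        if seq > next_seq:
            break  # a segment is missing, stop here
        skip = next_seq - seq  # bytes we already have from an earlier segment
        if skip < len(data):
            stream += data[skip:]
            next_seq += len(data) - skip
    return bytes(stream)


# Invented segments: out of order, one duplicate, one overlapping.
segments = [
    (1005, b"world"),
    (1000, b"hello"),
    (1000, b"hello"),
    (1003, b"lowor"),
]
print(reassemble(segments, 1000))
```

A real reassembler would also have to deal with wraparound, SYN and FIN consuming sequence numbers, and capture files that start mid-connection.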

Extracting Images

To extract images from a network capture we need to separate out the packets into flows; reassemble each TCP flow's segments into a coherent byte stream, taking into account loss and reordering; search the byte stream for image headers and try to extract them.

This is a non-trivial amount of work for a Tuesday night.
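The last step, searching for image headers, is the easy part on its own. A naive sketch for JPEGs, assuming the stream is already reassembled and the body is neither compressed nor chunk-encoded: look for the JPEG start marker (FF D8 FF) and the next end marker (FF D9). `carve_jpegs` is a made-up name, and this will cut an image short if it embeds a thumbnail with its own end marker.

```python
JPEG_START = b"\xff\xd8\xff"  # start-of-image marker plus first marker byte
JPEG_END = b"\xff\xd9"        # end-of-image marker


def carve_jpegs(stream):
    """Return every byte range that looks like a JPEG in the stream."""
    images = []
    pos = 0
    while True:
        start = stream.find(JPEG_START, pos)
        if start == -1:
            break
        end = stream.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break  # image is truncated, give up
        end += len(JPEG_END)
        images.append(stream[start:end])
        pos = end
    return images


# Invented stream: two fake "images" surrounded by other bytes.
first = b"\xff\xd8\xff\xe0abc\xff\xd9"
second = b"\xff\xd8\xff\xe1xyz\xff\xd9"
stream = b"HTTP/1.1 200 OK\r\n\r\n" + first + b"padding" + second
print(len(carve_jpegs(stream)))
```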

Before writing any code I did some searching: scapy might have support for flow reconstruction (nope). I came across some references to a tool called tcpflow, which claims to be a tool for extracting TCP Flows from either a live capture interface or a pcap file.

That looked great to me: I would grab a pcap with tcpdump, process out the flows with tcpflow and then drop that into scapy and start looking for some images.

Reading the tcpflow man page I instead found a single option that would do all the work for me.

Images with tcpflow

It is really easy to extract images from a http TCP Flow using tcpflow. You can do this live, but I used a pcap file.

# tcpdump -w webimage.pcap host adventurist.me and port 80

I started up the dump, then visited http://adventurist.me/posts/0129 in Firefox's porn mode.

tcpflow will read in a pcap file with the -r flag. The -e flag enables a scanner, here the http one, which will apply magic to the flow and find you fun stuff.

$ tcpflow -r webimage.pcap -e http

tcpflow will spit out a file for each flow; to boot, it will throw in extracted data for everything it understands.

$ ls
093.095.228.091.00080-172.031.005.168.58914
093.095.228.091.00080-172.031.005.168.58914-HTTPBODY-001.jpg
093.095.228.091.00080-172.031.005.168.58914-HTTPBODY-002.ico
093.095.228.091.00080-172.031.005.168.60028
093.095.228.091.00080-172.031.005.168.60028-HTTPBODY-001.html
093.095.228.091.00080-172.031.005.168.60028-HTTPBODY-002.css
172.031.005.168.58914-093.095.228.091.00080
172.031.005.168.60028-093.095.228.091.00080
report.xml
webimage.pcap
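The file names carry the flow's addresses and ports, so they can be mapped back to a 4-tuple. The sketch below assumes the pattern seen in the listing above (zero-padded source address and port, a dash, then the destination, with an optional -HTTPBODY-NNN.ext suffix), and that the first address is the sender. `parse_flow_name` is a made-up helper, and other tcpflow setups may name files differently.

```python
import re

# Pattern inferred from the listing above, not from tcpflow documentation.
ADDR_PORT = r"(\d{3}\.\d{3}\.\d{3}\.\d{3})\.(\d{5})"
FLOW_NAME = re.compile(
    "^" + ADDR_PORT + "-" + ADDR_PORT + r"(?:-HTTPBODY-(\d+)\.(\w+))?$"
)


def unpad(addr):
    """Turn 093.095.228.091 into 93.95.228.91."""
    return ".".join(str(int(octet)) for octet in addr.split("."))


def parse_flow_name(name):
    """Return a dict describing a tcpflow output file, or None."""
    match = FLOW_NAME.match(name)
    if match is None:
        return None
    src, sport, dst, dport, index, ext = match.groups()
    return {
        "src": unpad(src),
        "src_port": int(sport),
        "dst": unpad(dst),
        "dst_port": int(dport),
        "body_index": int(index) if index else None,
        "extension": ext,
    }


print(parse_flow_name("093.095.228.091.00080-172.031.005.168.58914-HTTPBODY-001.jpg"))
```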

tcpflow also seems to be spitting out a report.xml, which seems to describe what it has just done. I imagine that is super useful when running tcpflow against a live capture. I haven't managed to get very far using scapy to pull images out of flows, and I am starting to wonder if there is really any point when all these tools are available.

Spooky Art-Net Pumpkins

Last night was All Hallows' Eve and I wanted to do something cool with the decorations. I repurposed an RGB NeoPixel board driven by a NodeMCU board and gave one of our pumpkins a network controlled candle instead of the old analog kind.

I also spent some time building out a motion sensor, but I wasn't able to integrate that with the network code in time to use it. In the end the weather seems to have kept everyone at home and we didn't have any visitors.

I am going to try and get everything together tonight at the hackerspace. If I do, I will write up what all the parts are.


Reading: Abaddon's Gate

Getting Images Out of Wireshark

While researching extracting images with scapy I found a page describing image extraction with Wireshark, I am not sure why I didn't think to try this first. Of course Wireshark can do this super useful network task, their mission is to make the ultimate network diagnostic tool.

The information on that page seems to be a little out of date, on my Wireshark build the PDU tracing and http follow options were already selected.

Grab a dump of a http session, then feed it into Wireshark:

# tcpdump -w webimage.pcap host adventurist.me and port 80

I visited this page, which I know has an image on it, in Firefox's porn mode.

http.response.code==200

In Wireshark I used a http 200 response code to find all of the assets in the stream. This left only three items: the page itself, the css style sheet and the image. Expand out the TCP block in Wireshark, right click on the JPEG block and choose 'Export Packet Bytes'. I saved this as .bin, renamed it to .jpeg and was able to open the image.
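Rather than guessing the extension for an exported .bin, the first few bytes can be checked against well known magic numbers. A small sketch covering JPEG, PNG and GIF only; `guess_extension` is a made-up helper and the table is deliberately short.

```python
# Well known file signatures; extend as needed.
MAGIC = [
    (b"\xff\xd8\xff", ".jpeg"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
]


def guess_extension(data):
    """Return a file extension based on the leading bytes, or None."""
    for magic, extension in MAGIC:
        if data.startswith(magic):
            return extension
    return None


# Invented sample: the first bytes of a JFIF-style JPEG header.
print(guess_extension(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
```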


Reading: Abaddon's Gate

Getting Certs Out of Wireshark

Packet capture tools are oscilloscopes to network programmers; I couldn't get anything done without near continual use of tcpdump and wireshark. In a pinch tcpdump can be used instead of writing server code.

Wireshark has support for a load of protocols and can really help with debugging. Recently I added DTLS support to NEAT. DTLS is a protocol enhancement to TLS to support datagram traffic; when it is working all of the traffic is encrypted, so it is basically random noise.

I had trouble getting server certs to work correctly with DTLS. Thankfully Wireshark can reassemble the datagrams into a coherent certificate and export the data out to a file. I can use this to manually check the cert is being sent correctly.

The process is something like this:

1. Import pcap
2. Find the full reassembled server hello
3. Expand the DTLS body
4. Expand the DTLS Record, Certificate (Reassembled)
5. Right click on 'Handshake Protocol: Certificate(Reassembled)' 
6. Select Export Packet Bytes

After that I had a TLS cert in DER format; DER is just the raw cert bytes. With this I could then verify using openssl that the cert chain was valid.
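Many tools prefer PEM, which is just the DER bytes base64 encoded between BEGIN and END lines. A short sketch of the conversion using Python's standard ssl module; `der_to_pem` is a made-up wrapper. The leading-byte check relies on a certificate being an ASN.1 SEQUENCE (tag 0x30), so it is a cheap way to notice that the export included extra framing bytes.

```python
import ssl


def der_to_pem(der_bytes):
    """Wrap DER certificate bytes as PEM text, after a cheap sanity check."""
    if not der_bytes.startswith(b"\x30"):
        raise ValueError("does not start with a DER SEQUENCE tag")
    return ssl.DER_cert_to_PEM_cert(der_bytes)


# Invented stand-in for exported bytes: a tiny DER SEQUENCE, not a real cert.
fake_der = b"\x30\x03\x02\x01\x01"
print(der_to_pem(fake_der))
```

This only changes the encoding; it does not validate the certificate or the chain, which is still a job for openssl.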


Reading: Abaddon's Gate

Coffee routine

Went to a friend's and carved some pumpkins last night, which means I didn't manage to do anything interesting yesterday. Weekends are when I make coffee; Sunday is filtering day, which looks something like this:

I have to run out to meet someone for lunch, tonight I am going to have a play with Scapy. I think I will try to pull an image out of a http stream, that seems like a small enough task to be doable.


It is Sunday, so that makes seven days of writing.

Reading: Abaddon's Gate