How I designed a Tor network

In this post, I discuss how I designed a proof-of-concept Tor network for my distributed computing course.

Introduction

Privacy is arguably the most crucial issue of the modern day. With privacy, I do not have to worry about bad actors watching my every step. I can have peace of mind knowing that when I am browsing the internet, video calling with family and friends, or just looking at pictures of cats, my data is kept safe. With this idea of privacy in mind, I set out to implement and experiment with onion routing. The idea behind onion routing is that, when someone browses the World Wide Web, no single observer can discern who originally sent each web request. As a result, tracking users through their web requests becomes difficult, and a degree of anonymity is achieved. This blog post discusses how I implemented my proof-of-concept Tor network and my experiences during the development process.

Design

Preliminary

Before discussing each component of my Tor network, I want to lay out the main design goals I tried to fulfill when building it.

The first design goal is portability. Portability is crucial because it keeps my design understandable and straightforward. I decided not to use a binary encoding because that would require further considerations concerning machine endianness. Instead, I settled on an ASCII-based protocol in which all packets are encoded as JSON (JavaScript Object Notation), and any binary data that JSON cannot represent is encoded using Base64.
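As a minimal sketch of this encoding idea (not the project's actual code), the snippet below serializes a packet as ASCII JSON and carries raw bytes as a Base64 string. The function names `encode_packet` and `decode_content` are hypothetical, and the `content` field name simply mirrors the REQUEST packet shown later in this post.

```python
import base64
import json


def encode_packet(packet_type: str, raw: bytes) -> bytes:
    """Serialize a packet as ASCII JSON, carrying binary data as Base64 text."""
    packet = {
        "type": packet_type,
        # Raw bytes cannot appear in JSON, so they travel as a Base64 string.
        "content": base64.b64encode(raw).decode("ascii"),
    }
    # json.dumps escapes non-ASCII characters by default, so this is pure ASCII.
    return json.dumps(packet).encode("ascii")


def decode_content(wire: bytes) -> bytes:
    """Parse a packet produced by encode_packet and recover the raw bytes."""
    packet = json.loads(wire.decode("ascii"))
    return base64.b64decode(packet["content"])
```

Because every byte on the wire is ASCII text, the same packet parses identically on any machine, regardless of its endianness.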

The second design goal was to keep my design as close to the original HTTP/1.0 (Hypertext Transfer Protocol) as possible. Specifically, I wanted my protocol to follow the HTTP/1.0 flow: open a TCP connection, send a request, receive a response, and close the TCP connection. With this flow, I can integrate my Tor client and Tor nodes in a way that works seamlessly with any HTTP/1.0 endpoint.
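The four steps of that flow can be sketched with Python's standard `socket` module. This is an illustrative sketch only: the function name `one_shot_exchange`, the timeout, and the buffer size are assumptions, and the sketch relies on the peer closing the connection to mark the end of the response.

```python
import socket


def one_shot_exchange(host: str, port: int, request: bytes, timeout: float = 10.0) -> bytes:
    """Open a TCP connection, send one request, read the reply until EOF, then close."""
    # Step 1: open the TCP connection (closed automatically when the block exits).
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # Step 2: send the whole request.
        conn.sendall(request)
        # Step 3: read the response until the peer closes its side.
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # Empty read: the peer closed the connection.
                break
            chunks.append(chunk)
    # Step 4: the connection is closed at this point.
    return b"".join(chunks)
```

For example, `one_shot_exchange("127.0.0.1", 12345, b"GET / HTTP/1.0\r\n\r\n")` would perform one complete exchange against a server listening on that (hypothetical) address.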

With these goals established and understood, I can now go over all the components that make up my Tor network.

Figure 1. A diagram of the architecture of the Tor network, where N > 0.

Discovery

For my Tor network to function, I need a mechanism that connects Tor clients to Tor nodes, and this mechanism must be reachable at a well-known location. With this in mind, I designed and implemented the Tor discovery server. The discovery server has two primary functions: keeping track of all Tor nodes in the network and informing Tor clients of those nodes.

Whenever a Tor node wishes to join the network, it connects to a well-known Tor discovery server and sends a REGISTER packet. The Tor node includes its host, port, and RSA public key in this packet. Upon receiving the packet, the Tor discovery server stores its contents in a ledger alongside the current Unix epoch timestamp. This timestamp is used to invalidate a Tor node's registration if the node has not sent another REGISTER packet in over thirty seconds.

{
  "type": "REGISTER",
  "host": "127.0.0.1",
  "port": 12345,
  "publickey": "…"
}

Figure 2. An example REGISTER packet.

Whenever a Tor client wants to fetch a list of all the Tor nodes in the network, it connects to the well-known Tor discovery server and sends an empty LIST packet. The Tor discovery server responds with a LIST packet containing all registered Tor nodes along with their respective RSA public keys. During this process, any invalid Tor nodes (nodes that have not sent a REGISTER packet in over thirty seconds) are removed.

{
  "type": "LIST",
  "peers": []
}

Figure 3. An example LIST packet.
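The REGISTER and LIST handling described above could be sketched as a small in-memory ledger. This is a hedged illustration rather than the real server: the `Ledger` class, its method names, and the injectable `clock` are hypothetical, and the shape of each `peers` entry (host, port, publickey) is an assumption that mirrors the REGISTER fields.

```python
import time
from typing import Callable, Dict, List, Tuple

REGISTRATION_TTL_SECONDS = 30  # Registrations lapse after thirty seconds.


class Ledger:
    """In-memory record of registered nodes, keyed by (host, port)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock  # Injectable so expiry can be tested without sleeping.
        self._nodes: Dict[Tuple[str, int], Tuple[str, float]] = {}

    def register(self, packet: dict) -> None:
        """Store or refresh a node from a REGISTER packet, stamping the current time."""
        key = (packet["host"], packet["port"])
        self._nodes[key] = (packet["publickey"], self._clock())

    def list_packet(self) -> dict:
        """Drop expired registrations, then build the LIST reply."""
        now = self._clock()
        self._nodes = {
            key: (publickey, seen)
            for key, (publickey, seen) in self._nodes.items()
            if now - seen <= REGISTRATION_TTL_SECONDS
        }
        peers: List[dict] = [
            {"host": host, "port": port, "publickey": publickey}
            for (host, port), (publickey, _seen) in self._nodes.items()
        ]
        return {"type": "LIST", "peers": peers}
```

Pruning lazily inside `list_packet` matches the description that invalid nodes are removed while a LIST request is being served, so no background timer is needed.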

As mentioned, the REGISTER packet must contain an RSA public key. I settled on RSA public/private key encryption to fulfill my HTTP/1.0 flow design goal, because it allows me to minimize the number of requests made. With RSA encryption, there are no extra handshakes involved. Once the client acquires a node's RSA public key, it can encrypt its contents without asking the key owner for more information. Had I used a purely symmetric-key approach, I would have needed something like the Diffie-Hellman key exchange, which would have required multiple requests between the Tor nodes and clients before any data could be encrypted and sent, thus violating my design goal.

Nodes

To anonymize requests through my Tor network, I need Tor nodes. Like the discovery server, my Tor nodes have a straightforward job: decrypt a request, forward the request's content to the destination, encrypt the response, and send the response back.

Whenever a Tor node receives a request, it is given an encrypted REQUEST packet containing the destination host, destination port, content to forward, and an RSA public key to encrypt the response. The request the Tor node receives is fully encrypted with the public key it announced to the well-known discovery server. As a result, the Tor node can use its corresponding private key to decrypt the encrypted request packet.

{
  "type": "REQUEST",
  "host": "127.0.0.1",
  "port": 12345,
  "content": "...",
  "publickey": "..."
}

Figure 4. An example REQUEST packet.

After decrypting the request packet, the Tor node connects to the destination host and port through TCP. Once connected, it simply forwards the content from the request packet to the destination. When the Tor node receives a response from the destination, it encrypts the response with the RSA public key supplied in the request packet, sends it back, and closes the TCP connection. I decided to use this flow because it closely adheres to the HTTP/1.0 flow (thus satisfying my second design goal) and keeps the system simple, at the cost of meaningful error handling.
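The node's job can be sketched as a single function with the cryptography and networking injected as callables, so the control flow is visible on its own. Everything named here is hypothetical: `handle_request`, `decrypt`, `encrypt_for`, and `forward` are stand-ins, and the sketch assumes that encrypted packets arrive and leave as Base64 text and that `content` is forwarded verbatim.

```python
import base64
import json
from typing import Callable


def handle_request(
    wire_request: bytes,
    decrypt: Callable[[bytes], bytes],
    encrypt_for: Callable[[str, bytes], bytes],
    forward: Callable[[str, int, bytes], bytes],
) -> bytes:
    """Process one REQUEST: decrypt it, forward its content, encrypt the reply."""
    # Assumption: the encrypted packet travels as Base64 text.
    packet = json.loads(decrypt(base64.b64decode(wire_request)))
    if packet.get("type") != "REQUEST":
        raise ValueError("expected a REQUEST packet")
    # Relay the content to the destination named in the packet and wait for its reply.
    response = forward(packet["host"], packet["port"], packet["content"].encode("utf-8"))
    # Encrypt the reply with the public key that travelled inside the request.
    return base64.b64encode(encrypt_for(packet["publickey"], response))
```

In a real node, `decrypt` would use the node's private key and `forward` would perform the one-shot TCP exchange with the destination; any exception here simply aborts the request, which reflects the limited error handling noted above.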

Regarding message encryption, I used a hybrid encryption solution combining RSA and AES. To encrypt a message, I first generate a random AES key and initialization vector. Next, I use AES to encrypt the message. Finally, I use RSA to encrypt the AES key. When sending the encrypted message, I place the encrypted AES key and the initialization vector in the first 272 bytes of the message. To decrypt a message, I use an RSA private key to decrypt the first 256 bytes, which hold the encrypted AES key, and then use the decrypted AES key together with the initialization vector to decrypt the remainder of the message.

Figure 5. Encrypted `content` structure.
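The byte layout of an encrypted message can be sketched without performing any actual cryptography. The sketch below only packs and splits the three parts. The 256-byte and 272-byte figures come from the description above, while the 16-byte initialization vector size is an inference (272 minus 256) and the function names are hypothetical.

```python
from typing import Tuple

ENCRYPTED_KEY_BYTES = 256  # RSA-encrypted AES key, stored first.
HEADER_BYTES = 272         # Encrypted AES key plus initialization vector.
IV_BYTES = HEADER_BYTES - ENCRYPTED_KEY_BYTES  # 16 bytes (inferred, not stated).


def pack_envelope(encrypted_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Concatenate the encrypted AES key, the IV, and the AES ciphertext."""
    if len(encrypted_key) != ENCRYPTED_KEY_BYTES:
        raise ValueError("encrypted key must be 256 bytes")
    if len(iv) != IV_BYTES:
        raise ValueError("initialization vector must be 16 bytes")
    return encrypted_key + iv + ciphertext


def split_envelope(message: bytes) -> Tuple[bytes, bytes, bytes]:
    """Split a message into (encrypted AES key, IV, AES ciphertext)."""
    if len(message) < HEADER_BYTES:
        raise ValueError("message is shorter than the 272-byte header")
    return (
        message[:ENCRYPTED_KEY_BYTES],
        message[ENCRYPTED_KEY_BYTES:HEADER_BYTES],
        message[HEADER_BYTES:],
    )
```

A receiver would RSA-decrypt the first element, then use the resulting AES key and the second element to decrypt the third.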

Lastly, note that each request packet contains a public key. This public key belongs to the Tor client that initiated the request chain. Whenever a Tor client wishes to make a request through the Tor network, it randomly generates a key pair and attaches the public key to the request packet.

Clients

Now we arrive at the most crucial component of my Tor network, the client. Unlike the previously mentioned components, the client is quite complicated and can be considered the coordinator, combining all components to perform a single request.

To build our understanding of how the client works, let's go through an example request through the Tor network from start to finish. Suppose we have one well-known discovery server and three nodes named A, B, and C that are registered with it, and our client wishes to make an HTTP/1.0 request to the website google.com.

First, my client will generate and send an empty LIST packet to the discovery server. The discovery server will reply with a LIST packet containing all three nodes: A, B, and C. Each node's information will include its host, port, and public key.

The client is now responsible for constructing the path that its request will be routed through. For our example, let's say that the request will go through node A, then to node B, then to node C, and finally to google.com. To make this happen, my client will generate three REQUEST packets: one delivered to node A and forwarded to node B, the second delivered to node B and forwarded to node C, and the third delivered to node C, which will then forward the HTTP/1.0 request to google.com. Each REQUEST packet will have a random public RSA key attached to it.

Figure 6. Example "onion" request layers.

Before sending anything, the client must encrypt the packets. To perform the onion routing part of my Tor network, I need to encrypt all packets in an onion-layered fashion. For our example, since we will be forwarding through node A, then B, and finally C, we need to encrypt starting with node C's REQUEST packet, then node B's, and finally node A's. So, we first encrypt C's packet with C's public RSA key and place the encrypted packet into the content field of B's packet. Then we encrypt B's packet with B's public RSA key and place it into the content field of A's packet. Finally, we encrypt A's packet with A's public RSA key and send that encrypted packet directly from our client to node A. All encrypted packets are encoded using Base64.
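The layering order can be sketched as a loop that walks the path backwards, so the last node's packet is encrypted first. This is an illustration under assumptions: `build_onion` and its parameters are hypothetical, `encrypt_for` stands in for the real hybrid RSA/AES encryption, and each node entry is assumed to be a dictionary with `host`, `port`, and `publickey`.

```python
import base64
import json
from typing import Callable, List


def build_onion(
    path: List[dict],
    destination_host: str,
    destination_port: int,
    payload: str,
    reply_keys: List[str],
    encrypt_for: Callable[[str, bytes], bytes],
) -> str:
    """Wrap payload in one REQUEST layer per node, innermost (last node) first."""
    # Each node forwards to the next node on the path; the last one forwards
    # to the final destination.
    next_hops = [(node["host"], node["port"]) for node in path[1:]]
    next_hops.append((destination_host, destination_port))

    content = payload
    layers = list(zip(path, next_hops, reply_keys))
    for node, (host, port), reply_key in reversed(layers):
        packet = {
            "type": "REQUEST",
            "host": host,
            "port": port,
            "content": content,
            "publickey": reply_key,  # Client-generated key for the response.
        }
        encrypted = encrypt_for(node["publickey"], json.dumps(packet).encode("ascii"))
        # Encrypted packets are carried as Base64 text.
        content = base64.b64encode(encrypted).decode("ascii")
    return content  # The outermost layer, readable only by the first node.
```

With the path A, B, C, the returned string is A's encrypted packet; only after A decrypts it does B's layer become visible, and so on down to the plain HTTP/1.0 request.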

Finally, I send the onion-encrypted packet to the first node, A. The request will propagate through the Tor network, and I will eventually receive a response. The response will also be an onion-encrypted packet because, on the return journey, each Tor node re-encrypts the response using the randomly generated public RSA key attached to its request packet. So, once I receive a response, I need to decrypt it. For each layer, I use the private key that corresponds to the randomly generated public key I attached to that node's request packet, and I decrypt the layers one by one.
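Since node C encrypts the response first and node A encrypts it last, the outermost layer the client sees is the one produced with the key attached to A's packet. A minimal sketch of the unwrapping order follows. The names `peel_response` and `decrypt_with` are hypothetical, `decrypt_with` stands in for the real hybrid decryption, and the sketch assumes every layer is Base64-encoded.

```python
import base64
from typing import Callable, List


def peel_response(
    wire: bytes,
    private_keys: List[object],
    decrypt_with: Callable[[object, bytes], bytes],
) -> bytes:
    """Remove the response layers in path order: the first node's layer comes off first."""
    data = wire
    # private_keys is ordered like the path (A, B, C), because the first node
    # on the path applied the outermost layer on the way back.
    for private_key in private_keys:
        data = decrypt_with(private_key, base64.b64decode(data))
    return data
```

Note that this order is the reverse of the order in which the request layers were encrypted on the way out.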

Now, I have the plaintext response from google.com. I have completed the request piped through my Tor network.

By following this process, the client component acts as the coordinator, orchestrating the encryption, routing, and decryption of packets to achieve anonymity and privacy in the Tor network.

Conclusion

To conclude, I hope you found this post interesting. However, I want to emphasize that I do not endorse anyone adopting my design as a fully functional solution. It should be understood as a proof-of-concept that does not take into account the important design considerations that the official Tor project addresses. The cryptography portion of the project in particular was merely exploratory.