Disclaimer: These notes are taken for the CS577 Data Privacy course offered by Dr. Erman Ayday in the 2021/2022 Fall Semester at Bilkent University.
Anonymous communication systems aim to achieve unlinkability between sent messages and their true recipients (recipient anonymity) and between the received messages and their true sender (sender anonymity). Anonymous communication systems also aim to provide relationship anonymity which means that an adversary observing the senders and recipients in the network is unable to discern who is communicating with whom. Designs for anonymous communication can often be classified into two general categories as follows.
– High-latency systems (message-based systems) are able to provide strong anonymity but are typically only applicable for noninteractive applications that can tolerate delays of several hours or more.
– Low-latency systems (connection-based systems) provide better performance and are intended for real-time applications.
High-latency Anonymity Systems
Mixes and Mix Networks
The basic idea of re-mailing can be applied in practice with mixes. A mix is a process that takes encrypted messages as input, groups several such messages into a batch and encrypts and forwards some or all messages in the batch. To make a more formal definition, a mix generates a pair of public and private keys and lets the public component be known by the network users. Clients encrypt their messages and the recipient address of these messages with the public key and sent them to the mix. The mix collects messages into a batch until it has received enough, and then forwards each of some or all to the destination address extracted from the decrypted input message with its private key.
If the sender knows the public key of the recipient, s/he can first encrypt the actual message with the recipient's public key, and then this ciphertext and the recipient address can be encrypted again with the public key of the mix. Then, an observer could not correlate the inputs to outputs as the messages are delayed, batched, and reordered. However, using a single mix could reveal the internal properties of the mix itself, i.e. the observer can determine the batch size, delay, or reordering by using a “tagging” attack. Also, if the mix itself is a passive internal adversary as explained in Table 1, it may choose to reveal the information to other parties. Therefore, senders may choose an ordered sequence of mixes. There are two general path selection strategies typically considered: free routes and mix cascades. With a free route topology, clients are able to choose any ordered sequence of mixes in the network for their message’s path. In a mix cascade topology, presented in Figure 1, there are one or more predefined routes through which all client traffic is relayed.
Figure 1: A cascade mix network
Flushing Algorithms
The algorithm that a mix uses to determine which messages will be forwarded to the next destination and when they will be forwarded is known as a flushing algorithm or a batching strategy.
Threshold Mixes: A threshold mix collects incoming encrypted messages until it has received n messages, where n is a parameter defined by the system or the mix operator. After receiving n messages, the mix then decrypts all of them and forwards them to their next destination in random order. Weakness → vulnerable to an active attack called the n − 1 attack, or flooding attack where the attacker sends his or her own “dummy messages” into the mix until it flushes
Timed Mixes: A timed mix collects incoming encrypted messages for a fixed length of time t, where t is defined by the system or the mix operator. After t seconds have elapsed, the mix then decrypts all of them and forwards them to their next destination in random order. Weakness → vulnerable to an active attack called the trickle attack in which the attacker lets a particular message of interest M enter the mix but prevents all other messages from entering the mix.
Threshold and/or Timed Mixes: This algorithm is a combination of times and threshold flushing algorithms. The mix collects messages until either it has received n messages or until t seconds have elapsed since the last time the mix was flushed, whichever occurs first. Weakness → vulnerable to combinations of flooding and trickle attacks
Pool Mixes: Pool mixes select a random subset of the collected messages to flush and then retain the rest in the mix for the next round. Weakness → Pool mixes do not eliminate the threat of (n1) and trickle attacks, collectively known as blending attacks, but they do increase the effort required on the part of the adversary.
Stop&Go Mixes: Instead of batching messages together, SG-mixes individually delay messages as they pass through the mix. SG-mixes calculate a time window [tmin, tmax]i and choose a random delay ti from an exponential distribution, for each mix i = 1...n. The time windows are computed such that (tmin + ti) ∈ [tmin, tmax]i+1 and also (tmax + ti) ∈ [tmin, tmax]i+1. The time window and random delay for mix i are included in the message encrypted with mix i’s public key. Weakness → If the current time tnow is outside the given time window, the mix discards the message.
Binomial Mixes: At the end of a mix round lasting t seconds, a binomial mix flips a biased coin with a forwarding probability pf = P (n) for each message currently contained in the mix, where n is the total number of messages in the mix and P (n) is the coin’s bias when the mix contains n messages. If the result of the coin flip is heads, then the message is forwarded to its next hop. Otherwise, the message is retained in the mix for the next round. Weakness → This algorithm does not eliminate the possibility of active blending attacks. However, it can increase the amount of time required for a successful attack, as well as require the adversary to spend more resources sending dummy messages into the mix.
Red-Green-Black (RGB) Mixes: These mixers try to detect when an attacker is conducting the attack, and minimize its negative impact on the anonymity of legitimate messages. The genuine client messages received during a mix round are considered black messages. Since messages are encrypted, it is impossible for a mix to tell what fraction of the received messages are black messages and which are messages generated by an adversary conducting a flooding attack. In order to estimate the number of legitimate messages received during a round, mixes send out “heartbeat” traffic, called red messages, with each output batch. Red traffic is indistinguishable from black traffic to an outside observer, and is “anonymously” addressed back to the mix itself. Since the red traffic is addressed to itself, the mix can monitor the fraction of red messages it receives compared to black traffic. If it detects a statistically significant drop in its received red traffic, the mix can deduce that it is currently under attack. When the mix detects an attack, it generates additional dummy messages, called green traffic, and sends them out with each batch to increase the anonymity of the black messages. RGB mixes mitigate (n − 1) attacks.
There have been a number of mix networks that are publicly available, such as anon.penet.fi, Cypherpunk Remailers, Mixmaster and Mixminion.
Low-latency Anonymity Systems
Flushing algorithms can introduce delays in the order of minutes to several hours or days. While these large delays are tolerable for some applications, real-time and interactive activations require low latency. Low-latency anonymity systems are often based on proxies. Aproxysimply forwards all incoming traffic immediately without any packet reordering. There are companies, such as anonymizer.com, that provides commercial proxy service. While this approach is simple and efficient, it requires clients to trust the company to not monitor or log their traffic.
In onion routing, there is a set of onion router servers. As in mix networks, each onion router holds a public-private key pair. Using asymmetric cryptography for every object requested would place a large computational cost on both initiators and servers in the network. To minimize the computational costs imposed, senders in an onion routing network only use public-key cryptography to establish the encrypted circuit and then use faster symmetric key cryptography to transfer the actual data. The Onion Router (Tor) is the most recent representative of onion routing. A small set of trusted authoritative directory servers are responsible for aggregating and distributing signed information about known routers in the network. The initiator of a circuit iteratively negotiates ephemeral session keys with each router in the circuit’s path using DH key negotiation and one-way RSA authentication. Once a key is established with the first hop, the initiator can go on and try to establish a key with the second hop via tunneling, and so on. This approach is known as the telescopic approach. When a circuit is not used anymore, the keys are discarded. It is useful as it prevents an adversary to recover the session key and decrypt traffic encrypted under that key, giving Tor clients perfect forward secrecy. Also, now the onions routers do not need to store the hashes of previously processed onions.
Figure 2: Onion routing in Tor
Systems for Unobservability
To achieve unobservability, an attacker monitoring users of a system must be unable to distinguish actual messages from random noise. The naive approach requires a system of n users with approximately synchronized clocks. Each user generates a public-private key pair. The public component is distributed to all other users via a channel. The parameter t indicates the length of a time interval. When t is finished, all users broadcast a fixed-length message. If user A wants to send a message to user B, A encrypts the message using B’s public key and then broadcasts it to all users. All users decrypt the ciphertext but only B could make sense of the message with its private key. If a user does not wish to send anything, it broadcasts random bytes of the same length. In this approach, users cannot communicate more than l/t bytes per second. As t gets smaller, the system becomes a constant broadcast channel which is impractical.
Dining Cryptographer Networks & Herbivore
A group of cryptographers is seated around a table having dinner at a restaurant. At the
conclusion of their meal, they are informed by the waiter that their bill has already been paid for anonymously. They speculate that their bill was either paid by one of the cryptographers seated at the table, or by the United States National Security Agency (NSA). The cryptographers respect each other’s right to have anonymously paid for the meal, but they are still curious if NSA paid for it instead.
DC-Network is a protocol that enables a cryptographer to let the other know if s/he pays the bill while remaining anonymous. Let all cryptographers be seated in a circle, just like dining philosophers. Each cryptographer secretly flips a coin and records a 0 if the result of the flip is tails and 1 if it is heads. Each then privately shares the result of his or her own flip with the cryptographer on his left. The cryptographer then XORs these two values and announces the result. If a cryptographer is paid for the dinner, s/he then keeps reporting z⊕1 instead of z. At the end of the full round, the cryptographers could jointly computeZ=z1⊕z2⊕...⊕zn which will be 0 if none of the cryptographers paid for the meal and 1 if someone paid. An example of this protocol for three cryptographers is given in Figure 3. Although DC-Networks provide theoretical anonymity, they are inefficient due to bandwidth overhead. It is also vulnerable to a message collision.
Figure 3: Dining cryptographer network for three
A novel contribution of the Herbivore system design is an anonymous slot-reservation protocol for collision avoidance in a DC-net. Communication in Herbivore takes place over a series of rounds, during which nodes reserve bandwidth on the channel and then transmit fixed blocks of data.
Traffic Analysis Attacks
Website Fingerprinting: If the adversary has a list of guesses as to which site the target user is visiting, the adversary can simply fetch those sites as well. He or she observes the quantity and length of data packets received in response. From this metadata, the adversary can build a fingerprint of what the Web site’s response traffic looks like when fetched via an encrypted connection. Solution → split messages into fixed-length cells. Messages shorter than the length of a cell are typically padded with random data.
Timing Attacks: A passive global adversary who is able to observe connections entering and exiting the anonymity network may be able to link inputs and outputs based on their patterns of packet inter-arrival times in low-latency systems.
Predecessor Attacks: A number of attackers join a crowd and attempt to identify the initiator of a particular persistent connection. When the paths are periodically reformed in a crowd, the persistent connection will have to be re-established. Over time, the true initiator of the connection is far more likely to be an adversarial node’s immediate predecessor in a path than any other node. The attack assumes the connection in question can be identified after each path reformation. Solution → a fully connected DC-net offers the best resilience against the predecessor attack.
Disclosure Attacks: A user, Alice, repeatedly sends messages to one of m different communication partners in each mix round. A passive adversary observes the messages entering and exiting the mix and wants to identify with whom Alice is corresponding. The adversary first waits until he has observed m mutually disjoint set of recipients. Each of these sets contains precisely one of Alice’s m communication partners. The adversary then refines these sets by waiting to observe a new recipient set that intersects with only one previously observed. The previous set is then reduced to the intersection set. The adversary can continue this attack until he has reduced all sets to only a single element, thus identifying each of Alice’s m communication partners. Solution → The basic disclosure attack is computationally difficult to perform due to the need to find mutually disjoint sets. Unfortunately, dummy traffic padding schemes effective against intersection attacks often induce significant bandwidth overhead.
[1] M. Edman and B. Yener, “On anonymity in an electronic society: A survey of anonymous communication systems,” ACM Computing Surveys (CSUR), vol. 42, no. 1, pp. 1–35, 2009.
Comments