Abstract: Since the outbreak of peer-to-peer (P2P) networking with Napster during the late ’90s, P2P applications have multiplied, become sophisticated and emerged as a significant fraction of Internet traffic. At first, P2P traffic was easily recognizable since P2P protocols used specific application TCP or UDP port numbers. However, current P2P applications have the ability to use arbitrary ports to “camouflage” their existence. Thus, only a portion of P2P traffic is clearly identifiable. As a result, estimates and statistics regarding P2P traffic are unreliable. In this paper we present a characterization of P2P traffic in the Internet. We develop several heuristics that allow us to recognize P2P traffic at nonstandard ports. We find that depending on the protocol and metric used, approximately 30%-70% of traffic related to P2P applications cannot be identified using well-known ports. In addition we present several characteristics for various P2P networks, such as eDonkey2000, Fasttrack, Gnutella, BitTorrent, Napster and Direct Connect, as seen in traffic samples from two Tier-1 commercial backbones in 2002 and 2003.
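The port-based baseline that this abstract argues is insufficient can be sketched as follows. This is a minimal illustration, assuming a hypothetical table of commonly cited default ports; the port numbers and function names are assumptions, not taken from the paper. Flows on nonstandard ports simply fall through as unidentified, which is the gap the paper's heuristics target.

```python
from typing import Optional

# ASSUMPTION: commonly cited default ports, not the paper's own port list.
DEFAULT_P2P_PORTS = {
    "Gnutella": {6346, 6347},
    "FastTrack": {1214},
    "eDonkey2000": {4661, 4662, 4665, 4672},
    "BitTorrent": set(range(6881, 6890)),
    "Direct Connect": {411, 412},
}


def classify_by_port(src_port: int, dst_port: int) -> Optional[str]:
    """Return the protocol whose default port either endpoint uses, else None."""
    for protocol, ports in DEFAULT_P2P_PORTS.items():
        if src_port in ports or dst_port in ports:
            return protocol
    return None


def missed_byte_fraction(flows: list[tuple[int, int, int]]) -> float:
    """Given flows already known to be P2P as (src_port, dst_port, n_bytes),
    return the fraction of bytes that port-based classification misses."""
    total = sum(n for _, _, n in flows)
    missed = sum(n for s, d, n in flows if classify_by_port(s, d) is None)
    return missed / total if total else 0.0


# Usage: one flow on the DC hub port, one on arbitrary ports.
print(missed_byte_fraction([(50000, 411, 300), (40000, 50001, 700)]))  # 0.7
```

Ground truth for "known to be P2P" would have to come from elsewhere (for example payload inspection on a sample), which is exactly what makes such estimates hard in practice.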
Abstract: The use of peer-to-peer (P2P) applications is growing dramatically, particularly for sharing large video/audio files and software. In this paper, we analyze P2P traffic by measuring flow-level information collected at multiple border routers across a large ISP network, and report our investigation of three popular P2P systems: FastTrack, Gnutella, and Direct Connect. We characterize the P2P traffic observed at a single ISP and its impact on the underlying network. We observe a very skewed distribution of the traffic across the network at different levels of spatial aggregation (IP, prefix, AS). All three P2P systems exhibit significant dynamics at short time scales, particularly at the IP address level. Still, the fraction of P2P traffic contributed by each prefix is more stable than the corresponding distribution of either Web traffic or overall traffic. The high volume and good stability properties of P2P traffic suggest that the P2P workload is a good candidate for being managed via application-specific layer-3 traffic engineering in an ISP’s network.
Comment: References DC as an example of a P2P system
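The skew measurement described above can be illustrated by aggregating per-flow byte counts at different prefix lengths and asking what share the heaviest entries carry. This is a minimal sketch under assumptions: the (address, bytes) flow format, the function names and the sample numbers are hypothetical, and the AS level is omitted because it would need a prefix-to-AS mapping that the abstract does not describe.

```python
import ipaddress
from collections import Counter


def aggregate_bytes(flows, prefix_len):
    """Sum bytes per network. flows: iterable of (ipv4_address_str, n_bytes)."""
    totals = Counter()
    for ip, n_bytes in flows:
        # strict=False masks the host bits, e.g. 10.0.0.7/24 -> 10.0.0.0/24
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        totals[net] += n_bytes
    return totals


def top_k_share(totals, k):
    """Fraction of all bytes contributed by the k heaviest keys."""
    ordered = sorted(totals.values(), reverse=True)
    grand_total = sum(ordered)
    return sum(ordered[:k]) / grand_total if grand_total else 0.0


flows = [("10.0.0.1", 900), ("10.0.0.2", 50), ("10.0.1.1", 30), ("10.0.2.1", 20)]
for plen in (32, 24):
    print(plen, top_k_share(aggregate_bytes(flows, plen), 1))
```

Repeating the same computation per time interval would give a rough view of the stability the authors compare across P2P, Web and overall traffic.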
Abstract: Designing user-friendly software interfaces enables more people to use the software. We believe that many freeware applications, such as file-sharing programs, have an interface that is difficult to understand for a first-time user. Thus, many potential users are not able to use the software. In this paper, Direct Connect ++ (DC++), a file-sharing application, is examined in terms of learnability, i.e. how easy it is to understand for a first-time user. By observing real users and performing a Cognitive Walkthrough, a widely used web design evaluation tool, the authors identified a number of fundamental design flaws in DC++. These include poor feedback, lack of information and the need for prior knowledge. A number of suggested improvements to remove some of these design flaws are presented in this paper. We believe that by improving the user interface of DC++, many more users would be able to use the program.
Comment: Review of DC++'s user interface elements and user-friendliness
Abstract: Recent measurement studies report that a significant portion of Internet traffic is unknown. It is very likely that the majority of this unidentified traffic originates from peer-to-peer (P2P) applications. However, traditional techniques for identifying P2P traffic seem to fail, since these applications usually disguise their existence by using arbitrary ports. Beyond the identification of actual P2P traffic, little is known about the characteristics of this type of traffic. The purpose of this paper is twofold. First, we propose a novel method to identify P2P traffic within aggregated traffic. Our method does not rely on packet payload, so we avoid the legal, privacy-related, financial and technical obstacles that payload inspection raises. Instead, our method is based on a set of heuristics derived from robust properties of P2P traffic. We demonstrate our method with current traffic data obtained from one of the largest Internet providers in Hungary, and we show the high accuracy of the proposed algorithm by means of a validation study. Second, we report several results of a comprehensive traffic analysis study. We show the daily behavior of P2P users compared to non-P2P users. We also present an important finding: the ratio of P2P users to the total number of users is almost constant. Flow sizes and holding times are also analyzed, and the results of a heavy-tail analysis are described. Finally, we discuss the popularity distribution properties of P2P applications. Our results show that the unique properties of P2P application traffic seem to fade away during aggregation and characteristics of the traffic.
Comment: Generic P2P traffic review, but references DC as a primary data flow system.
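A payload-free heuristic of the kind this abstract describes can be sketched with one rule often cited in the P2P identification literature: host pairs that exchange both TCP and UDP flows are likely P2P peers. Whether this paper uses exactly this rule is an assumption; the flow tuple layout, the excluded-port list and the function name are hypothetical.

```python
from collections import defaultdict

# ASSUMPTION: ports of services that can produce both TCP and UDP traffic
# between the same hosts for non-P2P reasons (DNS, NetBIOS); a real
# deployment would tune this list.
EXCLUDED_PORTS = {53, 137, 138, 139}


def tcp_udp_pairs(flows):
    """flows: iterable of (src_ip, dst_ip, proto, src_port, dst_port),
    with proto being "tcp" or "udp". Return the unordered host pairs that
    exchanged both TCP and UDP flows on non-excluded ports."""
    seen = defaultdict(set)
    for src, dst, proto, sport, dport in flows:
        if sport in EXCLUDED_PORTS or dport in EXCLUDED_PORTS:
            continue
        pair = frozenset((src, dst))  # direction-independent key
        seen[pair].add(proto)
    return {pair for pair, protos in seen.items() if {"tcp", "udp"} <= protos}
```

Only flow headers (addresses, ports, protocol) are inspected, which is what lets such heuristics sidestep the payload-related obstacles the abstract lists.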
Abstract: The use of peer-to-peer networks in massive distributed denial of service attacks has been well known since the beginning of 2007, when this kind of attack was often observed against many public servers. This article discusses in great depth the anatomy of a DDoS attack generated using the DC++ network and shows some measures that could be used to defend against it, including a tool to detect the attacker hubs. The ideas presented in this article are based on practical experience during a confrontation with this type of attack, which is still used with maximum effectiveness against public servers.
Comment: Distributed Denial of Service (DDoS) evaluation
Abstract: The use of peer-to-peer networks in massive distributed denial of service attacks has been well known since the beginning of 2007, when this kind of attack was often observed against many public servers. At the time of writing (July 2008), DC++-generated DDoS attacks were being reported less frequently. But the big danger still remains, because a great number of the DC++ hubs around the world are owned by people whose ethics are questionable and who could generate such an attack at any time. This article discusses in great depth the anatomy of a DC++ based DDoS attack and shows some measures that could be used to defend against it, including a tool to detect the attacker hubs. The ideas presented in this article are based on practical experience during a confrontation with this type of attack.
Comment: Same as the other "DC++ and DDoS attacks" entry, but a second version
Abstract: Online social networks and peer-to-peer file sharing networks create a digital mirror of human society, providing insights into social dynamics such as interaction between entities, structural patterns and the flow of information. In the past, such studies were inherently limited by the vast amount of information involved. Today these phenomena can be studied at large scale using computers to process data from this digital mirror. Findings from such networks have shown interesting structural properties shared by both types of systems. In particular, they often turn out to be scale-free and small-world networks. By letting ideas and findings from studied peer-to-peer networks guide the design of novel architectures, improvements in user integrity, usability and performance have been observed. This thesis presents a study of the Direct Connect peer-to-peer file sharing network. We model abstract tools and methods for measuring the network architecture; moreover, custom software tools for gathering and analyzing data from Direct Connect networks are developed, presented and discussed. We look at network topology and properties, statistics on user activities and geographic distribution, characterization/statistics of shared data and correlations between users and their shared data. We verify the scale-free property, the small-world network model, strong data redundancy with clusters of common interest in the set of shared content, a high degree of asymmetry of connections and more. Finally, we discuss the implications of our findings and compare them with results from similar research.
Comment: Mathematical model of DC from a "spying perspective".
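The small-world and scale-free checks mentioned in this abstract usually rest on two measurements: the degree distribution and the clustering coefficient. A minimal sketch on an undirected adjacency map follows. What the nodes and edges represent (for example users connected by transfers) is an assumption, since the abstract does not specify the graph model, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def degree_histogram(adj):
    """Map degree -> number of nodes; a heavy tail hints at a scale-free network."""
    return Counter(len(nbrs) for nbrs in adj.values())


def average_clustering(adj):
    """Mean local clustering coefficient of an undirected graph given as
    {node: set_of_neighbours}. Nodes with degree < 2 count as 0."""
    coeffs = []
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        # Count edges among this node's neighbours.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs) if coeffs else 0.0


# A triangle a-b-c with a pendant node d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(degree_histogram(adj), average_clustering(adj))
```

A small-world claim would typically compare this clustering value and the average shortest-path length against a random graph of the same size and density.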
Abstract: This paper investigates the Direct Connect (DC) file sharing network, which, to the best of our knowledge, has never been academically studied before. We developed a participating agent in order to gather protocol-specific information. We quantify network characteristics such as the distribution of users in hubs, hub geography, query distribution and trends in shared folder size. We also characterize the typical DC user: a heavy downloader with a particularly large shared folder. Most importantly, we discovered a query duplication problem that drains much of the hubs' CPU and bandwidth resources. In the DC network, query facilitation is the most demanding task for hubs and the main factor in the protocol’s scalability challenges. We show that in some hubs, up to a third of the query traffic is duplicated and therefore wasteful. Resolving this problem would dramatically improve hub performance by reducing the number of relayed queries, thus permitting larger hub communities.
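Measuring the duplication the authors report could look roughly like the sketch below, which counts a relayed search as a duplicate when the same user repeats the same search string within a short window. This definition of "duplicate", the 30-second window in the usage line and the function name are assumptions for illustration; the abstract does not give the paper's exact criterion.

```python
def duplicated_fraction(queries, window):
    """queries: iterable of (timestamp_s, nick, search_string) in time order.
    Return the fraction of queries that repeat the same (nick, search_string)
    within `window` seconds of its previous occurrence."""
    last_seen = {}
    duplicates = 0
    total = 0
    for ts, nick, text in queries:
        key = (nick, text)
        prev = last_seen.get(key)
        if prev is not None and ts - prev <= window:
            duplicates += 1
        last_seen[key] = ts
        total += 1
    return duplicates / total if total else 0.0


log = [
    (0, "u1", "ubuntu iso"),
    (5, "u1", "ubuntu iso"),
    (6, "u2", "ubuntu iso"),
    (100, "u1", "ubuntu iso"),
]
print(duplicated_fraction(log, window=30))  # 0.25
```

A hub could apply the same bookkeeping to drop repeats before relaying them, which is the kind of saving the abstract points to.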
Abstract: Peer-to-peer file sharing communities form dynamic vaults of information and ideas that single users are not capable of aggregating on their own. Users willingly join and willingly share their resources on peer-to-peer networks. These types of networks are very prone to free riding, unless share limit restrictions are imposed. The free riding problem on peer-to-peer networks and corresponding incentive mechanisms have been addressed in the literature in detailed mathematical models, business models, design objectives and case studies for various types of peer-to-peer networks. Direct Connect networks are very popular among relatively small file sharing communities of up to a few thousand users, due to their simplicity and ease of use. Direct Connect does not natively support any elaborate contribution incentive mechanisms. In this paper, we present a practical method to control free riding in Direct Connect networks by means of a virtual currency and provide feedback on the utility and effectiveness of our implementation.
Comment: Potential implementation of DC "currency"
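A virtual-currency scheme of this kind can be sketched as a ledger in which uploading earns credit and downloading spends it. The class below is a hypothetical minimal illustration: the one-credit-per-MiB rate, the starting allowance and all names are assumptions, not the mechanism implemented in the paper.

```python
class CreditLedger:
    """Hypothetical ledger: 1 credit per MiB, earned by uploading, spent by downloading."""

    def __init__(self, initial_credits: int = 1024):
        self.initial_credits = initial_credits  # ASSUMPTION: allowance for new users
        self.balances: dict[str, int] = {}

    def balance(self, user: str) -> int:
        # New users start with the initial allowance.
        return self.balances.setdefault(user, self.initial_credits)

    def record_transfer(self, uploader: str, downloader: str, mib: int) -> bool:
        """Move `mib` credits from downloader to uploader; refuse if unaffordable."""
        if uploader == downloader or mib <= 0:
            return False
        if self.balance(downloader) < mib:
            return False  # free rider: must upload before downloading more
        self.balances[downloader] -= mib
        self.balances[uploader] = self.balance(uploader) + mib
        return True
```

In a hub, a check like `record_transfer` would be consulted before allowing a download; how the paper's implementation hooks into the DC protocol is not described in the abstract.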
Abstract: Many file sharing systems exist today. The abundance of content on the web is in itself a huge problem to comprehend. The existence and wide-scale use of peer-to-peer file sharing systems, which let users share files, folders, directories or even entire drives, make finding relevant information a daunting task. With the increasing number of file sharing systems, there are challenges in providing users with useful recommendations about interesting products and services that suit their tastes and behaviors. In this paper we propose a simple, extensible peer-to-peer (P2P) based content recommendation architecture.
Comment: Review of a content recommendation system where DC(++) is used for testing.
Abstract: This bachelor thesis deals with spam detection and the detection of data flows between clients of P2P exchange networks. It introduces the reader to the problem and discusses the possibilities for connection tracking with the conntrack module on GNU/Linux systems. It describes the implementation of the prototype analyzer program, which is written in C++ and will in the future be used by a mid-sized ISP.
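Connection tracking with the conntrack module exposes one entry per flow, which an analyzer can parse into fields before looking for P2P data flows. The sketch below assumes the textual format printed by the `conntrack -L` tool, where each entry lists the original-direction tuple followed by the reply tuple. The sample line and function name are illustrative, and the thesis's own analyzer is written in C++, not Python.

```python
def parse_conntrack_entry(line: str) -> dict:
    """Parse one `conntrack -L` style line into a dict of fields.
    Keys such as src/dst/sport/dport appear twice (original and reply
    direction); only the first, original-direction value is kept."""
    fields = line.split()
    entry = {"proto": fields[0]}
    for token in fields:
        key, sep, value = token.partition("=")
        if sep:
            entry.setdefault(key, value)
    return entry


sample = ("tcp      6 431999 ESTABLISHED src=10.0.0.1 dst=10.0.0.2 sport=40000 "
          "dport=411 src=10.0.0.2 dst=10.0.0.1 sport=411 dport=40000 [ASSURED] mark=0 use=1")
entry = parse_conntrack_entry(sample)
print(entry["src"], entry["dport"])  # 10.0.0.1 411
```

Entries read from `/proc/net/nf_conntrack` carry extra leading columns (address family and number), so the `proto` field would need adjusting for that source.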
Abstract: In a distributed peer-to-peer (P2P) system such as Direct Connect, files are often distributed over multiple source peers. It is up to the downloading peer to decide from how many and from which source peers to download a particular file. Biased Random Period Switching (BRPS) is an algorithm, run at the downloading peer, that determines when to download from which source peer. The number of source peers that a downloading peer downloads from at a given moment is called the Degree of Parallelism (DoP). This research focused on implementing BRPS in an existing Direct Connect client and comparing its download performance against an unmodified client. Two implementations of BRPS in Direct Connect have been made. The first is a simple implementation that follows the original BRPS algorithm as closely as possible, with minor modifications required to ensure that the download process does not get stuck on an unavailable source peer. The second is an improved implementation with slight modifications to the original BRPS algorithm: it incorporates two improvements to ensure that the DoP does not drop below its desired value in the face of unavailable source peers. The original client and the two BRPS implementations were evaluated in a controlled Direct Connect network with 50 downloading peers and a variable number of source peers. The source peers were configured to throttle their available bandwidth to an average of 500 KB/s, following a realistic bandwidth distribution based on measurements from the Tor P2P network. In each experiment, all downloading peers downloaded the same file at the same time, and measurements were taken at these downloading peers. Four experiments were performed, each varying one parameter.
In the first experiment, the size of the downloaded file was varied between 100 MB and 1024 MB; the second experiment varied the DoP between 1 and 15. The third experiment varied the number of source peers between 10 and 100, and the last experiment added between 0% and 80% unavailable source peers to the network. In all experiments, both BRPS implementations performed close to the optimal average download time and were consistently faster than the original client, by a factor of 2 to 5. In the last experiment, the improved BRPS implementation did keep the measured DoP closer to its desired value than the simple implementation, but this did not result in a significant difference in the measured download times.
Comment: A bunch of experiments showing that the current peer selection algorithm in ncdc isn't very good, and how to do it better.
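The core idea of switching among source peers while holding a target DoP can be sketched as below. This is a simplified, hypothetical illustration, not the BRPS algorithm itself: source choice here is uniform, so the bias that gives BRPS its name is not modeled, and the function name and arguments are assumptions. It does mirror the two behaviors the abstract highlights: unavailable sources are dropped so they cannot stall the download, and the active set is refilled up to the desired DoP.

```python
import random


def next_sources(active, candidates, available, dop, rng):
    """Return the source peers to use in the next period.

    active:     source ids used in the period that just ended
    candidates: all source ids known to have the file
    available:  set of source ids currently reachable
    dop:        desired degree of parallelism
    rng:        random.Random instance (injectable for reproducibility)
    """
    # Drop unreachable sources so a dead peer cannot stall the download.
    kept = [s for s in active if s in available]
    # Trim if the active set exceeds the desired DoP.
    while len(kept) > dop:
        kept.remove(rng.choice(kept))
    # Periodic switch: release one random source at the end of each period.
    if kept and len(kept) == dop:
        kept.remove(rng.choice(kept))
    # Refill up to the DoP from reachable idle sources (uniform choice;
    # the released source may be picked again).
    idle = [s for s in candidates if s in available and s not in kept]
    rng.shuffle(idle)
    while len(kept) < dop and idle:
        kept.append(idle.pop())
    return kept


rng = random.Random(42)
print(next_sources(["a", "b", "c"], ["a", "b", "c", "d", "e"], {"a", "c", "d", "e"}, 3, rng))
```

Calling this once per period, and feeding it the peers that currently respond, keeps the active set at the desired DoP whenever enough sources are reachable, which is the property the improved implementation in the thesis aims for.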
Abstract: Among those cryptographic hash functions that are not based on block ciphers, MD4 and Snefru initially seemed quite attractive for applications requiring fast software hashing. However, collisions for Snefru were found in 1990, and recently a collision of MD4 was also found. This casts doubt on how long these functions' variants, such as RIPE-MD, MD5, SHA, SHA1 and Snefru-8, will remain unbroken. Furthermore, all these functions were designed for 32-bit processors and cannot be implemented efficiently on the new generation of 64-bit processors such as the DEC Alpha. We therefore present a new hash function which we believe to be secure; it is designed to run quickly on 64-bit processors without being too slow on existing machines.
Abstract: The Direct Connect peer-to-peer network has been steadily increasing in popularity over the last 3 years. There are more than 20 different implementations of hubs and 30 flavours of clients. This work is a comprehensive end-user study of the dynamics of hub behaviour, users’ communication and data sharing patterns, the loopholes in the system and the protection mechanisms employed in such systems.