Distribution of biological databases over lowbandwidth networks

Databases are integral part of bioinformatics and need to be accessed most frequently, thus downloading and updating them on a regular basis is very critical. The establishment of bioinformatics research facility is a challenge for developing countries as they suffer from inherent low-bandwidth and unreliable internet connections. Therefore, the identification of techniques supporting download and automatic synchronization of large biological database at low bandwidth is of utmost importance. In current study, two protocols (FTP and Bit Torrent) were evaluated and the utility of a BitTorren based peer-to-peer (btP2P) file distribution model for automatic synchronization and distribution of large dataset at our facility in Pakistan have been discussed.


Background:
During last couple of years, scientific community among developing countries including Pakistan has developed interest in bioinformatics research [1,3]. Most of the bioinformatics applications are database dependant and to access, search and retrieve data, reliable internet connections are required. To facilitate maximum utilization, bioinformatics resources and facilities around the globe prefer to download these databases on their local servers. There has been an exponential growth in database records as a consequence of major advances in genomics and proteomics technologies, stressing need of frequent updates with the latest releases. Many developing countries face a major problem in regular update of databases due to lack of infrastructure, slow/unreliable internet connectivity and low bandwidth. It is expected that in future, databases size would outgrow existing rate of transfer at current bandwidth, thus it is imperative to develop efficient tools for obtaining automatic updates on a regular basis. To address such issues, a Bio-Mirror project was also launched which uses FTP mode for data transfer [4].
Updates are usually managed by client server approach (FTP or WWW) or P2P (Peer-to-Peer) file sharing applications. FTP has been a traditional method for file sharing and downloading from remote server and is very popular for downloading large files. However, it requires large network bandwidths and suffers from scalability bottleneck. As an alternative, P2P applications have become immensely popular for fast and efficient distribution of files in recent years. P2P architecture operates in a distributed autonomous system mode that does not rely on a specific server system. Torrent protocol working environment is based on peer-to-peer (P2P) technique in which every user is connected to each other with mesh technology. On the other hand, FTP protocol working environment is completely dependent on a single server which means it may create single point of failure. The performance of traditional FTP file sharing applications deteriorates rapidly as the number of clients increase while in P2P module, more peers means better performance. There are many P2P file sharing applications such as Kazza, Gnutella, Napster, BitTorrent to name a few. Among these applications, BitTorrent P2P file sharing system has been analyzed in many studies [5,6].
Considering the existing scenario and future difficulties, techniques supporting automatic synchronization of databases at low bandwidth are of utmost importance. In current study, efficiency of FTP and BitTorrent applications are compared in order to download large sized database (Gigabyte) and using them through local servers without delay and response time out message. With the help of btP2P protocol, the problems of updating the enormous data of biological databases and at the same time, avoiding the network connection issues have been addressed.

Methodology: Computational Resources
Two Ultra20M2 of Sun Microsystems based nodes with dual core processor were utilized for current study. These servers were selected as they are stable, reliable and provided maximum uptime [7].

Selection of Application for database downloads
Multiple applications for FTP and torrent protocols are available. Filezilla [9] and Bitcomet [10] were selected as representatives of FTP and BitTorrent procedures, respectively. These programs are among the best clients, having the ability to download data at the same interval of time. Filezilla is a single server based solution that does not support torrent file, becomes slower with increasing number of users and lacks resume facility after internet link failure. Bit Torrent on the other hand is a peer based solution that uses mesh technology and supports resume facility as well as both torrent and FTP files.

Performance Evaluation
Most of the bioinformatics databases are usually uploaded on FTP server. Downloading of database was performed using both applications and was monitored for the span of fifteen hours. In order to make sure that the load on the network should be same for both of the methods during the test period; the whole procedure was carried out on different machines with similar specifications, and same network. Both the btP2P and FTP performances were evaluated.

Discussion:
Databases are usually downloaded using client-server architecture like FTP. If server becomes overloaded, response time might increase. P2P file sharing protocols have gained popularity as an alternative procedure to FTP. In current communication, both the applications Filezilla and Bitcomet were compared as representatives of FTP and Bit Torrent (P2P) protocols. Our results indicate that BitTorrent protocol is more efficient in downloading large data (GB) in less time period Table 1 (see supplementary material). In first hour, downloading speed of Bitcomet was 87 KB with 234 Mb while downloading speed using Filezilla was 21 KB with 66 MB. In successive hours, torrent downloading speed kept on improving than that of FTP and by the end of fifteenth hour, torrent downloading speed was at least four times higher than the FTP downloading speed.
The speed comparison of Torrent and FTP protocols with respect to time is shown in (Figure 1A). The results show the slow speed of FTP as compared to the torrent speed. Figure 1B represents the downloaded data comparison between the two protocols in specified time limit. This further demonstrates that the torrent is more reliable as compare to FTP protocol. In recent years, a significant part of internet bandwidth is being used by P2P traffic. BitTorrent is a popular P2P application that aims to avoid bottleneck of FTP servers while delivering large and popular files [11]. An earlier communication has clearly shown the better performance of btP2P protocol than traditional FTP for automatically synchronizing large amounts of biological databases across the three countries of Asia-Pacific region [12]. However, they have compared FTP and P2P file sharing applications using Azureus as a Bit Torrent representative. For current study, Bitcomet was selected, which is a client written in C++. Bitcomet can run in windows environment and offers a preview download mode so that users can preview download content although the file has not been completely finished. It allows users to create their own torrents and can be used for HTTP/FTP download, a format usually used for most of the bioinformatics database download. The results obtained from our study demonstrate that BtP2P techniques can be applied to scale database servers and can outperform client-server based applications. With two available nodes, it is concurred that the performance using btP2P is better than that of FTP. The results of our study showed significant improvement in download performance using btP2P than conventional File Transfer Protocol (FTP). Our study has exhibited the reliability of btP2P in the transmission of continuously growing multi-gigabyte biological databases without failure. Furthermore, the download performance for btP2P can be further intensified by including more nodes from various parts within the country. This study suggests that the btP2P technology is highly appropriate for file sharing applications as this is effective, viable and self scalable.

Conclusion:
Based on above mentioned observations, it can be concluded that the Torrent protocol is almost four times faster than FTP protocol. Hence torrent protocol is recommended as a better tool for updating and synchronization of the biological data sets using low bandwidth. Results obtained from this study support the findings of Sangket et al. [12] who compared the downloading performance between FTP and btP2P on different subnets among developing countries. Most of the databases use FTP protocol and as Bitcomet client supports both FTP and torrent, it may offer a better choice. The download performance for btP2P can be improved further by including more nodes from other institutes and Research and Development (R&D) organizations. It is suggested that btP2P technology may be an appropriate application for file sharing, automatic synchronization and distribution of biological databases and software over low-bandwidth networks.