Software

Extended Local Similarity Analysis

Finding Time-Dependent Associations in Time Series Datasets

INTRODUCTION

In recent years, advances in molecular technologies have enabled researchers to characterize natural microbial communities in space and time without lab cultivation (Fuhrman, 2009). Mining and analyzing co-occurrence patterns in these new datasets is fundamental to revealing symbiotic relationships and microbe-environment interactions (Chaffron et al., 2010; Steele et al., 2011). Time series data, in particular, are receiving increasing attention, because both undirected and directed associations can be inferred from them.

Researchers typically use techniques such as principal component analysis (PCA), multidimensional scaling (MDS), discriminant function analysis (DFA) and canonical correlation analysis (CCA) to analyze microbial community data under various conditions. Unlike these methods, the Extended Local Similarity Analysis (ELSA) technique can capture time-dependent (possibly time-shifted) associations between microbes and between microbes and environmental factors (Ruan et al., 2006). Significant ELSA associations can be interpreted as a partially directed association network for further network-based analysis.

Studies adopting the ELSA technique have produced interesting and novel discoveries about microbial communities (Paver et al., 2010; Shade et al., 2010; Beman et al., 2011; Steele et al., 2011). However, the scale of current datasets has outgrown the old script. To improve computational efficiency, incorporate new features such as support for time series data with replicates, and make the analysis technique more accessible to users, we have re-implemented the ELSA algorithm as a C++ extension to Python. We have also integrated the new ELSA tool set with the popular Galaxy framework (Goecks et al., 2010) for web-based pipeline analysis.

IMPLEMENTATION

Figure 1. The analysis workflow of the Extended Local Similarity Analysis (ELSA) tools. Users start with raw data (matrices of time series) as input and specify their requirements as parameters. The ELSA tools then F-transform and normalize the raw data and calculate the Local Similarity (LS) scores and Pearson's correlation coefficients. Next, the tools assess the statistical significance (P-values) of these correlation statistics using a permutation test and filter out insignificant results. Finally, the tools construct a partially directed association network from the significant associations.
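
The normalization, scoring and significance steps of this workflow can be sketched in Python as follows. This is a minimal illustration only, not the eLSA package's API: the function names (normal_score_transform, local_similarity, permutation_pvalue) are hypothetical, the rank-based normal score transform and the banded dynamic programming recurrence are assumed from the local similarity literature (Ruan et al., 2006), and the F-transform of replicates and the Pearson's correlation step are omitted. The "+1" correction in the P-value is a choice made for this sketch.

```python
import numpy as np
from statistics import NormalDist


def normal_score_transform(series):
    """Rank-based normal score transform: value -> Phi^{-1}(rank / (n + 1)).

    Ties are broken by order of appearance (an assumption of this sketch).
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable") + 1
    nd = NormalDist()
    return np.array([nd.inv_cdf(r / (n + 1)) for r in ranks])


def local_similarity(x, y, max_delay):
    """Return (score, shift) for two equal-length, already normalized series.

    score is the signed best local alignment sum divided by n.
    shift = (position in y) - (position in x); it is positive when the
    shared pattern appears later in y than in x.
    """
    n = len(x)
    pos = np.zeros((n + 1, n + 1))  # best positive-association sum ending at (i, j)
    neg = np.zeros((n + 1, n + 1))  # best negative-association sum ending at (i, j)
    best, sign, shift = 0.0, 1, 0
    for i in range(1, n + 1):
        # Only cells within the allowed delay band |i - j| <= max_delay.
        for j in range(max(1, i - max_delay), min(n, i + max_delay) + 1):
            prod = x[i - 1] * y[j - 1]
            pos[i, j] = max(0.0, pos[i - 1, j - 1] + prod)
            neg[i, j] = max(0.0, neg[i - 1, j - 1] - prod)
            if pos[i, j] > best:
                best, sign, shift = pos[i, j], 1, j - i
            if neg[i, j] > best:
                best, sign, shift = neg[i, j], -1, j - i
    return sign * best / n, shift


def permutation_pvalue(x, y, max_delay, n_perm=200, seed=0):
    """Estimate a P-value by shuffling x and recomputing the LS score."""
    rng = np.random.default_rng(seed)
    observed, _ = local_similarity(x, y, max_delay)
    hits = 0
    for _ in range(n_perm):
        score, _ = local_similarity(rng.permutation(x), y, max_delay)
        if abs(score) >= abs(observed):
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    base = np.random.default_rng(1).normal(size=32)
    x = normal_score_transform(base[2:])   # x[t] = base[t + 2]
    y = normal_score_transform(base[:30])  # y carries the same signal two steps later
    score, shift = local_similarity(x, y, max_delay=3)
    print(score, shift, permutation_pvalue(x, y, max_delay=3))
```

Pairs whose P-value falls below a chosen threshold would become edges of the association network, and a nonzero shift gives an edge its direction. The sketch stores the full dynamic programming tables for readability; only the previous row is needed, so a production implementation could use far less memory.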

AVAILABILITY

  • Download the released standalone source code package here and install it. See the README.txt file within the package (also viewable here) for detailed installation instructions and other information.
  • Source code is available at: bitbucket.org/charade/elsa
  • The Python package is open source, so advanced users can build the analysis into pipelines or implement other variants.

WIKI

eLSA’s Wiki pages have manuals, FAQs and other information that you MUST read before actually using the eLSA tool. They are openly editable. You are more than welcome to contribute to this ongoing documentation.

NOTES

  1. A historical R version is available through Prof. Fengzhu Sun’s page but is no longer supported.
  2. In case the integrated q-value does not work for you, there are many other independent false discovery rate calculation packages, such as locfdr, mixfdr, fuzzyFDR, pi0, fdrci and nFDR.
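
If none of these packages is at hand, one simple fallback is the Benjamini-Hochberg adjustment, sketched below in Python. Note that this is a different and generally more conservative procedure than the q-value method, and it is not taken from any of the packages listed above; the function name benjamini_hochberg is illustrative only.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # Scale the k-th smallest P-value by m / k.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest P-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out
```

Associations whose adjusted value is at or below the chosen false discovery rate (for example 0.05) would be kept.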

CONTACTS

Questions and comments should be addressed to lixia@stanford.edu

CITATIONS

Please cite references 1 and 2 if you use the eLSA Python package in your study. Please cite reference 3 if you use the old R script.

  1. Li C. Xia, Dongmei Ai, Jacob Cram, Jed A. Fuhrman and Fengzhu Sun. Efficient Statistical Significance Approximation for Local Association Analysis of High-Throughput Time Series Data. Bioinformatics 2013, 29(2):230-237.
  2. Li C. Xia, Joshua A. Steele, Jacob A. Cram, Zoe G. Cardon, Sheri L. Simmons, Joseph J. Vallino, Jed A. Fuhrman and Fengzhu Sun. Extended local similarity analysis (eLSA) of microbial community and other time series data with replicates. BMC Systems Biology 2011, 5(Suppl 2):S15.
  3. Quansong Ruan, Debojyoti Dutta, Michael S. Schwalbach, Joshua A. Steele, Jed A. Fuhrman and Fengzhu Sun. Local similarity analysis reveals unique associations among marine bacterioplankton species and environmental factors. Bioinformatics 2006, 22(20):2532-2538.