SyBIT - the Bioinformatics Project from SystemsX.ch
The Swiss Systems Biology Initiative SystemsX.ch has established SyBIT as the central Bioinformatics and IT support service for all its funded research projects. SyBIT supports researchers in the following ways:
- Provide the necessary local and central infrastructure to the SystemsX.ch projects to store and analyze their data.
- Make the necessary bioinformatics software and tools available for data analysis, data management and data mining.
- Engineer and refactor existing scientific code, making it reusable and maintainable for the community.
- Support the projects with collaboration services: websites, mailing lists, wikis, code repositories and other tools like macros and software packages.
Motivation for SyBIT
Coordinated Resource Usage
SystemsX.ch provides significant funding for Systems Biology research in Switzerland. As part of this funding, new instruments are acquired or built, which produce very large amounts of raw observational digital data. Most participating institutions and research groups also need some funding to prepare for managing the expected data volumes. Since all SystemsX.ch projects are collaborations between Swiss universities, it does not make sense for each project and institution to deal with this task in isolation and with individual funds. SyBIT has been created to help coordinate the efforts to acquire, manage, analyze and ultimately publish systems biology data and the corresponding scientific analysis software, and to invest the necessary effort and funds in a sustainable manner.
Digital Challenges in Life Sciences
Senior researchers, postdocs and PhD students have built as much infrastructure and software locally as their research requires, but sustainable maintenance, documentation, support – and therefore reuse – of their tools and workflows is not their focus. Too often tools are re-invented because the person in charge either does not know that a suitable tool already exists, or the existing tools are no longer maintained and documented well enough for others to understand and trust.
While developing new algorithms and new analysis methods is a research activity, integrating the algorithms into well-maintained libraries, building robust analysis workflows and maintaining and documenting them is not a genuine research task.
This was not a problem in the past; in fact, writing certain algorithms from scratch was seen as part of a scientist's education. But the complexity of the tools has grown to a level where it has become too time-consuming and error-prone for researchers to create new implementations of old algorithms.
Even more problems arise from the continuing growth and diversity of life science research data. Data management tasks such as data lifecycle management, annotation, formatting, organization and archiving are not done consciously; data management simply ‘happens’, depending on the organizational talent of the individual researcher. Finding data from several years ago or remembering which analysis results belong to which raw dataset can be anything from easy to impossible. In any case, individual data management is not sustainable: when a person leaves the project, group or institution, chances are that nobody will be able to make use of his or her data. Nor does it scale to the data volumes that modern research deals with today.
So researchers face two fundamental challenges in today’s computer-enabled life science:
- Dealing with the ever-increasing complexity of research computation.
- Dealing with the rapid growth of research data.
SystemsX.ch has suddenly brought these issues into focus, not only in terms of data volume and complexity but also in terms of data sharing and collaboration between institutions.