Introduction to Bioinformatics



Bioinformatics is an interdisciplinary field that mainly draws on molecular biology, genetics, computer science, mathematics, and statistics. It is currently defined as the study of information content and information flow in biological systems and processes. It has evolved to serve as the bridge between observations (data) in diverse biologically related disciplines, the derivation of understanding (information) about how the systems or processes function, and the subsequent application of that understanding (knowledge). The terms “bioinformatics” and “computational biology” are only about 20 years old; their first appearance in the Medline database may have been as keywords for the 1990 article describing the first steps of the National Center for Biotechnology Information (Benson et al., 1990).

Rise of Bioinformatics

Studying and analysing an organism is an extremely complex task. Genes, proteins, cells, tissues, bacterial communities and their interactions are some of the many subjects studied in biology. Reproducing and analysing all of them in the wet lab is close to impossible. However, today we have a range of high-throughput analysis tools that allow us to acquire large amounts of data in digital format, which can be stored on servers or hard drives. As a result, data-intensive, large-scale biological problems can be addressed from a computational point of view. The best-known example of big data in biology is the sequencing of whole genomes: a sequence can be read, stored and processed either immediately or later. Computers are now far more accessible, faster and more powerful than they were a few years ago, which explains the rise of bioinformatics in recent years. In addition, IT infrastructure as a whole has developed drastically, enabling us not just to store large amounts of data but also to process them. Applications of bioinformatics include molecular medicine, personalised medicine, gene therapy, drug development and microbial genome applications.

Areas and Applications of Bioinformatics

There is a wide range of areas in bioinformatics that one can specialise in. Some revolve around the genome of a specific organism. One example is genome assembly, where the whole sequence is built from smaller fragments, making further analysis of the organism possible. Such analysis could be genome annotation, the process of identifying and locating genes and determining their function. On the other hand, the focus does not have to be a specific organism; it can be a whole community of organisms. This is done by analysing multiple genomes together and is known as metagenomics. The same community-level approach can be applied at the transcriptomic and proteomic levels as well as the genomic level. The study of the interactions at all levels in a biological community, which gives an understanding of the function and behaviour of the system as a whole, is called systems biology.
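To make the idea of building a sequence from smaller fragments more concrete, here is a minimal sketch of a greedy overlap merge in Python. It is a toy illustration only, not how real assemblers work: the function names, the minimum overlap of 3 bases and the example fragments are all assumptions made up for this sketch, and it ignores sequencing errors, repeats and reverse strands.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of a that is a prefix of b.

    Overlaps shorter than min_len (an arbitrary choice here) count as 0.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_len:
                        best_len, best_i, best_j = size, i, j
        if best_len == 0:
            break  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical fragments, invented for this example.
reads = ["ATGGCGT", "GCGTACG", "TACGTTA"]
assembled = greedy_assemble(reads)
print(assembled)
```

Fragments that share no sufficiently long overlap are left as separate pieces, which mirrors the fact that an assembly can end up as several disconnected stretches rather than one complete sequence.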

Data Science

So what exactly does a bioinformatician work with? The answer is large text files that contain sequences, namely FASTA files, which he or she processes to find the information relevant to the current analysis. Storing such large files and making them accessible to the whole scientific community is handled by public databases. There, genome sequences, annotations and other information can be found, downloaded or used by tools for mapping and aligning sequences. Data tables are another file format a bioinformatician deals with on a daily basis, and this is where statistics, modelling, machine learning and pattern recognition are applied. In summary, a bioinformatician is in effect a data scientist who works with biological data.
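As an illustration of what processing a FASTA file can look like, the following is a minimal sketch of a FASTA reader in plain Python. It assumes the common layout in which each record starts with a header line beginning with ">" followed by one or more sequence lines. The function name `read_fasta` and the two example records are made up for this sketch; established libraries offer more robust parsers.

```python
from io import StringIO
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical records, invented for this example.
EXAMPLE_FASTA = """>seq1 hypothetical example record
ATGGCGTACG
TTAGC
>seq2 another made-up record
GGGCCCAT
"""

records = dict(read_fasta(StringIO(EXAMPLE_FASTA)))
for name, sequence in records.items():
    print(name, len(sequence))
```

Reading line by line with a generator means only one record is held in memory at a time, which matters when the files are large. For a file on disk, the same function can be given the result of `open(path)` instead of the in-memory example.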

[Figure: interaction network]

Computational Tools and Programming Languages

To perform such analyses, one needs to be familiar with certain programming languages and, of course, with Unix, the most common system in bioinformatics because of its simplicity and its ability to manipulate files with low memory usage. The choice of programming language depends on the objective. Python handles text processing and manipulation easily, and many tools are written in it. R, on the other hand, specialises in data tables, statistics and data visualisation. Other programming languages, such as Perl, Java and C++, are used to a lesser extent, again depending on the goal. One extremely important aspect of bioinformatics is that it is a community-driven science. Most of the tools are open source and available for everyone to use. Moreover, because this community is so active, a huge amount of help is available online through question-and-answer websites, with material for every level of experience.
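As a small taste of the kind of text manipulation Python is used for, here is a hedged sketch of two common sequence operations: the reverse complement of a DNA string and its GC content. The function names and the example sequence are assumptions for illustration, and the code only handles the four standard bases A, C, G and T; any other character is passed through unchanged.

```python
# Translation table mapping each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical sequence, invented for this example.
dna = "ATGCGC"
print(reverse_complement(dna))
print(f"{gc_content(dna):.2f}")
```

Both operations rely only on built-in string methods, which is part of why Python is a convenient first language for sequence work.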

Future – Career Prospects

Bioinformatics is considered one of the top fields to work in at the moment: as technology progresses, data are generated in great quantities and need to be analysed with computational tools. Furthermore, the number of bioinformatics courses on offer has increased greatly in recent years, not only because of the greater computational power available but also because the results are promising and consistent with experimental observations. Moreover, the future of the field looks bright, as tools and methods are in demand and are constantly updated to produce better and more accurate results. Lastly, since it is an interdisciplinary science, its wide range of knowledge (biology, data science, computer science) can be applied in many different fields and opens up a variety of career opportunities.

Closing Remarks

Bioinformatics has proven to possess great potential to identify diseases, determine treatment and help make human lives better. With the inspiration and knowledge of computer science, fields such as gene technology, medicine and healthcare can evolve from curing individual patients to healing entire populations.

