A biocurator is a professional scientist who curates, collects, annotates, and validates information that is disseminated by biological and model organism databases. The role of a biocurator encompasses quality control of primary biological research data intended for publication, extracting and organizing data from original scientific literature, and describing the data with standard annotation protocols and vocabularies that enable powerful queries and biological database interoperability. Biocurators communicate with researchers to ensure the accuracy of curated information and to foster data exchange with research laboratories.
In genome annotation, for example, biocurators commonly use, and help create and develop, shared biomedical ontologies: structured, controlled vocabularies that span many biological and medical knowledge domains, such as the Open Biomedical Ontologies collected in the OBO Foundry. These domains include genomics and proteomics, anatomy, animal and plant development, biochemistry, metabolic pathways, taxonomic classification, and mutant phenotypes.
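To make the idea of a structured, controlled vocabulary concrete, the following minimal sketch parses a few term stanzas written in a simplified subset of the OBO flat-file format (only the `id`, `name`, and `is_a` tags) and follows `is_a` links upward. The `EX:` identifiers are hypothetical, not taken from a real ontology. The sketch illustrates one reason ontologies enable powerful queries: an annotation to a specific term also implies its more general ancestors.

```python
from collections import defaultdict

# Illustrative stanzas in a simplified OBO-like syntax; the EX: identifiers
# are hypothetical and do not belong to any real ontology.
OBO_TEXT = """
[Term]
id: EX:0000001
name: cell death

[Term]
id: EX:0000002
name: programmed cell death
is_a: EX:0000001 ! cell death

[Term]
id: EX:0000003
name: apoptotic process
is_a: EX:0000002 ! programmed cell death
"""


def parse_terms(text: str) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Return (names, parents) from [Term] stanzas, reading only id, name and is_a."""
    names: dict[str, str] = {}
    parents: dict[str, list[str]] = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = None  # a new stanza starts
        elif line.startswith("id:"):
            current = line[len("id:"):].strip()
        elif line.startswith("name:") and current:
            names[current] = line[len("name:"):].strip()
        elif line.startswith("is_a:") and current:
            # Drop the trailing "! comment" that repeats the parent's name.
            parent = line[len("is_a:"):].split("!")[0].strip()
            parents[current].append(parent)
    return names, dict(parents)


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """All terms reachable by following is_a links upward (transitive closure)."""
    seen: set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


names, parents = parse_terms(OBO_TEXT)
# A gene annotated to "apoptotic process" is also found by a query for
# any of its ancestors, such as "cell death".
print(sorted(names[t] for t in ancestors("EX:0000003", parents)))
```

Running it prints `['cell death', 'programmed cell death']`. Real OBO files carry many more tags and relationship types; this sketch ignores everything except `is_a`.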
Biocurators enforce the consistent use of gene nomenclature guidelines and participate in the genetic nomenclature committees of various model organisms, often in collaboration with the HUGO Gene Nomenclature Committee (HGNC). They also enforce other nomenclature guidelines, such as those issued by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB), one example of which is the Enzyme Commission (EC) number.
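An EC number encodes a four-level hierarchy (class, subclass, sub-subclass, serial number). For example, EC 1.1.1.1 is alcohol dehydrogenase. The following is a minimal validation sketch that assumes a simplified rule set: the class must be 1 to 7, each lower level is a positive integer or `-` for an unspecified level, and once a level is `-` every deeper level must be too. The function name `parse_ec` is hypothetical, and preliminary or other special-case numbers are deliberately not handled.

```python
import re
from typing import List, Optional

# Simplified pattern (an assumption, not the full IUBMB rule set): class 1-7,
# then three dot-separated levels, each a positive integer or "-".
EC_PATTERN = re.compile(r"^(?:EC\s*)?([1-7])\.([1-9]\d*|-)\.([1-9]\d*|-)\.([1-9]\d*|-)$")


def parse_ec(ec: str) -> Optional[List[str]]:
    """Split an EC number into its four levels, or return None if malformed."""
    m = EC_PATTERN.match(ec.strip())
    if not m:
        return None
    parts = list(m.groups())
    # Once a level is unspecified ("-"), every deeper level must be too.
    first_dash = parts.index("-") if "-" in parts else len(parts)
    if any(p != "-" for p in parts[first_dash:]):
        return None
    return parts


print(parse_ec("EC 1.1.1.1"))  # alcohol dehydrogenase, fully specified
print(parse_ec("3.4.-.-"))     # partially specified number
print(parse_ec("3.-.4.-"))     # inconsistent: rejected
```

The three calls print `['1', '1', '1', '1']`, `['3', '4', '-', '-']`, and `None`.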
The International Society for Biocuration (ISB) was founded in 2008; the non-profit organisation "promotes the field of biocuration and provides a forum for information exchange through meetings and workshops." International Biocuration Conferences have been held in Pacific Grove, CA (2005), San José, CA (2007), Berlin, Germany (2009), Tokyo, Japan (2010), Washington, D.C. (2012), Cambridge, UK (2013), Toronto, Canada (2014), Beijing, China (2015), Geneva, Switzerland (2016) and Stanford, CA (2017). The 11th International Biocuration Conference was held April 8–11, 2018, in Shanghai, China. The ISB presents two awards to members of the biocuration community: the Biocurator Career Award (given annually) and the ISB Award for Exceptional Contributions to Biocuration (given biannually).
There is some overlap between the work of biocurators and Wikipedia, with the boundaries between scientific databases and Wikipedia becoming increasingly blurred. Databases such as Rfam and the Protein Data Bank, for example, make heavy use of Wikipedia and its editors to curate information. However, most databases offer highly structured data that can be searched in complex combinations, which is usually not possible on Wikipedia, although Wikidata aims to solve this problem to some extent.
There has also been recent interest in using natural-language processing and text-mining technologies to extract candidate information more systematically for manual literature curation. To support this, the main literature curation stages of a 'canonical' biocuration workflow have been defined and examined. Various specialized systems have applied text mining to these stages, from the initial detection of curation-relevant articles (triage) to the extraction of annotations and entity relationships.
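Triage, the first of these stages, amounts to ranking articles by their likely relevance for curation. The following is a deliberately simple keyword-weighting sketch, not the method of any particular system; the cue words, weights, threshold, and paper IDs are all hypothetical. Many practical systems instead train classifiers on articles that curators have already labelled.

```python
import re
from collections import Counter

# Hypothetical curation-relevance cues and weights; a real triage system
# would typically learn these from labelled articles instead.
CUE_WEIGHTS = {
    "knockout": 2.0,
    "mutant": 2.0,
    "phenotype": 1.5,
    "expression": 1.0,
    "localization": 1.0,
}


def triage_score(abstract: str) -> float:
    """Sum cue-word weights over the abstract's lower-cased word tokens."""
    tokens = Counter(re.findall(r"[a-z]+", abstract.lower()))
    return sum(weight * tokens[cue] for cue, weight in CUE_WEIGHTS.items())


def rank_for_curation(abstracts: dict[str, str], threshold: float = 2.0) -> list[str]:
    """Return IDs of abstracts scoring at or above threshold, highest first."""
    scored = {pid: triage_score(text) for pid, text in abstracts.items()}
    kept = [pid for pid, score in scored.items() if score >= threshold]
    return sorted(kept, key=lambda pid: scored[pid], reverse=True)


papers = {
    "paper-A": "A knockout mutant shows a severe root phenotype.",
    "paper-B": "We review the history of plant breeding.",
    "paper-C": "Protein expression and subcellular localization were measured.",
}
print(rank_for_curation(papers))
```

Here `paper-A` scores 5.5, `paper-C` scores 2.0, and `paper-B` scores 0, so the call prints `['paper-A', 'paper-C']`. The later stages (entity and relationship extraction) need far richer linguistic processing than this bag-of-words scoring.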
Traditionally, biological knowledge has been aggregated through expert curation, performed manually by dedicated experts. However, with the growing volume of biological data and an increasingly diverse and information-dense published literature, expert curation has become more and more laborious and time-consuming, and increasingly lags behind knowledge creation.
Community curation, which harnesses the collective intelligence of the scientific community, holds great promise for coping with the flood of biological knowledge. To exploit the full potential of the scientific community for knowledge curation, a number of biological wikis (bio-wikis) have been built.
To encourage community curation in bio-wikis, AuthorReward, an extension to MediaWiki, was developed to reward community contributions to knowledge curation. AuthorReward provides bio-wikis with an authorship metric: it quantifies each researcher's contribution by factoring in both the quantity and the quality of their edits, and automatically assigns explicit authorship according to these quantified contributions.
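AuthorReward's actual scoring formula is not given here, so the following is only a hypothetical sketch of how an authorship metric can combine edit quantity and quality. It assumes that each edit has a size and a quality weight between 0 and 1, that an author's contribution is the quality-weighted sum of their edit sizes, and that authorship order follows each author's share of the total. The `Edit` type, its fields, and the example authors are illustrative inventions.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    author: str
    size: int       # edit quantity, e.g. characters added (hypothetical measure)
    quality: float  # hypothetical quality weight in [0, 1]


def contributions(edits: list[Edit]) -> dict[str, float]:
    """Sum quality-weighted edit size per author (illustrative formula only)."""
    totals: dict[str, float] = {}
    for e in edits:
        totals[e.author] = totals.get(e.author, 0.0) + e.size * e.quality
    return totals


def authorship(edits: list[Edit]) -> list[tuple[str, float]]:
    """Order authors by their share of the total weighted contribution."""
    totals = contributions(edits)
    grand_total = sum(totals.values()) or 1.0  # avoid division by zero
    shares = [(author, t / grand_total) for author, t in totals.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


edits = [
    Edit("alice", 1200, 0.9),
    Edit("bob", 3000, 0.2),
    Edit("alice", 300, 1.0),
]
for author, share in authorship(edits):
    print(f"{author}: {share:.2f}")
```

In this example bob adds more text than alice, but his low quality weight leaves him second (about 0.30 of the weighted total versus 0.70 for alice). The example shows why factoring in quality, not just volume, changes the resulting authorship order.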
RiceWiki, a wiki-based database for community curation of rice genes, is a live demonstration of AuthorReward; an example page is available at http://ricewiki.big.ac.cn/index.php/Os01g0883800. AuthorReward is freely available at http://cbb.big.ac.cn/software.
Another community-based approach to analyzing biological data is Systems Biology Verification (SBV) IMPROVER. Biological networks with a structured syntax are a powerful way of representing biological information generated from high-density data; however, they can become unwieldy to manage as their size and complexity increase. SBV IMPROVER presents a crowd-verification approach for the visualization and expansion of biological networks.
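SBV IMPROVER's actual verification procedure is not described here. As a hedged illustration of the general crowd-verification idea, the following sketch tallies reviewer votes on proposed network edges and applies a simple acceptance rule. The vote format, the gene names, and the thresholds (`min_votes=3`, `min_support=0.6`) are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical vote record: (source node, target node, relation, supports_edge).
Vote = tuple[str, str, str, bool]
Edge = tuple[str, str, str]


def verify_edges(
    votes: list[Vote], min_votes: int = 3, min_support: float = 0.6
) -> dict[Edge, str]:
    """Label each proposed edge accepted, rejected, or pending by simple vote share."""
    tallies: dict[Edge, list[int]] = defaultdict(lambda: [0, 0])  # [supporting, total]
    for src, dst, rel, supports in votes:
        tally = tallies[(src, dst, rel)]
        tally[0] += int(supports)
        tally[1] += 1
    status: dict[Edge, str] = {}
    for edge, (yes, total) in tallies.items():
        if total < min_votes:
            status[edge] = "pending"  # too few reviewers to decide
        elif yes / total >= min_support:
            status[edge] = "accepted"
        else:
            status[edge] = "rejected"
    return status


votes = [
    ("GENE_A", "GENE_B", "increases", True),
    ("GENE_A", "GENE_B", "increases", True),
    ("GENE_A", "GENE_B", "increases", True),
    ("GENE_A", "GENE_B", "increases", False),
    ("GENE_B", "GENE_C", "decreases", True),
    ("GENE_B", "GENE_C", "decreases", False),
    ("GENE_B", "GENE_C", "decreases", False),
    ("GENE_C", "GENE_D", "increases", True),
]
for edge, label in verify_edges(votes).items():
    print(edge, label)
```

With these votes the first edge is accepted (3 of 4 reviewers agree), the second is rejected (1 of 3), and the third stays pending because only one reviewer has assessed it.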