Mapping the Research Landscape of Agricultural Sciences

Keywords

publication activity
text mining
science mapping
research landscape
agricultural science
scientometrics
young researchers
Russian Science Index

How to Cite

Devyatkin D., Nechaeva E., Suvorov R., & Tikhomirov I. (2018). Mapping the Research Landscape of Agricultural Sciences. Foresight and STI Governance, 12(1), 57-66. https://doi.org/10.17323/2500-2597.2018.1.69.78

Abstract

A research landscape is a high-level description of the current state of a scientific field and its dynamics. High-quality research landscapes are important tools for more effective research management. This paper presents a novel framework for mapping research. It relies on full-text mining and topic modeling to pool data from many sources without depending on any specific taxonomy of scientific fields and areas. The framework is especially useful for scientific fields that are poorly represented in scientometric databases such as Scopus or Web of Science. The high-level algorithm consists of (1) full-text collection from reliable sources; (2) automatic extraction of research fields using topic modeling; (3) semi-automatic linking to scientometric databases; and (4) statistical analysis of metrics for the extracted scientific areas. Full-text mining is crucial because of (a) the poor representation of many Russian research areas in systems like Scopus or Web of Science; (b) the poor quality of Russian Science Index data; and (c) the differences between the taxonomies used in different data sources. Major advantages of the proposed framework include its data-driven approach, its independence from scientific subjects’ taxonomies, and its ability to integrate data from multiple heterogeneous sources. Furthermore, the framework complements traditional approaches to research mapping that rely on scientometric databases such as Scopus or Web of Science rather than replacing them. We evaluated the framework experimentally using agricultural science as an example, but it is not limited to any particular domain. As a result, we created the first research landscape covering young researchers in agricultural science. Topic modeling yielded six major scientific areas within the field of agriculture. We found statistically significant differences between these areas, which means that a differentiated approach to research management is critical. Future work includes applying the framework to other scientific fields and integrating other collections of research and technical documentation (especially patents).
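
Step (4) of the algorithm, the statistical comparison of metrics between extracted areas, can be illustrated with a small sketch. The abstract does not name the test used, so the choice of a two-sided Mann-Whitney U test with a normal approximation is an assumption here, and the function name `mann_whitney_u` and the per-paper citation counts for the two areas are hypothetical illustration data, not results from the paper.

```python
import math
from itertools import groupby


def mann_whitney_u(x, y):
    """Return (U statistic for sample x, two-sided p-value).

    The p-value uses the normal approximation with a tie correction,
    so it is only a rough guide for very small samples.
    """
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y],
                    key=lambda pair: pair[0])

    rank_sum_x = 0.0
    tie_term = 0.0
    position = 0  # number of pooled values already ranked
    for _, group in groupby(pooled, key=lambda pair: pair[0]):
        group = list(group)
        t = len(group)
        # Tied values share the average of ranks position+1 .. position+t.
        mid_rank = position + (t + 1) / 2
        rank_sum_x += mid_rank * sum(1 for _, label in group if label == 0)
        tie_term += t ** 3 - t
        position += t

    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all pooled values identical
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_value


# Hypothetical citation counts per paper in two extracted topic areas.
area_a = [0, 1, 1, 2, 3, 5, 8]
area_b = [4, 6, 7, 9, 12, 15, 20]

u, p = mann_whitney_u(area_a, area_b)
print(f"U = {u}, p = {p:.4f}")
```

A rank-based test is a natural fit for bibliometric indicators such as citation counts, which tend to be heavily skewed; for more than two areas, pairwise comparisons would additionally need a multiple-testing correction.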

https://doi.org/10.17323/2500-2597.2018.1.69.78
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.