The Evaluation of GenAI Capabilities to Implement Professional Tasks

Kuzminov Y., Kruchinskaia E. (2024) The Potential of Generative Artificial Intelligence for Solving Professional Tasks. Foresight, 18(4), 67-76. https://doi.org/10.17323/2500-2597.2024.4.67.76

Abstract

Demand for generative artificial intelligence (GenAI) is growing rapidly because of its ability to process large volumes of data quickly, compile them, and relay the "common opinion". However, the imbalance among GenAI's "competencies" hinders wider use of this tool for complex professional tasks. AI works as a giant store and reproducer of knowledge, but it cannot interpret that knowledge or apply it correctly according to context. A critical probability of error persists even when it answers the simplest questions.
The article assesses how significant the limitations inherent in GenAI are. The language models underlying it were tested, including the latest versions, GPT-4o1 and GigaChat MAX, using the authors' own question set based on Bloom's taxonomy. The probability of a correct answer was found to be practically independent of the number of model parameters, question complexity, and taxonomy level, and to decrease for multiple-choice questions. The results support the assumption that current AI tools cannot be used for professional purposes. The article proposes options that could contribute significantly to reaching at least a quasi-professional level.

References

HSE University (2024) Training Highly Qualified Personnel in Artificial Intelligence (ed. L.M. Gokhberg), Moscow: HSE University.

Alimardani A. (2024) Generative artificial intelligence vs. law students: An empirical study on criminal law exam performance. Law, Innovation and Technology, 2392932, 1-43. DOI: https://doi.org/10.1080/17579961.2024.2392932

Al-Zahrani A., Alasmari T. (2024) Exploring the impact of artificial intelligence on higher education: The dynamics of ethical, social, and educational implications. Humanities and Social Sciences Communications, 11(1), 912. DOI: https://doi.org/10.1057/s41599-024-03432-4

Al-Zahrani A.M. (2024) From Traditionalism to Algorithms: Embracing Artificial Intelligence for Effective University Teaching and Learning. IgMin Research, 2(2), 102-112. DOI: https://doi.org/10.61927/igmin151

Anthis J., Lum K., Ekstrand M., Feller A., D'Amour A., Tan C. (2024) The impossibility of fair LLMs (ArXiv paper 2406.03198). DOI: https://doi.org/10.48550/arXiv.2406.03198

Antoniak S., Krutul M., Pióro M., Krajewski J., Ludziejewski J., Ciebiera K., Król K., Odrzygóźdź T., Cygan M., Jaszczur S. (2023) Mixture of Tokens: Continuous MoE through Cross-Example Aggregation (ArXiv paper 2310.15961). DOI: https://doi.org/10.48550/arXiv.2310.15961

Bloom B.S., Engelhart M.D., Furst E.J., Hill W.H., Krathwohl D.R. (1956) Taxonomy of Educational Objectives: The Classification of Educational Goals (Handbook 1: Cognitive Domain), Ann Arbor, MI: Edwards Bros.

Borji A. (2023) A categorical archive of ChatGPT failures (ArXiv paper 2302.03494). DOI: https://doi.org/10.48550/arXiv.2302.03494

Cai W., Jiang J., Wang F., Tang J., Kim S., Huang J. (2024) A Survey on Mixture of Experts (ArXiv paper 2407.06204). DOI: https://doi.org/10.48550/arXiv.2407.06204

Chen Y., Esmaeilzadeh P. (2024) Generative AI in Medical Practice: In-Depth Exploration of Privacy and Security Challenges. Journal of Medical Internet Research, 26, e53008. DOI: https://doi.org/10.2196/53008

Cheung M. (2024) A Reality check of the benefits of LLM in business (ArXiv paper 2406.10249). DOI: https://doi.org/10.48550/arXiv.2406.10249

Choi J., Palumbo N., Chalasani P., Engelhard M.M., Jha S., Kumar A., Page D. (2024) MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance (ArXiv paper 2408.01869). DOI: https://doi.org/10.48550/arXiv.2408.01869

Chu H.C., Hwang G.H., Tu Y.F., Yang K.H. (2022) Roles and research trends of artificial intelligence in higher education: A systematic review of the top 50 most-cited articles. Australasian Journal of Educational Technology, 38(3), 22-42.

Dai C.-P., Ke F. (2022) Educational applications of artificial intelligence in simulation-based learning: A systematic mapping review. Computers and Education: Artificial Intelligence, 3, 100087. DOI: https://doi.org/10.1016/j.caeai.2022.100087

Gill S.S., Xu M., Patros P., Wu H., Kaur R., Kaur K., Fuller S., Singh M., Arora P., Kumar A.P., Stankovski V., Abraham A., Ghosh S.K., Lutfiyya H., Kanhere S.S., Bahsoon R., Rana O., Dustdar S., Sakellariou R., Uhlig S., Buyya R. (2023) Transformative Effects of ChatGPT on Modern Education: Emerging Era of AI Chatbots. Internet of Things and Cyber-Physical Systems, 4, 19-23. DOI: https://doi.org/10.1016/j.iotcps.2023.06.002

Han S.J., Ransom K.J., Perfors A., Kemp C. (2023) Inductive reasoning in humans and large language models. Cognitive Systems Research, 83, 1-28. DOI: https://doi.org/10.1016/j.cogsys.2023.101155

Hassan R., Ali A., Howe C.W., Zin A.M. (2022) Constructive alignment by implementing design thinking approach in artificial intelligence course: Learners' experience. AIP Conference Proceedings, 2433(1), 0072986. DOI: https://doi.org/10.1063/5.0072986

Hendrycks D., Burns C., Basart S., Zou A., Mazeika M., Song D., Steinhardt J. (2020) Measuring Massive Multitask Language Understanding (ArXiv paper 2009.03300). DOI: https://doi.org/10.48550/arXiv.2009.03300

IDC (2024) The Global Impact of Artificial Intelligence on the Economy and Jobs, Needham, MA: IDC Corporate.

Jin B., Liu G., Han C., Jiang M., Ji H., Han J. (2023) Large Language Models on Graphs: A Comprehensive Survey (ArXiv paper 2312.02783). DOI: https://doi.org/10.48550/arXiv.2312.02783

Kardanova E., Ivanova A., Tarasova K., Pashchenko T., Tikhoniuk A., Yusupova E., Kasprzhak A.G., Kuzminov Y., Kruchinskaia E., Brun I. (2024) A Novel Psychometrics-Based Approach to Developing Professional Competency Benchmark for Large Language Models (ArXiv paper 2411.00045). DOI: https://doi.org/10.48550/arXiv.2411.00045

Kuhn T.S. (1977) The Essential Tension, Chicago: University of Chicago Press.

Lai J., Gan W., Wu J., Qi Z., Yu P.S. (2023) Large Language Models in Law: A Survey (ArXiv paper 2312.03718). DOI: https://doi.org/10.48550/arXiv.2312.03718

Lakatos I. (1963) Proofs and Refutations (I). British Journal for the Philosophy of Science, 14(53), 1-25.

Lakatos I. (1970a) Falsification and the Methodology of Scientific Research Programmes. In: Criticism and the Growth of Knowledge (eds. I. Lakatos, A. Musgrave), Aberdeen: Cambridge University Press, pp. 91-195.

Lakatos I. (1970b) History of Science and Its Rational Reconstructions. PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association, pp. 91-136.

Liang L., Sun M., Gui Z. et al. (2024) KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation (ArXiv paper 2409.13731). DOI: https://doi.org/10.48550/arXiv.2409.13731

Liu N.F., Lin K., Hewitt J., Paranjape A., Bevilacqua M., Petroni F., Liang P. (2023) Lost in the Middle: How language models use long contexts (ArXiv paper 2307.03172). DOI: https://doi.org/10.48550/arXiv.2307.03172

Luo L., Li Y.F., Haffari G., Pan S. (2023) Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning (ArXiv paper 2310.01061). DOI: https://doi.org/10.48550/arXiv.2310.01061

McKnight M.A., Gilstrap C.M., Gilstrap C.A., Bacic D., Shemroske K., Srivastava S. (2024) Generative Artificial Intelligence in Applied Business Contexts: A systematic review, lexical analysis, and research framework. Journal of Applied Business and Economics, 26(2), 7040. DOI: https://doi.org/10.33423/jabe.v26i2.7040

Mirzadeh I., Alizadeh K., Shahrokhi H., Tuzel O., Bengio S., Farajtabar M. (2024) GSM-Symbolic: Understanding the limitations of mathematical reasoning in large language models (ArXiv paper 2410.05229). DOI: https://doi.org/10.48550/arXiv.2410.05229

Mortlock R., Lucas C. (2024) Generative artificial intelligence (Gen-AI) in pharmacy education: Utilization and implications for academic integrity: A scoping review. Exploratory Research in Clinical and Social Pharmacy, 15, 100481. DOI: https://doi.org/10.1016/j.rcsop.2024.100481

Naveed H., Khan A.U., Qiu S., Saqib M., Anwar S., Usman M., Akhtar N., Barnes N., Mian A. (2023) A comprehensive overview of large language models (ArXiv paper 2307.06435). DOI: https://doi.org/10.48550/arXiv.2307.06435

Nguyen H., Fungwacharakorn W., Satoh K. (2023) Enhancing logical reasoning in large language models to facilitate legal applications (ArXiv paper 2311.13095). DOI: https://doi.org/10.48550/arXiv.2311.13095

Noever D., Ciolino M. (2023) Professional Certification Benchmark Dataset: The first 500 jobs for large language models (ArXiv paper 2305.05377). DOI: https://doi.org/10.48550/arXiv.2305.05377

OECD (2024) OECD Economic Outlook (Interim Report, September 2024), Paris: OECD.

Ogunleye B., Zakariyyah K.I., Ajao O., Olayinka O., Sharma H. (2024) A Systematic Review of Generative AI for Teaching and Learning practice. Education Sciences, 14(6), 14060636. DOI: https://doi.org/10.3390/educsci14060636

ORR (2023) Rail industry finance (UK): April 2022 to March 2023, London: Office of Rail and Road.

Rasal S., Hauer E.J. (2024) Navigating Complexity: Orchestrated Problem Solving with Multi-Agent LLMs (ArXiv paper 2402.16713). DOI: https://doi.org/10.48550/arXiv.2402.16713

Sanmartin D. (2024) KG-RAG: Bridging the gap between knowledge and creativity (ArXiv paper 2405.12035). DOI: https://doi.org/10.48550/arXiv.2405.12035

Shapira E., Madmon O., Reichart R., Tennenholtz M. (2024) Can LLMs replace economic choice prediction labs? The case of language-based persuasion games (ArXiv paper 2401.17435). DOI: https://doi.org/10.48550/arXiv.2401.17435

Sohail S.S., Farhat F., Himeur Y., Nadeem M., Madsen D.O., Singh Y., Atalla S., Mansoor W. (2023) Decoding ChatGPT: A taxonomy of existing research, current challenges, and possible future directions. Journal of King Saud University - Computer and Information Sciences, 35(8). DOI: https://doi.org/10.1016/j.jksuci.2023.101675

Strachan J., Albergo D., Borghini G., Pansardi O., Scaliti E., Gupta S., Saxena K., Rufo A., Panzeri S., Manzi G., Graziano M.S.A., Becchio C. (2024) Testing theory of mind in large language models and humans. Nature Human Behaviour, 8(7), 1285-1295. DOI: https://doi.org/10.1038/s41562-024-01882-z

Sun J., Xu C., Tang L., Wang S., Lin C., Gong Y., Ni L.M., Shum H.Y., Guo J. (2023) Think-on-Graph: Deep and responsible reasoning of large language model on knowledge graph (ArXiv paper 2307.07697). DOI: https://doi.org/10.48550/arXiv.2307.07697

Thomson Reuters (2024) 2024 Generative AI in Professional Services, Toronto: Thomson Reuters Institute.

Turnock D. (1998) An Historical Geography of Railways in Great Britain and Ireland (1st ed), New York: Routledge.

Wan Y., Wang W., Yang Y., Yuan Y., Huang J., He P., Jiao W., Lyu M.R. (2024) A ∧ B ⇔ B ∧ A: Triggering logical reasoning failures in large language models (ArXiv paper 2401.00757). DOI: https://doi.org/10.48550/arXiv.2401.00757

Wang Y., Ma X., Zhang G., Ni Y., Chandra A., Guo S., Ren W., Arulraj A., He X., Jiang Z., Li T., Ku M., Wang K., Zhuang A., Fan R., Yue X., Chen W. (2024) MMLU-Pro: A more robust and challenging Multi-Task Language Understanding benchmark (ArXiv paper 2406.01574). DOI: https://doi.org/10.48550/arXiv.2406.01574

Wei J., Wang X., Schuurmans D., Bosma M., Ichter B., Xia F., Chi E., Le Q., Zhou D. (2022) Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (ArXiv paper 2201.11903). DOI: https://doi.org/10.48550/arXiv.2201.11903

Wen Y., Wang Z., Sun J. (2023) MindMap: Knowledge Graph prompting sparks graph of thoughts in large language models (ArXiv paper 2308.09729). DOI: https://doi.org/10.48550/arXiv.2308.09729

Xu Z., Cruz M.J., Guevara M., Wang T., Deshpande M., Wang X., Li Z. (2024) Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering (ArXiv paper 2404.17723). DOI: https://doi.org/10.48550/arXiv.2404.17723

Yang L., Chen H., Li Z., Ding X., Wu X. (2023) Give Us the Facts: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling (ArXiv paper 2306.11489). DOI: https://doi.org/10.48550/arXiv.2306.11489

Zhang Y., Ding H., Shui Z., Ma Y., Zou J., Deoras A., Wang H. (2021) Language models as recommender systems: Evaluations and limitations. Paper presented at the NeurIPS 2021 Workshop on I (Still) Can't Believe It's Not Better.

Zhang Y., Sun R., Chen Y., Pfister T., Zhang R., Arik S.O. (2024) Chain of Agents: Large language models collaborating on Long-Context Tasks (ArXiv paper 2406.02818). DOI: https://doi.org/10.48550/arXiv.2406.02818

Zhong Z., Xia M., Chen S., Lewis M. (2024) Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training (ArXiv paper 2405.03133). DOI: https://doi.org/10.48550/arXiv.2405.03133

Zhou J.P., Luo K.Z., Gu J., Yuan J., Weinberger K.Q., Sun W. (2024) Orchestrating LLMs with Different Personalizations (ArXiv paper 2407.04181). DOI: https://doi.org/10.48550/arXiv.2407.04181

Zhu Y., Wang X., Chen J., Qiao S., Ou Y., Yao Y., Deng S., Chen H., Zhang N. (2023) LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities (ArXiv paper 2305.13168). DOI: https://doi.org/10.48550/arXiv.2305.13168

This work is licensed under a Creative Commons Attribution 4.0 International License.
