Evident gap between generative artificial intelligence as an academic editor compared to human editors in scientific publishing

https://doi.org/10.55214/25768484.v8i6.2189

Authors

  • Malik Sallam, Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman 11942, Jordan, and Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman 11942, Jordan. https://orcid.org/0000-0002-0165-9670
  • Kholoud Al-Mahzoum, School of Medicine, The University of Jordan, Amman 11942, Jordan
  • Omar Marzoaq, School of Dentistry, The University of Jordan, Amman 11942, Jordan
  • Mohammad Alfadhel, School of Medicine, The University of Jordan, Amman 11942, Jordan
  • Amer Al-Ajmi, School of Medicine, The University of Jordan, Amman 11942, Jordan
  • Mansour Al-Ajmi, School of Medicine, The University of Jordan, Amman 11942, Jordan
  • Mohammad Al-Hajeri, School of Medicine, The University of Jordan, Amman 11942, Jordan
  • Muna Barakat, Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman 11931, Jordan

The labyrinthine process of manuscript evaluation in scientific publishing often delays the dissemination of timely research results. Generative artificial intelligence (genAI) models could potentially enhance the efficiency of academic publishing. However, it is crucial to scrutinize the reliability of genAI in simulating human editorial decisions. This study analyzed 34 manuscripts authored by the corresponding author, with initial editorial decisions from six publishers across 28 journals. Two genAI models, ChatGPT-4o and Microsoft Copilot, assessed these manuscripts using tailored prompts. The correlation between genAI and actual human editorial decisions was evaluated using Kendall's τb. The speed of the original human editorial decisions was recorded, and the quality of the genAI outputs was evaluated using the CLEAR tool. Editorial decision-making by the genAI models was instantaneous, compared with the editors' average of 21.6±31.1 days. Both models achieved high scores on the CLEAR tool, averaging 4.8±0.4 for ChatGPT-4o and 4.8±0.5 for Copilot. Despite these efficiencies, there was no significant correlation between the genAI and human decisions (τb=0.121, P=.487 for ChatGPT-4o; τb=0.197, P=.258 for Copilot), nor between the decisions of the two genAI models (τb=0.318, P=.068). This preliminary study indicated that genAI models can expedite the editorial process while producing high-quality outputs. However, genAI has not yet achieved the accuracy of human editors in decision-making.
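As a rough illustration of the statistical comparison described above, the following Python sketch shows how a Kendall's τb correlation between two sets of ordinal editorial decisions could be computed with SciPy. This is not the authors' analysis code: the ordinal coding of decisions and the example values are assumptions made for demonstration only.

    # Minimal sketch of a Kendall's tau-b comparison between human and genAI
    # editorial decisions. The ordinal coding and the example vectors below
    # are hypothetical, not the study's actual data.
    from scipy.stats import kendalltau

    # Assumed coding: 1 = reject, 2 = major revision, 3 = minor revision, 4 = accept
    human_decisions   = [1, 2, 2, 3, 4, 1, 3, 2]
    chatgpt_decisions = [2, 2, 3, 3, 4, 2, 2, 3]

    # scipy.stats.kendalltau computes the tau-b variant by default,
    # which accounts for ties in either ranking.
    tau_b, p_value = kendalltau(human_decisions, chatgpt_decisions)
    print(f"Kendall's tau-b = {tau_b:.3f}, P = {p_value:.3f}")

A non-significant P value here, as reported in the study, would indicate that the genAI decisions do not track the human editorial decisions in any consistent ordinal direction.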

How to Cite

Sallam, M., Al-Mahzoum, K., Marzoaq, O., Alfadhel, M., Al-Ajmi, A., Al-Ajmi, M., Al-Hajeri, M., & Barakat, M. (2024). Evident gap between generative artificial intelligence as an academic editor compared to human editors in scientific publishing. Edelweiss Applied Science and Technology, 8(6), 960–979. https://doi.org/10.55214/25768484.v8i6.2189

Section

Articles

Published

2024-10-08