A recent study published in Science offers one of the most comprehensive looks at how large language models (LLMs) are reshaping the way science is produced, communicated, and evaluated. Conducted by researchers from Cornell University and the University of California, Berkeley, the analysis draws on massive datasets, including 2.1 million preprints, 28,000 peer-review reports, and 246 million document accesses, to examine LLM adoption across disciplines.


Productivity is surging

One of the study’s most striking findings is that scientists who adopt LLM tools publish significantly more papers. Across three major repositories—arXiv, bioRxiv, and SSRN—researchers using LLMs increased their output by roughly 36% to nearly 60%, depending on the field. These gains likely derive from LLM assistance with time-intensive tasks such as drafting text, summarizing literature, generating code, or organizing data. Notably, the productivity boost is not evenly distributed. Scholars who are likely non-native English speakers—especially those affiliated with Asian institutions—experience the largest increases, in some cases approaching 90%.

This suggests LLMs may reduce longstanding linguistic barriers in academia, helping level the playing field for researchers working outside English-dominant environments.


The discovery of knowledge is broadening

Researchers using LLM-assisted search access a more diverse range of sources: adopters are about 11.9% more likely to cite books and tend to reference more recent and less prominent studies.

This indicates that AI tools may help scientists navigate the ever-expanding research landscape more efficiently and uncover overlooked insights.


Polished writing is no longer a sign of quality

Traditionally, sophisticated academic writing has been associated with stronger research and higher publication success for several reasons: (1) producing clear, precise, technically sophisticated prose required deep understanding of the topic, a real investment of time, and familiarity with disciplinary conventions; (2) in science, clarity isn't cosmetic but functional, since good writing helps explain methods reproducibly, specify assumptions, and distinguish results from interpretation; and (3) reviewing is time-pressured and unpaid, so reviewers must evaluate complex papers quickly and often use writing style as a heuristic.

Currently, LLMs can generate polished, technical prose regardless of the underlying scientific rigor. As a result, writing style is becoming a far less reliable signal of research quality.

The authors interpret this primarily as a warning sign. For decades, editors and reviewers have relied, often subconsciously, on stylistic cues as shortcuts when evaluating submissions. If those cues no longer distinguish strong work from weak work, the risk is that the scientific literature could become saturated with papers that sound convincing but lack substantive contributions.


AI and the changing nature of peer review

Reviewers are already adapting, consciously or not, by discounting linguistic polish as a sign of merit. On the one hand, this hints at a potentially positive longer-term transformation. If surface-level writing signals lose their value, the scientific community may be pushed toward more rigorous evaluation practices that focus on methods, evidence, and originality rather than prose. On the other hand, reviewers may also lean on alternative signals, such as author reputation or institutional affiliation, which could unintentionally reinforce existing academic hierarchies.

To cope, journals may eventually deploy AI-assisted “reviewer agents” capable of flagging methodological inconsistencies or verifying claims, though whether such systems will improve rigor or introduce new biases remains uncertain.


Overall, LLMs appear to democratize scientific production by boosting output, reducing language barriers, and broadening access to knowledge. At the same time, they erode the traditional signals used to judge research quality, potentially flooding the literature with polished but shallow work while also paving the way for new evaluation practices.

As AI becomes embedded in scientific practice, institutions—from journals to funding agencies—will need to rethink the review process.


Kusumegi K, et al. Scientific production in the era of large language models. Science, 2025. DOI: 10.1126/science.adw3000