Does Swimming in the AI Data Lake risk sinking in the mud?

Margaret DeLacy Discussion

[Submitted by Margaret DeLacy, acting as subscriber]


Yesterday's Scholarly Kitchen blog carried an opinion piece by Roy Kaufman entitled "Swimming in the AI Data Lake: Why Disclosure and Versions of Record Are More Important than Ever."

Kaufman points out that AI services such as ChatGPT are only as good as the materials they are trained on. If an application is critical, it is important to train the program on the most reliable sources ("versions of record").

I'd add that presumably at some point a human needs to make the call about how reliable a given item or source really is, whether this gatekeeping is carried out by reviewers, journal editors, government entities, or some other "decider in chief." And then someone else needs to create a process for checking up on those gatekeepers: for example, ensuring that the articles used weren't from a predatory journal and that the reviewers were qualified and free from conflicts of interest. We already know that citation counts and "impact factors" are not trustworthy guides to reliability.

Scholars are unlikely to become obsolete any time soon.