SCIgen is a paper generator that uses a context-free grammar to randomly generate nonsense in the form of computer science research papers. Its original data source was a collection of computer science papers downloaded from CiteSeer. All elements of the papers are generated, including graphs, diagrams, and citations. Created by scientists at the Massachusetts Institute of Technology, its stated aim is "to maximize amusement, rather than coherence." Originally created in 2005 to expose the lack of scrutiny of submissions to conferences, the generator was later used, primarily by Chinese academics, to create large numbers of fraudulent conference submissions, leading to the retraction of 122 SCIgen-generated papers and the creation of detection software to combat its use.
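The idea of random generation from a context-free grammar can be illustrated with a minimal sketch. The toy grammar, the symbol names, and the `generate` function below are hypothetical and written only for illustration; they are not SCIgen's actual rules or code.

```python
import random

# Hypothetical toy grammar (NOT SCIgen's real rule set).
# Keys are nonterminals; each maps to a list of alternative productions.
# Any symbol that is not a key is treated as a terminal (literal text).
GRAMMAR = {
    "SENTENCE": [
        ["We", "VERB", "that", "NOUN_PHRASE", "and", "NOUN_PHRASE", "are", "ADJ"],
        ["We", "VERB", "that", "NOUN_PHRASE", "can be made", "ADJ"],
    ],
    "NOUN_PHRASE": [["ADJ", "NOUN"], ["NOUN"]],
    "VERB": [["argue"], ["demonstrate"], ["confirm"]],
    "NOUN": [["access points"], ["redundancy"], ["hash tables"], ["compilers"]],
    "ADJ": [["scalable"], ["probabilistic"], ["lossless"], ["compact"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a list of terminal strings."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit as-is
    production = rng.choice(GRAMMAR[symbol])  # pick one alternative at random
    words = []
    for part in production:
        words.extend(expand(part, rng))
    return words


def generate(start="SENTENCE", seed=None):
    """Generate one random sentence, starting from the given nonterminal."""
    rng = random.Random(seed)
    return " ".join(expand(start, rng)) + "."


if __name__ == "__main__":
    for i in range(3):
        print(generate(seed=i))
```

Every sentence produced this way is grammatical because it follows the production rules, but the random choice of alternatives means the sentences carry no intended meaning.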
Opening abstract of Rooter: A Methodology for the Typical Unification of Access Points and Redundancy:[1]
In 2005, a paper generated by SCIgen, Rooter: A Methodology for the Typical Unification of Access Points and Redundancy, was accepted as a non-reviewed paper to the 2005 World Multiconference on Systemics, Cybernetics and Informatics (WMSCI), and the authors were invited to speak. The authors of SCIgen described their hoax on their website, and it drew wide publicity after being picked up by Slashdot. WMSCI withdrew its invitation, but the SCIgen team went anyway, renting space in the hotel separately from the conference and delivering a series of randomly generated talks on their own "track". The organizer of these WMSCI conferences is Professor Nagib Callaos. From 2000 until 2005, the WMSCI was also sponsored by the Institute of Electrical and Electronics Engineers. The IEEE stopped granting sponsorship to Callaos from 2006 to 2008.
Submitting the paper was a deliberate attempt to embarrass WMSCI, which the authors claim accepts low-quality papers and sends unsolicited requests for submissions in bulk to academics. As the SCIgen website states:
Computing writer Stan Kelly-Bootle noted in ACM Queue that many sentences in the "Rooter" paper were individually plausible, which he regarded as posing a problem for automated detection of hoax articles. He suggested that even human readers might be taken in by the effective use of jargon ("The pun on root/router is par for MIT-graduate humor, and at least one occurrence of methodology is mandatory") and attribute the paper's apparent incoherence to their own limited knowledge. His conclusion was that "a reliable gibberish filter requires a careful holistic review by several peer domain experts".[2]
The pseudonym "Herbert Schlangemann" was used to publish fake scientific articles in international conferences that claimed to practice peer review. The name is taken from the Swedish short film Der Schlangemann.
In all cases, the published papers were withdrawn from the conferences' proceedings, and the conference organizing committee as well as the names of the keynote speakers were removed from their websites.
Refereeing performed on behalf of the Institute of Electrical and Electronics Engineers has also been criticized after fake papers were discovered in conference publications, most notably by Labbé and by a researcher using the pseudonym Schlangemann.[18][19][20][21][22][23]
In a 2010 paper, Cyril Labbé of Grenoble University demonstrated the vulnerability of h-index calculations based on Google Scholar output by feeding it a large set of SCIgen-generated documents that cited each other, effectively an academic link farm. Using this method, the author managed to rank "Ike Antkare" ahead of Albert Einstein.[24]
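Why a link farm inflates the h-index can be seen in a small sketch. The model below is a simplifying assumption, not a reconstruction of Labbé's experiment: it supposes a hypothetical author credited with `n_docs` generated documents, each of which cites every other one, and that the indexing service counts all of those citations. The function names are illustrative.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def link_farm_citations(n_docs):
    """Citation counts in the assumed model: every document cites all the others."""
    return [n_docs - 1] * n_docs


if __name__ == "__main__":
    # An ordinary-looking publication record (made-up numbers).
    print(h_index([10, 8, 5, 4, 3]))
    # A hypothetical farm of 101 mutually citing generated documents.
    print(h_index(link_farm_citations(101)))
```

Under this assumed model, the h-index grows linearly with the number of generated documents (it equals `n_docs - 1`), so the metric reflects only how many documents were produced, not any independent recognition of the work.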
In 2013, over 122 published conference papers created by SCIgen were retracted by Springer and the IEEE. Unlike previous submissions, which were intended as pranks, these submissions were largely made by Chinese academics who were using SCIgen papers to boost their publication records.[25]
In 2015, SciDetect was released by Springer. This software, developed by Cyril Labbé, is designed to automatically detect papers generated by SCIgen.[26]
In 2021, a study examined 243 SCIgen papers that had been published in the academic literature. The authors found that SCIgen papers made up 75 per million papers (< 0.01%) in information science, and that only a small fraction of the detected papers had been dealt with.[27][28]