Plagiarism 2.0: battleground Internet


It is a frequently repeated truism that the Internet has fundamentally altered the way individuals interact. Another often-cited truism holds that the more things change, the more they stay the same. The case of plagiarism supports both.

College students “borrowing” from others instead of doing original work is nothing new. According to TurnItIn, 30% of all papers turned in contain significant amounts of plagiarism. Instead of standing on the shoulders of giants, these students paint their own likeness on the giants’ faces and take credit for the heights reached in the process. The result is an ongoing arms race. On one side are the students who take the dishonest route for a variety of reasons: academic pressure, the expectations of peers and parents, writer’s block or, more often, the burning desire to attend that fraternity/sorority party. On the other side are the professors trying to level the playing field and uphold the last remaining scraps of academic integrity in higher education.

As with countless other conflicts, the Internet acts here as a neutral arms dealer, arming both sides to the teeth with the latest gadgetry that promises an edge over the adversary. Students now have a choice of websites where they can download papers (to use for “inspiration,” of course) or even order new work to specification. That is quite an improvement over the historical status quo, when the source of non-original work was limited to the student’s immediate social circle. Depending on personal contacts both limited the number of options and increased the risk of getting caught: work previously submitted by a classmate could be recognized instantly as fraud by the professor’s colleagues in the same department. The defenses have improved too, however: professors can now take a suspicious submission and Google unusual phrases to check for an obscure, uncredited source in cyberspace.

Has the power balance tipped? Two articles published in consecutive Sunday editions of the New York Times shed some light on this question. The first, published on Sep 10th, describes an experiment with made-to-order paper websites. The authors ordered a paper on the same subject from a sampling of these online businesses and were, overall, disappointed with the results. (The article, titled “At $9.95 a Page, You Expected Poetry?”, is available from the Times website with registration.) The quality of the writing was described as mediocre, or sophomoric at best. Not exactly the way to get on the academic fast-track, but arguably good news for the purposes of stealth: an engineering student known for incoherent prose who suddenly strings adjectives together like Tom Wolfe would raise a few eyebrows and draw unwanted attention. Mediocre writing, on the other hand, is like, dude, that’s cool. There are still interesting questions around maintaining a consistent voice and tone: the professor might grow suspicious if the student suddenly converts from new-age mystic to textbook libertarian. (Or is that naturally explained away by the soul-searching process of a liberal education?) For repeat customers, one would hope these fabricate-a-paper services assign the same author.

So much for the attackers trying to game the system. The second article takes up the defenders’ point of view, focusing on one particular service called TurnItIn, which screens submissions for plagiarism. (The company offers other services, including online grading and peer review, but the article focused exclusively on plagiarism detection.) This web-based service compares a new submission against three sources: a proprietary database of articles, a growing collection of work submitted by existing clients, and a cache of pages from the wild, wild web. It is sophisticated enough to identify passages that were copied with slight alterations, a necessary capability because cosmetic changes are to be expected. Whether they are unintentional artifacts of copying a passage by hand instead of using copy/paste, or intentional variations introduced to create a veneer of originality, these “deviations” from the original source are the main challenge in detecting plagiarized text. In keeping with good security engineering, we assume the attackers (that is, the cheating students) know their paper is going to be compared against existing sources, and we anticipate they will resort to intentional misspellings, reordering words, substituting synonyms from a thesaurus, switching between active and passive voice, and even throwing in ungrammatical decoy sentences to avoid detection.
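Neither article explains how TurnItIn actually matches altered passages, so the following is only a sketch of one standard near-duplicate technique, not the company’s method: break each text into overlapping three-word sequences (“shingles”) and measure the overlap with Jaccard similarity. The function names and sample sentences are hypothetical, chosen for illustration. The point is that a single synonym swap or dropped word knocks out only the few shingles that contain it, so a lightly edited copy still shares a large fraction of its shingles with the source, while genuinely original text shares almost none.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences ("shingles") in text."""
    # Lowercase and keep only alphanumeric tokens, so capitalization and
    # punctuation changes alone do not affect the comparison.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(submission: str, source: str, k: int = 3) -> float:
    """Shingle-based Jaccard similarity between a submission and a source."""
    return jaccard(shingles(submission, k), shingles(source, k))


# Hypothetical sample texts for illustration.
source = ("Instead of standing on the shoulders of giants, these students "
          "take credit for the heights reached in the process.")
# One synonym swap ("students" -> "pupils") and one dropped word ("the").
tweaked = ("Instead of standing on the shoulders of giants, these pupils "
           "take credit for heights reached in the process.")
unrelated = ("The fraternity party starts at nine on Friday night "
             "and everyone is invited.")

print(similarity(source, source))     # 1.0
print(similarity(tweaked, source))    # 0.5
print(similarity(unrelated, source))  # 0.0
```

The two small edits still leave half of the shingles shared, far above the zero overlap of the unrelated sentence. The choice of `k` is a trade-off: shorter shingles tolerate more tinkering but also match common phrases in honest work, while longer ones are more specific but easier to break with the word reordering and decoy sentences described above.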

If that sounds like basic spammer tactics, that’s because there is a parallel with spam here. Spammers have to contend with filters and craft their messages to bypass existing defenses. The attackers’ advantage is that the filters themselves are available for gray-box testing: for client applications, the spammer can purchase the application and reverse engineer it, and for web services they can register for an account and spam themselves with versions of a message until one gets through.

Clearly this has not yet dawned on the ghost-writer-for-hire websites. The Times found that at least one of the papers ordered was promptly exposed as fraud by TurnItIn. That points to the next escalation in this conflict, in which, ironically, the very openness of the system allows it to be subverted. A more astute competitor would subscribe to anti-plagiarism services and verify that its work does not raise any red flags, or tweak it until it flies under the radar, charging students a few dollars extra for that guarantee and upping the ante for the defenders.

And so the arms race continues.

cemp
