Maintaining a Level Playing Field: HackerRank’s Commitment to Assessment Integrity
HackerRank Blog · May 7, 2024

At HackerRank, our mission is to change the world to value skills over pedigree. We believe every developer, regardless of background or location, deserves a fair, equal opportunity to showcase their abilities. Our commitment to maintaining the integrity of the entire hiring process ensures that every candidate is evaluated solely on their true skills. Nothing more, nothing less.

What does “Fair” mean at HackerRank?

For us, “Fair” means ensuring that every candidate gets an unbiased chance to showcase their skills. Whether you’re coding live or solving complex algorithms, fairness is at the heart of every assessment. At HackerRank, we embrace AI and encourage using it appropriately. In our assessments, AI tools are flagged if their use goes beyond what’s allowed, ensuring that the evaluation reflects genuine skill.

Preserving integrity and protecting fairness

The online nature of hiring assessments brings with it potential challenges, such as impersonation and unauthorized aid. But with HackerRank, we mitigate these risks through advanced tools and processes. Here’s how.

Monitoring suspicious environments

Physical surroundings are a common way candidates might try to gain an unfair edge, whether by using another device or having someone nearby offer help. HackerRank tools such as multiple monitor detection and image proctoring keep an eye on the surroundings to ensure the integrity of the test.

  • Multiple monitor detection: This feature spots if an external monitor is connected, helping us ensure the candidate stays focused on their work.
  • Image proctoring: We capture images of the candidate and their environment during the test, giving hiring teams the context they need to flag suspicious activity.
  • Image analysis: Through facial recognition and environmental checks, our algorithms detect any anomalies, like extra people or devices, and flag them for review.

Impersonation prevention

No one wants to hire the wrong person because someone else took the test for them. HackerRank ensures that the right person is taking the test by using Photo Identification and Facial Recognition throughout the assessment.

  • Photo Identification: Before the test, we capture a photo of the candidate to establish their identity. This is checked periodically during the test to prevent impersonation.
  • Image Proctoring: Capturing images at regular intervals helps detect any changes in appearance or unauthorized help during the assessment.

Fairness in the face of AI: Detecting AI usage

The rise of AI tools like ChatGPT has added a new dimension to assessment integrity. While AI can assist in coding, it can also undermine the fairness of tests if misused. HackerRank’s latest Plagiarism Detection ensures that candidates are evaluated on their true abilities without undue AI interference.

  • AI-Powered plagiarism detection: Analyzes patterns in code submissions to detect unauthorized AI usage or copying. This keeps the playing field fair for everyone.
  • Copy-Paste tracking: Detects when code is pasted from an external source, helping hiring teams assess if candidates are relying too heavily on pre-written solutions.
  • Tab switch proctoring: Monitoring when candidates switch tabs helps us identify if they’re seeking external help or using unauthorized resources during the assessment.

Preventing test content leaks

Content leaks can give candidates an unfair advantage by exposing questions or solutions beforehand. HackerRank uses a variety of tools to combat this:

  • Shuffled sections & questions: We randomize the order of questions for each test-taker, reducing the risk of content being shared.
  • Watermarking: Each question carries a watermark tied to the candidate’s email, deterring them from sharing the test content.
  • Hiding question labels: We hide the titles of questions to prevent candidates from easily searching for answers online.
  • AI solvable question labels: Our library questions are labeled according to their potential solvability by AI assistants. Armed with this information, hiring teams can filter out questions that may be susceptible to AI-generated solutions.
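As an aside, the deterministic per-candidate shuffling described above can be sketched in a few lines. This is an illustrative sketch only (HackerRank’s actual implementation is not public): seeding a shuffle with a hash of the candidate’s email keeps the order stable for one candidate while differing between candidates.

```python
import hashlib
import random

def shuffled_questions(questions, candidate_email):
    """Return a per-candidate ordering of the question list.

    Seeding the shuffle with a hash of the candidate's email keeps the
    order stable across page reloads for one candidate while differing
    between candidates, which makes shared answer keys harder to align.
    """
    seed = int.from_bytes(hashlib.sha256(candidate_email.encode()).digest()[:8], "big")
    rng = random.Random(seed)  # independent RNG; global random state untouched
    order = list(questions)
    rng.shuffle(order)
    return order

qs = ["q1", "q2", "q3", "q4", "q5"]
a = shuffled_questions(qs, "alice@example.com")
b = shuffled_questions(qs, "alice@example.com")  # same candidate: same order
c = shuffled_questions(qs, "bob@example.com")    # different candidate: own order
```

Because the seed is derived from the candidate’s identity rather than from the clock, reloading the test cannot be used to fish for a preferred question order.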

    A balanced approach to AI in assessments

    HackerRank doesn’t reject AI; instead, we embrace it with balance. Where AI use isn’t permitted, our system flags it, ensuring that assessments are genuine and reflective of a candidate’s real abilities. This approach keeps the hiring process transparent, fair, and focused on actual skills.

    Leveling the playing field for everyone

    Fairness isn’t just about preventing shortcuts. It’s about making sure every candidate, regardless of where they’re from or what tools they have, gets the same opportunity to shine. By upholding the highest standards of integrity, we help companies find the right talent based on skill, not external advantages.

    If you’re ready to see how HackerRank can help your team uphold the highest standards of fairness and integrity, reach out to us for a demo. Let’s build a more equitable future for technical hiring, together.

    Frequently Asked Questions

    Can your plagiarism detection system detect AI-generated code?
    Yes, our system is designed to flag suspicious use of AI tools, ensuring that assessments remain genuine and reflective of the candidate’s skills.

    Does your plagiarism detection automatically disqualify candidates?
    No. Our system flags potential cases of plagiarism, but the final decision lies with the hiring team, who can review the flagged incidents in detail.

    Who can I contact if I have more questions?
    For more information on how we maintain integrity in assessments, feel free to reach out to your customer success manager or contact us at support@hackerrank.com.

     

    Should Developers Be Able to Use AI Tools During Coding Tests?
    HackerRank Blog · October 3, 2023

    Coding tests play a pivotal role in tech recruiting, shining a spotlight on the prowess of each developer. These assessments are tailored to mirror real-world challenges, ensuring that a candidate isn’t just versed in theory but can truly bring code to life in practical scenarios.

    But those real-world scenarios are evolving. 

    With the advent of AI tools like ChatGPT and GitHub Copilot, we’re witnessing a profound shift in the development landscape. Just as developers once leaned heavily on StackOverflow or turned to Google for quick insights, they now frequently consult these AI companions for guidance and optimization. In fact, 82% of developers now use AI tools in their development process.

    This transformation begs a pressing question for hiring managers and tech recruiters: If AI tools have become so intrinsic to modern development, shouldn’t our coding assessments adapt to this new standard? The debate on allowing access to Google during tests has been around for a while, but introducing AI into the mix adds a fresh, more nuanced dimension to the conversation.

    Every company that hires developers will have to grapple with this question. And it’s not a “let’s-put-it-off-for-later” kind of issue. The answer could redefine tech hiring as we know it.

    The Changing Nature of Development

    Gone are the days when a developer’s world was limited to their integrated development environment (IDE), a few chosen frameworks, and perhaps a sprinkling of API documentation kept open in a browser tab. Today, software development is more expansive and dynamic, and AI tools are making a recognizable mark on it.

    Consider GitHub Copilot, for instance. It’s not just an auto-suggest tool that helps developers complete a line of code. It’s a co-pilot in the truest sense, offering solutions, predicting the next line, and sometimes even educating developers on best practices. Imagine being mid-way through a tricky function and having a tool that doesn’t just help you complete it but suggests an optimized way to achieve the same outcome. It’s like having a seasoned developer whispering expert advice in your ear.

    And then there’s ChatGPT. Let’s say a developer is grappling with a peculiar bug, and the usual forums don’t have the answer. ChatGPT is there, ready to brainstorm and debug with them, actively contributing to the problem-solving session.

    These examples aren’t mere hypotheticals; they reflect the evolving day-to-day reality of developers. According to a recent GitHub survey, 70% of developers say AI coding tools will give them an advantage at work, and they expect better code quality, faster completion times, and quicker incident resolution to be the greatest benefits. By seamlessly integrating AI tools into their workflow, developers can be more efficient, explore multiple solutions, and even learn on the job. It’s not about replacing human skills or intuition; it’s about enhancing them.

    Now, this doesn’t mean every line of code a developer writes will be assisted by AI. But it does indicate a shift in the ecosystem. As developers continue to integrate these tools into their repertoire, the boundary between human expertise and AI-enhanced skills becomes a bit fuzzy.

    For hiring managers and tech recruiters, this raises an exciting yet challenging question. How do you discern a developer’s core skills from their proficiency in working with AI tools? And, more importantly, should there even be a distinction?

    The Real Question for Hiring Teams

    The crux of the matter isn’t just about the tools developers have at their disposal or how the art of coding is evolving. It boils down to two central questions: 

    • What skills are we truly trying to assess?
    • How do we ensure that our tests are still relevant in the face of innovation?

    Let’s draw a parallel. A decade or so ago, a common debate in tech recruitment circles revolved around allowing candidates to use Google during coding assessments. The crux of that debate was clear: in the real world, developers wouldn’t be restricted from accessing resources. Why then create an artificial barrier in tests? Fast-forward to today, and we’re facing a similar predicament, albeit on a more sophisticated scale.

    If we recognize that AI tools are not just auxiliary aids but integral parts of a developer’s toolkit, then the debate shifts. It’s no longer about whether candidates can use AI tools like ChatGPT during assessments; it’s about whether they should. 

    To illuminate the point further: If a developer’s daily job involves collaborating with an AI tool to optimize workflows, debug more effectively, or generate parts of code, shouldn’t their proficiency in using these tools be part of what we assess? It’s akin to evaluating a carpenter not just on their ability to hammer a nail but also on their skill in using a modern nail gun.

    This is the real dilemma facing hiring managers and tech recruiters. In an era where the tools of the trade are in flux, the challenge is to craft assessments that capture both the timeless essence of coding and the contemporary nuances brought about by AI.

    Approaches to Integrating AI in Coding Tests: Pros and Cons

    As hiring teams grapple with the evolving role of AI in development, they’re presented with a range of options on how to incorporate these tools into their assessment process. Each approach comes with its own set of advantages and challenges. The key thing to remember is that creating an AI strategy isn’t about finding the right answer. Rather, the goal is to embrace AI on your own terms with an approach that works for your organization. 

    Let’s break down the primary strategies.

    Prevent the Use of AI

    In this approach, candidates are given a traditional coding environment without access to external AI tools. It’s the old-school method where one’s coding chops are tested in isolation. Proctoring tools are often employed to ensure the candidate isn’t accessing external resources, and plagiarism detection systems are on the lookout for copy-pasted solutions.

     Pros: 

    • Clarity of Assessment: You’re certain that solutions stem from the candidate’s raw knowledge and skills.
    • Standardization: All candidates face the same conditions, ensuring fairness.

    Cons: 

    • Unrealistic Scenario: It might not fully capture the nuances of a real-world coding job where all tools are accessible.
    • Missed Skill Evaluation: The approach might overlook a developer’s proficiency in working with AI tools.

    Allow Limited Use of AI 

    In this scenario, you might have developers work in a controlled environment where the IDE comes with a built-in AI assistant, acting as a pair programmer. This assistant can suggest optimizations or guide the test-taker through complex problems. However, external AI tools or search engines remain off-limits. Proctoring tools monitor the test-taking process and how candidates work with the AI assistant. Meanwhile, plagiarism detection tools watch for instances where candidates receive unauthorized external help.

    Pros:

    • Relevant Skill Assessment: This mirrors a modern development workflow, assessing the synergy between the developer and AI.
    • Controlled Environment: The built-in AI ensures candidates have a standardized AI experience.

    Cons:

    • Gray Areas: Defining “limited use” might pose challenges, leading to assessment discrepancies.
    • Balancing Act: You could run the risk of candidates leaning too heavily on the AI, making it hard to evaluate their independent skills.

    Allow Complete Use of AI

    Here, candidates are let loose in a fully equipped digital playground, complete with AI tools like GitHub Copilot or ChatGPT. The assessment evaluates not just the final solution but the process — how effectively a candidate collaborates with AI. To counter potential misuse, a variety of advanced plagiarism detection systems work in tandem.

    Pros:

    • Holistic Evaluation: Recognizes the full spectrum of modern coding, from raw skills to AI-enhanced development.
    • Push for Innovation: With AI at their side, candidates might come up with out-of-the-box solutions.
    • Real-World Environment: This is as practical as it gets, with developers solving problems the exact same way they would on the job.

    Cons:

    • Attribution Challenges: Discerning the candidate’s contribution versus AI’s could be tricky.
    • Integrity Concerns: With more tools available, ensuring authentic solutions becomes paramount.

    Each of these approaches brings forth a distinct vision of what coding assessments should look like in the age of AI. It’s worth noting that companies can use a combination or blend of these approaches in their hiring process. For example, a company could go with a more restrictive approach for initial screening assessments, and then allow for open use of AI for a smaller candidate pool in the interview process.

    Upholding Assessment Integrity in the Age of AI

    The integrity of coding assessments is a cornerstone of effective tech recruitment. In a world where AI tools can significantly influence the output, ensuring that a candidate’s work is genuine, original, and indicative of their skills becomes paramount — regardless of how you decide to assess their skills. However, the methods used to secure the integrity of assessments will look different for every company, depending on how they choose to embrace AI.

    Leveraging Proctoring Tools

    In scenarios where you either prevent or limit the use of AI, using proctoring tools becomes essential. These tools can monitor a candidate’s screen, browser tabs, and even their webcam to ensure that they aren’t accessing unauthorized resources. Modern proctoring software has grown sophisticated enough to detect suspicious behavior and flag it for review, ensuring a fair testing environment.

    Investing in Plagiarism Detection

    Monitoring for plagiarism has always been essential in coding tests. However, the arrival of generative AI necessitates a greater focus on plagiarism detection.

    Before the spread of AI, the industry standard for plagiarism detection relied heavily on MOSS code similarity. In addition to producing higher false-positive rates, this approach is also unreliable at detecting plagiarism originating from conversational agents like ChatGPT. That’s because ChatGPT can produce somewhat original code, which can circumvent similarity tests.

    The new industry standard for securing tests is an AI-powered plagiarism detection system. HackerRank’s AI model — currently the only one of its kind on the market — tracks dozens of signals across three categories: coding behavior features, attempt submission features, and question features. It analyzes these signals to calculate the likelihood of suspicious activity. This upholds transparency, fairness, and equity, regardless of how integrated AI is into the testing process.
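To make the three signal categories concrete, here is a toy scoring sketch. Everything in it — the feature names, the weights, and the logistic form — is hypothetical and for illustration only, not HackerRank’s actual model:

```python
import math

def suspicion_score(signals, weights, bias=-3.0):
    """Combine weighted signals into a 0-1 likelihood via a logistic function."""
    z = bias + sum(weights[name] * value for name, value in signals.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical signals, grouped the way the post describes them:
weights = {
    "paste_burst_chars": 0.004,    # coding behavior: large pastes from outside the editor
    "keystrokes_per_char": -2.0,   # coding behavior: low typing effort per character of code
    "time_to_first_run_s": -0.01,  # attempt submission: implausibly fast first run
    "question_leak_risk": 1.5,     # question feature: content known to be leaked or AI-solvable
}

candidate = {
    "paste_burst_chars": 900,
    "keystrokes_per_char": 0.1,
    "time_to_first_run_s": 40,
    "question_leak_risk": 1.0,
}
score = suspicion_score(candidate, weights)  # roughly 0.82 for these made-up numbers
```

A real system would learn such weights from labeled incidents rather than hand-picking them; the point is only that many heterogeneous signals can be combined into a single likelihood that a human reviewer can then inspect.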

    Educating the Candidates

    Finally, setting clear expectations is crucial. Before the assessment, candidates should be thoroughly briefed about the tools they can use, the expectations regarding collaboration with AI, and the consequences of unfair practices. An informed candidate is less likely to breach assessment integrity.

    By combining technology with transparent communication, companies can navigate the challenges posed by AI in coding assessments. The goal remains unchanged: to accurately gauge a candidate’s skills in an environment that’s both fair and indicative of real-world scenarios.

    Embracing the Future of Coding Assessments

    As technical skills evolve, so too must our methods of evaluating technical talent. The rise of AI tools like ChatGPT and GitHub Copilot isn’t merely a passing trend; it signifies a shift in how developers approach their craft. As such, the debate over their inclusion in coding tests is more than just a pedagogical question — it’s a reflection of the changing definition of what it means to be a developer.

    For hiring managers and tech recruiters, the challenge lies in balancing tradition with innovation. The decision isn’t binary; as highlighted, there’s a spectrum of approaches, each with its merits.

    Whatever path companies choose, the core principle remains the same: assessments should be a genuine reflection of on-the-job skills and scenarios. AI is undeniably a part of that picture now. But, as with all tools, it’s about how you use it. 

    This article was written with the help of AI. Can you tell which parts?

    AI Can Pass (Some) Test Questions. Now What?
    HackerRank Blog · July 19, 2023

    What’s going on?

    Since ChatGPT came onto the scene in late 2022, test after test has proven vulnerable to the wiles of generative AI. The initial GPT-3.5 model was impressive enough, and the more advanced GPT-4 has shown even greater proficiency at test-taking. Name a large, well-known test, and ChatGPT has probably passed it. In addition to bar exams, SATs, and AP exams, ChatGPT has also passed 9 out of 12 AWS certification exams and Google’s L3 engineer coding interview.

    At HackerRank, we’ve seen firsthand how AI can bypass MOSS code similarity, the industry standard for coding plagiarism detection. 

    All of these sudden vulnerabilities can seem scary for those administering tests. How can you trust the answers you’re getting? If your tests rely heavily on multiple choice questions, which are uniquely vulnerable to large language models, how can you revise test content to be more AI resistant?

    These developments are worrying for test-takers, as well. If you’re taking a test in good faith, how can you be sure you’re getting a fair shake? Interviewing is stressful enough without having to wonder if other candidates are seeking an AI-powered advantage. Developers deserve the peace of mind that they’re getting a fair shot to showcase their skills. 

    What’s our stance?

    At HackerRank, we’ve done extensive testing to understand how AI can disrupt assessments, and we’ve found that AI’s performance is intrinsically linked with question complexity. It handles simple questions easily and efficiently, finds questions of medium difficulty challenging, and struggles with complex problems. This pattern parallels most candidates’ performance. 

    However, creating increasingly intricate questions to outwit AI isn’t a sustainable solution. Sure, it’s appealing at first, but it’s counterproductive for a few reasons. 

    • First, it could compromise the core value of online assessments, weakening the quality of talent evaluation. More complex questions don’t automatically translate into better signal about a candidate’s skills. They take longer to answer, which means either longer assessments or fewer questions (and fewer signals to evaluate). 
    • Second, it would degrade the candidate experience by focusing on frustrating AI rather than on giving developers a chance to showcase their skills, which could result in more candidates dropping out of the pipeline. 
    • Third, it would set up a game of perpetual leapfrog: more advanced AI models solve more complex problems, and even more complex problems are created to trip up more advanced AI. 

    Instead, our focus remains on upholding the integrity of the assessment process, and thereby ensuring that every candidate’s skills are evaluated fairly and reliably. 

    Introducing our new AI solvability indicator

    Upholding integrity means being realistic—and transparent. This means acknowledging that there are assessment questions that AI can solve. And it means alerting you when that is the case, so you can make informed decisions about the content of your assessments. 

    That is why we are introducing an AI solvability indicator. 

    This indicator operates on a combination of two criteria. 

    1. Whether or not a question can be fully solved by AI.
    2. Whether or not that solution is picked up by our AI-powered plagiarism detection. 

    If a question is not solvable by AI, it does not get flagged. Likewise, if a question is solvable, but the answer triggers our plagiarism detection model, it does not get flagged. The question may be solvable, but plagiarism detection ensures that the integrity of the assessment is protected. 

    If a question is solvable by AI and the solution evades plagiarism detection, it will get flagged as AI Solvable: Yes. Generally, these questions are simple enough that the answers don’t generate enough signals for plagiarism detection to be fully effective. 
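Taken together, the two criteria reduce to a simple decision rule. The following sketch restates the logic exactly as described above (a paraphrase, not HackerRank’s code):

```python
def ai_solvable_flag(solved_by_ai: bool, caught_by_detection: bool) -> bool:
    """A question is flagged 'AI Solvable: Yes' only when AI can fully solve it
    AND that solution slips past AI-powered plagiarism detection."""
    return solved_by_ai and not caught_by_detection

assert ai_solvable_flag(False, False) is False  # not solvable: never flagged
assert ai_solvable_flag(True, True) is False    # solvable but detected: integrity holds
assert ai_solvable_flag(True, False) is True    # solvable and undetected: flagged
```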

    Questions flagged as AI solvable will be removed from certified assessments, but may still appear in custom assessments, particularly if those assessments have not been updated in some time. 

    If you’re browsing through questions, you can also select to hide all AI-solvable questions, just as you can hide all leaked questions. 

    [Screenshot: HackerRank’s question library interface]

    What else is HackerRank doing?

    Beyond the transparency of the AI solvability indicator, we are building in measures to actively ensure assessment integrity. These include: 

    • AI-powered plagiarism detection. Our industry-first, state-of-the-art plagiarism detection system analyzes dozens of signals to detect out-of-bounds behavior. With an incredible 93% accuracy rate, our system detects ChatGPT-generated solutions even when they’re typed in by hand and would easily bypass standard detection methods. 
    • Certified assessments. Let us handle assessment maintenance. Our certified assessments are out-of-the-box tests curated and maintained by HackerRank experts. We take on all the upkeep, including keeping content current and flagging and replacing any leaked or AI-solvable questions. 
    • Expanded question types. We’re expanding question types with formats and structures that are more resistant to AI solutions, such as projects and code repositories. These have the added benefit of being extremely close to the real-world environments and challenges your candidates would face in their daily work, giving you a true-to-life evaluation of their skills. 

    What can you do?

    No matter where your company stands on AI, we believe it’s best to be transparent about its capabilities. Yes, AI can solve simpler technical assessment questions. We’d rather you know that, so you can take informed action. 

    So what can you do? Every company is coming at AI in their own way, so there’s no one right answer. What works for one organization may not work for another. But broadly speaking, here are some steps you should consider to protect the integrity of your assessments.

    • Stay informed. Yes, some technical questions can be solved by AI. At HackerRank, we help ensure assessment integrity through our market-leading plagiarism detection and through solvability indicators that give you the transparency you need to deliver fair assessments. 
    • Replace solvable questions. When a question in one of your assessments is flagged as AI solvable, a simple course of action is to replace it with an unsolved question from our library. We also recommend looking at the type of question you’re asking, and what you’re hoping to learn from it. It may make sense to replace a solvable question with an entirely different question type.
    • Embrace new question types. Newer question formats like projects and code repos are more resistant to AI, and their close resemblance to real-world scenarios gives you a truer-to-life evaluation of how a candidate would perform in their daily work. 
    • Take advantage of certified assessments. Don’t want to deal with maintaining and updating assessments? Let us do it for you. With certified assessments, HackerRank experts handle all of the content curation and monitoring, including replacing any leaked or AI solvable questions.
    • Leverage HackerRank professional services. Have special needs for your assessments? Engage our experts for monitoring and content creation customized to your specific business objectives. 

    Ensure assessment fairness and your own peace of mind

    Ensuring assessment integrity in a time of rapidly advancing AI can seem difficult. You can only dial up question complexity so far before it starts to degrade the assessment experience and even compromise the value of assessments in finding qualified talent. That’s why we’re focused on reinforcing key pillars of assessment integrity, including our industry-leading AI-powered plagiarism detection, certified assessments, and solvability indicators that give you the transparency and signals you need to make the best decisions about your assessments. 

    Be sure to check out our plagiarism detection page to go into more detail about how HackerRank is ensuring assessment integrity. 

    ChatGPT Easily Fools Traditional Plagiarism Detection
    HackerRank Blog · June 14, 2023

    25% of technical assessments show signs of plagiarism. 

    While it’s impossible for companies to fully prevent plagiarism—at least without massively degrading the candidate experience—plagiarism detection is critical to ensuring assessment integrity. It’s important that developers have a fair shot at showcasing their skills, and that hiring teams have confidence in the test results. 

    And the standard plagiarism detection method used by, well, everyone, is MOSS code similarity.

    MOSS Code Similarity

    MOSS (Measure of Software Similarity) is a coding plagiarism detection system developed at Stanford University in the mid-1990s. It operates by analyzing the structural pattern of the code to identify similarity, even when identifiers or comments have been changed, or lines of code rearranged. MOSS is incredibly effective at finding similarities, not just direct matches, and that effectiveness has made it the de facto standard for plagiarism detection. 
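The structural-similarity idea behind MOSS can be sketched with a simplified fingerprinting scheme in the spirit of its winnowing algorithm. This is a toy illustration, not MOSS itself — real MOSS normalizes language tokens, not just whitespace:

```python
def fingerprints(code: str, k: int = 5, window: int = 4) -> set:
    """Winnowing-style fingerprints: hash every k-gram of the normalized
    text, then keep the minimum hash in each sliding window."""
    text = "".join(code.split()).lower()  # crude normalization: drop whitespace, lowercase
    if len(text) < k:
        return set()
    hashes = [hash(text[i:i + k]) for i in range(len(text) - k + 1)]
    return {min(hashes[i:i + window]) for i in range(max(1, len(hashes) - window + 1))}

def similarity(a: str, b: str) -> float:
    """Jaccard similarity of two programs' fingerprint sets."""
    fa, fb = fingerprints(a), fingerprints(b)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)

original = "def add(a,b):\n    return a+b"
reformatted = "def add(a, b):\n    return a + b"  # cosmetic whitespace edits
different = "def mul(x, y):\n    return x * y"
```

Because fingerprints survive cosmetic edits, `similarity(original, reformatted)` is 1.0 here while a structurally different program scores lower — which is exactly why this approach catches rearranged or renamed copies yet, as discussed below, struggles with freshly generated AI code.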

    That doesn’t mean MOSS is flawless, however. Finding similarity doesn’t necessarily translate to finding plagiarism, and MOSS has a reputation for throwing out false positives, particularly when faced with simpler coding challenges. In our own internal research, we’ve found false positive rates as high as 70%.

    AI changes the game

    While not perfect, MOSS has been a “good enough” standard for years. Until the advent of generative AI tools like ChatGPT. 

    ChatGPT has proven effective at solving easy and medium-difficulty assessment questions. And with just a bit of prodding, it’s also effective at evading MOSS code similarity checks. Let’s see it in action:

    Step 1: We asked ChatGPT to answer a question and it did so, returning a solution as well as a brief explanation of the rationale. 

    [Screenshot: prompt asking ChatGPT to solve a coding question in Python]

    [Screenshot: ChatGPT’s initial answer to the coding question]

    Step 2: Next, we directly asked ChatGPT to help us evade the MOSS code similarity check, and it refused.

    [Screenshot: ChatGPT declining to outright bypass MOSS code similarity]

    Step 3: However, with some creative prompting, ChatGPT will offer unique approaches. And because ChatGPT samples its output rather than returning a fixed answer, it generates a distinct solution every time, giving it a huge advantage in bypassing code similarity detection. 
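That run-to-run variability comes from sampling: at each step the model draws the next token from a probability distribution instead of always emitting the single most likely one. A minimal illustration of temperature sampling (illustrative only, not OpenAI's implementation):

```python
import math
import random

def sample(logits, temperature=1.0):
    """Draw an index from softmax(logits / temperature).
    Higher temperature flattens the distribution -> more varied outputs;
    a very low temperature approaches a fixed, deterministic answer."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return random.choices(range(len(logits)), weights=[e / total for e in exps])[0]
```

At temperature near zero the same prompt yields the same answer; at the temperatures chat systems typically use, two identical prompts can produce structurally different code, which is exactly what defeats a similarity check.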

    Here are three different prompts and three totally different approaches. Note that ChatGPT transforms many variable names from the initial solution to evade code similarity checks.

    [Image: Framing the prompt differently sidesteps ChatGPT’s reluctance and yields a unique solution to the problem]

    [Image: ChatGPT changing the answer again to deliver a longer, less efficient solution]

    Step 4: The moment of truth! When we submitted the revised answer through plagiarism detection, it passed cleanly. 

    [Image: Dashboard showing that the ChatGPT-generated answer successfully evades detection by MOSS code similarity]

    What’s the implication? 

    Basically, MOSS code similarity checks can be easily bypassed with ChatGPT. 

    Time to panic?

    If MOSS code similarity can be bypassed, does that mean that technical assessments can no longer be trusted?

    It depends. 

    On one hand, it’s easier for candidates to bypass the standard plagiarism check that the entire industry has relied upon. So, yes, there is a risk to assessment integrity.

    On the other hand, plagiarism detection has always been a compromise between effectiveness and candidate experience. MOSS is not intrusive, but its high false positive rates render it less definitive than it could be. Ultimately, it’s not really detecting plagiarism. It’s detecting patterns in the code that could be plagiarism.

    Move over, MOSS

    What happens now?

    Plagiarism detection gets rethought for the AI era. Expect companies to scramble for better versions of MOSS, more complex questions, different question types, and more to make up the difference. 

    At HackerRank, we’ve taken a different approach. While we’re always improving our question library and assessment experience, we’ve completely rethought plagiarism detection. Rather than relying on any single point of analysis like MOSS Code Similarity, we built an AI model that looks at dozens of signals, including aspects of the candidate’s coding behavior. 

    Our advanced new AI-powered plagiarism detection system boasts a massive reduction in false positives, and a 93% accuracy rate. In real-world conditions, our system repeatedly detects ChatGPT-generated solutions, even when those results are typed in manually, and even when they easily pass MOSS Code Similarity. 

    What happens when the example shown above gets submitted through our new system? It gets flagged for suspicious activity. 

    [Image: HackerRank dashboard showing suspicion flagged as HIGH]

    Clicking into that suspicious activity reveals that our model identified the plagiarism due to coding behaviors.

    [Image: HackerRank Candidate Summary showing the suspicious activity flag, with additional detail below]

    What’s more, hiring managers can replay the answer keystroke by keystroke to confirm the suspicious activity. 

    [Image: HackerRank dashboard showing how AI-powered plagiarism detection correctly flagged this ChatGPT-created answer as suspicious, even when typed in keystroke by keystroke]
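A keystroke log also lends itself to automated checks. As a purely illustrative sketch (the event format and thresholds below are hypothetical, not HackerRank's actual heuristics), a detector might flag runs where large amounts of code appear nearly instantaneously:

```python
# Illustrative only: flag paste-like bursts in a keystroke log.
# Each event is (timestamp_seconds, chars_inserted); both names are invented.

def paste_bursts(events, min_chars=80, max_gap=0.05):
    """Return character counts for runs of near-instantaneous insertion.
    Sustained human typing rarely exceeds ~15 characters per second,
    so 80+ characters arriving within milliseconds suggests a paste."""
    bursts, run, prev_t = [], 0, None
    for t, n in events:
        if prev_t is not None and t - prev_t <= max_gap:
            run += n                  # still inside the same burst
        else:
            if run >= min_chars:
                bursts.append(run)    # close out the previous run
            run = n
        prev_t = t
    if run >= min_chars:
        bursts.append(run)
    return bursts
```

A candidate typing organically produces hundreds of small, irregularly spaced events; a ChatGPT answer dropped into the editor produces one giant burst.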

    There’s nothing even close to it on the market, and what’s more, it’s a learning model, which means it will only get more accurate over time.

    Want to learn more about plagiarism detection in the AI era, MOSS Code Similarity vulnerability, and how you can ensure assessment integrity? Let’s chat!

    The post ChatGPT Easily Fools Traditional Plagiarism Detection appeared first on HackerRank Blog.

    HackerRank’s AI-Powered Plagiarism Detection Ensures Assessment Integrity in the ChatGPT Era
    https://www.hackerrank.com/blog/hackerrank-launches-ai-powered-plagiarism-detection/
    Wed, 07 Jun 2023
    What you need to know

    HackerRank has just launched its advanced plagiarism detection system, powered by AI. Designed to protect assessment integrity while ensuring developers have a fair and level playing field to showcase their skills, this system uses dozens of signals to detect suspicious behavior, including the use of external tools. 

    The revolution will be prompted

    AI is here, and it’s not going anywhere. 82% of developers have already experimented with some type of AI tool, and 55% are using AI assistants at work. AI is redefining what it means to be a developer, and in the future most code will be written with some kind of AI support.

    AI opens up all kinds of exciting possibilities, but it can also muddy the waters of technical assessments. GPT-4 can not only pass AP exams and simpler coding challenges; it can also bypass MOSS code similarity, which has long been the industry standard for coding plagiarism detection. 

    How can you ensure assessment integrity? If you’re relying on MOSS code similarity, you can’t. Not anymore. That means you can’t place full confidence in a candidate’s skills, and you can’t assure developers that they’re showcasing their skills on a level playing field. 

    Fighting magic with magic

    Basing a detection system on identifying AI usage alone would be a futile endeavor. AI is advancing so rapidly that keeping such a system ahead of it would be nearly impossible. 

    Rather than relying on any single point of analysis like MOSS Code Similarity or AI usage, we took a different path and built an AI model that looks at dozens of signals, including aspects of the candidate’s coding behavior. This “defense in breadth” compensates for signals that may be bypassed on an individual basis and gives our new plagiarism detection system a more holistic way to detect shenanigans.

    Think of it like a security system. A system that relies on a single factor, like a fingerprint scanner, is secure until that single factor can be bypassed. A multi-factor system that employs x-ray scanners, metal detectors, facial recognition, and gait analysis is far more challenging to sneak past.

    Plagiarism detection, powered by AI

    Our plagiarism detection system tracks dozens of signals across three categories—coding behavior features, attempt submission features, and question features—and analyzes them to calculate the likelihood of suspicious activity. 

    After all of that crunching, our model rates plagiarism suspicion as High, Medium, or No. 
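To make the shape of the idea concrete — the actual signals, weights, and thresholds are proprietary, so every name and number below is invented for illustration — a multi-signal suspicion score might combine behavioral features like this:

```python
import math

# Toy multi-signal score. Feature names, weights, and thresholds are
# hypothetical; HackerRank's real model and signals are not public.
WEIGHTS = {
    "paste_burst_chars": 0.004,   # characters inserted in large single pastes
    "tab_switches": 0.15,         # times the candidate left the test tab
    "typing_speed_zscore": 0.9,   # typing far above the candidate's baseline
    "edit_iterations": -0.3,      # organic solutions show many small revisions
}
BIAS = -2.0

def suspicion(features):
    """Combine all signals into one logistic score, then bucket it."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    p = 1.0 / (1.0 + math.exp(-z))            # probability-like score in [0, 1]
    return "High" if p >= 0.8 else "Medium" if p >= 0.5 else "No"
```

Because the score aggregates many features, suppressing any single signal only lowers the total rather than zeroing it out, which is the point of defense in breadth.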

    Importantly, we do not use any personal data such as gender, race, age, school, location, or experience in our analysis model. 

    Currently, our advanced plagiarism detection system achieves an incredible 93% accuracy rate. In real-world conditions, our system repeatedly detects ChatGPT-generated solutions, even when those results are typed in manually, and even when they easily pass MOSS Code Similarity. 

    What happens when a ChatGPT-generated answer gets submitted through our new system? It gets flagged for suspicious activity. 

    [Image: HackerRank dashboard showing that plagiarism detection flagged suspicious activity]

    Clicking into that suspicious activity reveals that our model identified the plagiarism due to coding behaviors.

    [Image: HackerRank candidate summary including the suspicious activity flag and detail]

    What’s more, hiring managers can replay the answer keystroke by keystroke to confirm the suspicious activity. 

    [Image: Question attempt detail providing further information on suspicious activity]

    There’s nothing even close to it on the market, and what’s more, it’s a learning model, which means it will only get more accurate over time.

    Ensure assessment integrity and candidate confidence

    Our new AI-powered plagiarism detection is a groundbreaking innovation. It protects assessment integrity and ensures a fair playing field for developers to showcase their skills, and there’s nothing else out there that comes close.

    As the AI revolution reshapes the industry, it’s vital to have reliable and efficient methods to detect and prevent plagiarism in online assessments. By analyzing various aspects of coding behavior and offering 93% detection accuracy, this system sets a new standard for maintaining transparency, fairness, and equity.

    Want to go deeper and see our new plagiarism detection in action? We’d be happy to show you around.

    How Plagiarism Detection Works at HackerRank
    https://www.hackerrank.com/blog/how-plagiarism-detection-works-at-hackerrank/
    Thu, 16 Mar 2023

    Preventing plagiarism in online assessments has always been important. But the widespread availability of AI tools has reinforced the need for plagiarism strategies that ensure all developers have an equal shot at landing job opportunities that match their unique skill sets and professional aspirations.

    HackerRank’s mission is to accelerate the world’s innovation by focusing hiring decisions on skill, not pedigree. We do so by giving all developers the opportunity to showcase their skills in a fair and equitable testing environment. The integrity of the questions that comprise these coding tests is critical for developers and employers to feel confident in their fairness and efficacy. 

    We’ve found that a proactive plagiarism prevention and detection policy is the best approach for combating plagiarism, ensuring the efficacy of our tests, and providing a fair way for all developers to demonstrate their skills.

    HackerRank’s Plagiarism Strategy

    Assessment integrity at HackerRank has three core pillars: proctoring tools, plagiarism detection, and DMCA takedowns.

    Proctoring Tools

    One important component of ensuring assessment integrity is to build systems that provide the right proctoring capabilities. Our approach to proctoring is to capture a variety of behavioral signals, including tab proctoring, copy-paste tracking, image proctoring, and image analysis.

    The purpose of proctoring is twofold. First, proctoring tools help prevent plagiarism by acting as a deterrent. Candidates who know that proctoring is in place are less likely to engage in such activity. Second, proctoring tools record data points that support plagiarism detection.

    Plagiarism Detection

    In addition to proctoring tools, the integrity of an assessment also relies on plagiarism detection. In other words, the ability to flag when a candidate likely received outside help. 

    The current industry standard for plagiarism detection relies heavily on MOSS code similarity. Not only does this approach often lead to higher false-positive rates, but it also unreliably detects plagiarism originating from conversational AI and large language models. That’s because conversational AI can produce original code, which circumvents similarity tests.

    Instead, HackerRank uses a machine-learning based plagiarism detection model to characterize coding patterns and check for plagiarism based on a number of signals. The model also uses self-learning to analyze past data points and continuously improve its confidence levels.

    The result is a new ML-based detection system that is three times more accurate at detecting plagiarism than traditional code similarity approaches—and can detect the use of external tools such as conversational AI. This dramatically reduces the number of false positive plagiarism flags and ensures all developers are being judged in a fair and equitable testing environment.

    DMCA Takedowns

    The Digital Millennium Copyright Act (DMCA) is a United States copyright law that provides a legal framework for how copyright owners, online service providers, and users engage with copyrighted content. A DMCA takedown is when a copyright holder requests a website or online community to remove content that they believe infringes on their intellectual property.

    DMCA isn’t a perfect system, and we recognize there are some drawbacks to pursuing a takedown policy. However, we’ve found that a proactive DMCA policy is necessary to minimize the spread of leaked questions, combat plagiarism, and provide a fair way for all developers to demonstrate their skills.

    Accordingly, our DMCA approach centers on: 

    • Ensuring a fair hiring opportunity for every developer by reducing plagiarism and upholding question integrity.
    • Conducting an intensive manual review process to validate claims, with particular care taken to protect open source and developer communities from mistaken requests. 

    Through an extensive review process, we identify, review, and request the takedown of content we believe to be question leaks. Reducing the number of leaked questions reduces the opportunity for candidates to commit plagiarism through the use of leaked solutions.

    What Does Our Plagiarism Flag Mean for You?

    If our detection system identifies a potential case of plagiarism, it issues a plagiarism flag, which indicates that the candidate might have copied their code or solution. We recommend that hiring teams manually review the flagged code rather than auto-rejecting the candidate, so that a false positive doesn’t disqualify an honest candidate. Ultimately, the decision on how to respond to a plagiarism flag rests with hiring teams, and specific policies will vary with each employer.

    Frequently Asked Questions

    Can Your Plagiarism Detection System Detect Code From ChatGPT?

    Yes. Our AI-enabled plagiarism detection system feeds several proctoring and user-generated signals into an advanced machine-learning algorithm to flag suspicious behavior during an assessment. By understanding code iterations made by the candidate, the model can detect if they copied and pasted code from an external source. However, it isn’t possible to identify what source the candidate used to obtain or create the code.

    Does Your Plagiarism Detection System Automatically Fail Candidates?

    No. Our detection system identifies potential cases of plagiarism and empowers hiring teams to decide if it’s an actual case of plagiarism.

    I Still Have Questions About Plagiarism. Who Should I Contact?

    If you’re a customer looking for support on plagiarism and its impact on your business, you can contact your customer success manager or our team at support@hackerrank.com.

     

    What Is ChatGPT? And What Does It Mean for Technical Hiring?
    https://www.hackerrank.com/blog/what-is-chatgpt-technical-hiring/
    Fri, 20 Jan 2023

    Since its public debut in November, ChatGPT has taken the world by storm. In only five days, it surged to one million users. In just over a month, the valuation of the company behind it, OpenAI, grew to $29 billion.

    Across sectors, there’s a growing chorus of questions about the implications of large language models (LLMs) like ChatGPT. Will these AI-enabled tools change education and make essay writing obsolete? Can they generate creative enough ideas to power mainstream ad campaigns? Will tools like ChatGPT provide a viable alternative to traditional search engines?

    We’re asking some equally big questions ourselves: How well can ChatGPT actually code? And what impact will LLMs have on the broader world of computer programming? 

    AI-powered innovation like ChatGPT is poised to fundamentally change the relationship between developers and coding, including how employers assess technical skills and hire developers. With that in mind, we dove deep into the details of ChatGPT, its impact on skill assessments, and what its development means for the future of technical hiring.

    Key Takeaways:

    • The coding potential of LLMs has reinforced the need for strategies and tools for upholding the integrity of coding assessments.
    • Strong proctoring tools and plagiarism detection systems have become essential, and can help protect even solvable questions. 
    • Employers should avoid multiple choice questions and problems that have answers so short that a plagiarism detection system can’t detect when a candidate has received help from a tool like ChatGPT.
    • Continued growth of artificial intelligence will redefine the real-world application of coding skills and, in the process, change technical hiring as we know it.
    • HackerRank is embracing AI and will pursue innovative ideas that imagine a future of programming in an AI-driven world.

    What is ChatGPT?

    On a basic level, ChatGPT is an example of a large language model. A large language model is a computer system trained on huge data sets and built with a high number of parameters. This extends the system’s text capabilities beyond traditional AI and enables it to respond to prompts with minimal or no training data.

    The goal of ChatGPT’s developer, OpenAI, was to create a machine learning system which can carry a natural conversation. In practice, ChatGPT functions like a search engine or content creation system, synthesizing billions of data points into custom responses. 

    Developing a Smart Conversational Agent

    The development of ChatGPT incorporated two innovative approaches: 

    1. ChatGPT is powered by the well-known ML model GPT-3.5. The model is trained to complete the next few words of an incomplete sentence. The main idea behind this model is that, after training against billions of data points, the model starts to understand enough about the human world to complete sentences.
    2. ChatGPT uses a human-in-the-loop system to continuously improve and answer questions in a more human-like fashion. OpenAI hired thousands of contractors to write human-like responses to challenging prompts as a way to continuously improve the model. Training the model to answer difficult questions improved ChatGPT’s responses at a remarkable rate.

    Now that the training process is complete, users can access ChatGPT from everyday devices. This trait makes it superior to other models like AlphaCode, which are thought to be prohibitively expensive to run even after training is complete.
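The next-word objective described above can be made concrete with a toy count-based model. A real LLM predicts subword tokens with a transformer trained on billions of examples; this sketch only illustrates the objective itself:

```python
from collections import Counter, defaultdict

# Toy "predict the next word" model built from bigram counts.
# The tiny corpus below is just for demonstration.
corpus = ("the model starts to understand enough about "
          "the human world to complete sentences").split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1           # tally which word follows which

def predict_next(word):
    """Return the most frequent continuation seen in training, if any."""
    following = counts.get(word)
    return following.most_common(1)[0][0] if following else None
```

Here `predict_next("human")` returns `"world"` because that is the only continuation the toy corpus has seen; scale the corpus and the model up by many orders of magnitude, and the same objective starts producing fluent text and working code.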

    What Are the Strengths of ChatGPT?

    Using the process above, OpenAI trained ChatGPT on almost all human knowledge. This enables ChatGPT to:

    • Create never-before-seen sentences and code. Because it’s seen billions of sentences and lines of code, ChatGPT can synthesize the information it has seen and form answers to questions that can be perceived as novel. However, there’s no guarantee that this code will be correct or optimal.
    • Combine ideas that it has seen separately but never in combination. For example, ChatGPT can write an answer to a coding question in the writing style of a specific author. 
    • Exhibit a breadth of information. ChatGPT is trained on so much data that it has seen examples of most common situations and their potential variations. This enables it to give specific answers to niche questions or generalized answers based on more specific data.

    What Are the Limitations of ChatGPT?

    While ChatGPT outputs human-like sentences, and it’s easy to mistake its output as being powered by true intelligence, ChatGPT does have shortcomings. 

    In describing the tool’s limitations, OpenAI explained that ChatGPT may occasionally “generate incorrect information” or “produce harmful instructions or biased content.” Industry publications have described ChatGPT as confidently wrong, exhibiting a tone of confidence in its answers, regardless of whether those answers are accurate. 

    ChatGPT lacks the ability to fact-check itself or conduct logical reasoning. It often incorrectly answers questions and can be tricked relatively easily. Technologists have also noted its propensity to “hallucinate,” a term used to describe when an AI gives a confident response that is not justified by training data.

    How ChatGPT Impacts Assessment Content

    As a coding tool, ChatGPT excels at certain types of technical problems—but also has its limitations. A strong content strategy will be necessary to test your current coding challenges and prioritize the questions, and question types, that are less susceptible to AI coding support. 

    ChatGPT has probably seen almost all known algorithms. But ChatGPT isn’t just able to answer these algorithm questions correctly. It’s also able to write new implementations of those algorithms, answer freeform questions, and explain its work.

    As a result, ChatGPT can answer the following question types with reasonable accuracy:

    • Well-known algorithms: It’s safe to assume that ChatGPT has seen and is able to answer all publicly available coding problems on platforms such as LeetCode and Stack Overflow. If the algorithm appears in online forums or practice websites, ChatGPT will likely answer it correctly.
    • Minor variations of problems: ChatGPT does well on variations that tend to add to the solution rather than change it in any substantial way. The system can, for example, easily reverse the order of an array of numbers.
    • Multiple choice questions: When presented with a question and multiple potential answers, ChatGPT can usually identify the correct answer.

    For hiring teams who administer coding challenges, that doesn’t mean you should necessarily avoid all questions that ChatGPT can solve. With the right protections in place, even questions solvable by AI can still be reliable. The key is to avoid questions that have answers so short that a plagiarism detection system can’t detect when a candidate has used a tool like ChatGPT. Even so, we are evolving our library with new types of content specifically designed with AI code assistance tools in mind.

    Taking all of this into account, there are some actions you can take today to limit your hiring content’s exposure to the risk of plagiarism, including: 

    • Avoid easily solved multiple choice questions
    • Avoid simple prompts to solve for common or widely available algorithm variants
    • Remove questions that require only a few lines of code to solve
    • Use proctoring tools and plagiarism detection systems
    • Combine coding tests with virtual interviewing tools to add empirical data to the hiring process

    Ensuring Assessment and Hiring Integrity

    In a world where humans and machines alike can write code, the ability to detect the use of AI-coding tools is invaluable. As such, employers increasingly turn to strategies and technologies that enable them to uphold the integrity of their technical assessments.

    Assessment integrity has two core pillars: proctoring tools and plagiarism detection.

    Proctoring Tools

    One important component of ensuring assessment integrity is to build systems that provide the right proctoring capabilities. 

    Proctoring is the process of capturing behavioral signals from a coding test, and its purpose is twofold. First, proctoring tools record data points that support plagiarism detection. Second, proctoring tools also act as a deterrent against plagiarism, as candidates who know that proctoring is in place are less likely to engage in such activity.

    The key behavioral signals that proctoring tools often record include:

    • Tab proctoring. Monitors if the candidate switches between tabs.
    • Copy-paste tracking. Tracks if a candidate pastes copied code in the assessment.
    • Image proctoring. Captures and records periodic snapshots of the candidate.
    • Image analysis. Analyzes webcam photos for suspicious activity.

    Plagiarism Detection

    In addition to proctoring tools, the integrity of an assessment also relies on plagiarism detection. In other words, the ability to flag when a candidate likely received outside help. 

    The current industry standard for plagiarism detection relies heavily on MOSS code similarity. Not only can this approach often lead to higher false-positive rates, but it also unreliably detects plagiarism originating from conversational agents like ChatGPT. That’s because ChatGPT can produce somewhat original code, which can circumvent similarity tests.

    While the launch of ChatGPT caught many by surprise, the rise of LLMs has been a popular topic in technical communities for some time. Anticipating the need for new tools to ensure assessment integrity, HackerRank developed a state-of-the-art plagiarism detection system that combines proctoring signals and code analysis.

    Using machine learning to characterize certain coding patterns, our algorithm checks for plagiarism based on a number of signals. Our model also uses self-learning to analyze past data points and continuously improve its confidence levels.

    The result is a brand new ML-based detection system that is three times more accurate at detecting plagiarism than traditional code similarity approaches—and can detect the use of external tools such as ChatGPT.

    Embracing Artificial Intelligence

    As exciting as the launch of ChatGPT has been, LLMs with these capabilities are only the beginning. While it’s hard to predict the future, one thing is certain: AI technology is in a nascent state and will continue to grow at a rapid rate.

    In the short term, the key to evolving your hiring strategy hinges on a renewed focus on content innovation and assessment integrity. By combining a strong question strategy with advanced proctoring and plagiarism detection, hiring teams can protect their assessment integrity and hire great candidates.

    In the long term, we anticipate that artificial intelligence will redefine developer skills and, in the process, change technical hiring as we know it. 

    At HackerRank, our mission is to accelerate the world’s innovation. As such, we welcome this new wave of technological transformation and will pursue innovative ideas that imagine a future of programming in an AI-driven world. 

    Frequently Asked Questions

    Can Your Plagiarism Detection System Detect Code From ChatGPT?

    Yes. Our AI-enabled plagiarism detection system feeds several proctoring and user-generated signals into an advanced machine-learning algorithm to flag suspicious behavior during an assessment. By understanding code iterations made by the candidate, the model can detect if they had external help, including from ChatGPT.

    When Will the Plagiarism Detection System Be Available?

    The new plagiarism system is currently in limited availability, with plans for general availability in early 2023. If you would like to participate in our limited availability release, please let your HackerRank customer success manager know and we would be happy to enable you.

    Can You Validate if My Coding Questions Are Easily Solved by ChatGPT and Provide Replacement Options?

    If you would like assistance in verifying how ChatGPT responds to your custom coding questions, we can run a report and provide content recommendations based on the results. Please contact our HackerRank Support Team, who would be happy to help. 

    Should I Avoid All Questions That ChatGPT Can Solve? 

    No. HackerRank’s proctoring tools and plagiarism detection system can protect even solvable questions. Instead, avoid multiple choice questions and problems with very easy or short answers.

    I Still Have Questions About ChatGPT. Who Should I Contact?

    If you’re a customer looking for support on plagiarism and its impact on your business, you can contact your customer success manager or our team at support@hackerrank.com.
