Illustration by Alysheia Shaw-Dansby

How to Tackle Fraudulent Survey Responses

Data@Urban
Sep 28, 2023


In 2022, the Upward Mobility research team fielded an online community survey to build a demographically representative sample of residents in 10 US cities for the Upward Mobility research project. To boost response rates, we advertised through social media and offered a $5 gift card incentive for completing the survey. Unfortunately, these strategies opened the door to an unintended challenge: fraudulent responses.

Fielding an open-invitation survey always presents the challenge of identifying and preventing fraudulent responses, and throughout our survey process, we took many steps to keep inauthentic responses out. Here we outline some of those steps and their pros and cons.

Identifying the problem

Our survey was particularly susceptible to fraudulent responses because we advertised that a $5 Amazon gift card would be provided to those who completed it. We explained that the gift card would be delivered electronically to the email address provided at the completion of the survey. Initially, the gift card incentive was prominent in the advertisements because we believed it would improve the click-through rate, though we worried that $5 might be too small an incentive.

We purchased advertisements on Meta platforms (Facebook and Instagram) because the Urban Institute already has advertising accounts and because we believed those platforms would maximize our chances of reaching a demographically representative population. After launching our survey and collecting responses for 24 hours, we conducted an audit and discovered some concerning results.

From the response metadata, we found that a substantial amount of traffic was coming from outside the US despite our use of Meta advertising tools to specifically target the 10 US cities, and that some individuals attempted to complete the survey multiple times to earn much more than $5. By looking at metadata such as device type, browser type, screen resolution, and IP address, we also found that more than 90 percent of initial responses were not legitimate. Catching these fraudulent responses early allowed us to avoid giving out gift cards, but we needed to devise a way to prevent future fraudulent responses.
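An audit like this can be scripted against the survey’s response export. The sketch below is illustrative only: it assumes a CSV export with hypothetical column names (ip_country, ip_address, device_type, browser, screen_resolution) rather than the actual Qualtrics field names, and it computes the share of responses geolocated outside the US along with responses that share an identical device fingerprint.

```python
import pandas as pd

# Hypothetical export of response metadata; column names are illustrative,
# not the actual Qualtrics field names we used.
responses = pd.read_csv("survey_responses.csv")

# Share of responses whose GeoIP country is outside the US
foreign_share = (responses["ip_country"] != "US").mean()
print(f"Responses from outside the US: {foreign_share:.0%}")

# Responses sharing an identical device fingerprint (IP, device, browser,
# screen resolution) are suspicious when they appear many times
fingerprint_cols = ["ip_address", "device_type", "browser", "screen_resolution"]
fingerprint_counts = (
    responses.groupby(fingerprint_cols).size().sort_values(ascending=False)
)
print(fingerprint_counts[fingerprint_counts > 1].head(10))
```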

Correcting the problem

In response to these issues, we used Qualtrics survey software to implement several safeguards against future fraudulent responses, each with pros and cons.

reCAPTCHA

The first and easiest step we took was to add reCAPTCHA, which is designed to determine whether a user is a human or a bot, at the beginning of our survey. In Qualtrics, reCAPTCHA displays as a special question type called “Captcha verification,” which we added as the first survey question. Based on our initial audit, we suspected that many, if not most, illegitimate responses were entered by humans, not bots or automated scripts, meaning that adding reCAPTCHA did no harm but did little to solve our problem.

Fraud detection

Qualtrics software comes with built-in fraud-detection features. Every survey response is assigned a fraud score, which data analysts can use when deciding whether to include or exclude it from reporting and analysis. In our case, because individuals were attempting to exploit our survey to receive the incentive, we wanted to block potentially fraudulent responses from entering the survey at all. We added the Q_RelevantIDFraudScore embedded data element to the branch logic of our survey and prevented any response with a Q_RelevantIDFraudScore value greater than or equal to 30 (on a 0 to 130 scale) from proceeding.

But this process is not fully transparent. Qualtrics uses proprietary third-party tools to calculate Q_RelevantIDFraudScore, so we can’t say exactly how the score is computed. It’s possible that a legitimate survey respondent was blocked because their fraud score was, for whatever reason, too high.
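For teams that prefer to screen responses after the fact rather than (or in addition to) terminating them in the survey flow, the same threshold can be applied to exported data. A minimal sketch, assuming the export includes the Q_RelevantIDFraudScore embedded data field as a column:

```python
import pandas as pd

responses = pd.read_csv("survey_responses.csv")

# Qualtrics treats scores of 30 or higher (on a 0-130 scale) as likely fraud;
# here we simply partition the export on that threshold.
FRAUD_THRESHOLD = 30
likely_fraud = responses["Q_RelevantIDFraudScore"] >= FRAUD_THRESHOLD

clean = responses[~likely_fraud]
flagged = responses[likely_fraud]
print(f"Kept {len(clean)} responses; flagged {len(flagged)} as likely fraud")
```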

During our audit, we found that the same IP address was sometimes used for several or even dozens of survey completions in a short period. When we combined this information with other pieces of metadata (like the operating system and screen resolution of the devices used to respond), we concluded that these responses all came from one person or location. In response, we used Qualtrics’ fraud-detection tools to prevent more than one response from any one IP address. We added the Q_RelevantIDDuplicate embedded data element to the branch logic in Qualtrics. If Q_RelevantIDDuplicate was equal to 1 (a likely duplicate), the survey terminated before the respondent could enter any more data. Unfortunately, this strategy blocked duplicate responses unconditionally, including cases where a duplicate might be legitimate, such as students living in the same college dorm.
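A rough post-hoc analogue of this rule, assuming an export with illustrative ip_address and end_date columns, is to keep only the first completion from each IP address:

```python
import pandas as pd

responses = pd.read_csv("survey_responses.csv")

# Mirror the one-response-per-IP rule after the fact: sort by completion time
# and keep only the earliest response from each IP address.
responses = responses.sort_values("end_date")
deduped = responses.drop_duplicates(subset="ip_address", keep="first")
print(f"Dropped {len(responses) - len(deduped)} repeat responses from shared IPs")
```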

Geofencing

After discovering the significant number of responses from outside the US, we quickly moved to block international traffic. We did so by programming branch logic that checked each response’s GeoIP location and allowed only responses located in the US.

Additional audits revealed that many fraudulent responses were coming from inside the US but outside the target cities. Though it’s possible someone could live in a target city but take the survey while traveling, we found that many individual IP addresses belonged to virtual private network (VPN) providers, which are designed to hide the user’s location. In response, we implemented a stricter geofence that prevented any responses originating from outside the target cities. We used the same branch logic and GeoIP location, except instead of setting it to the entire US, we added the list of zip codes corresponding to the target cities.

By geofencing in this way, we potentially blocked some target respondents from accessing the survey, including those who lived in the target cities but tried to take the survey from elsewhere (for example, while on vacation). We may even have blocked people who live in the target cities but commute to neighboring cities or counties. To rectify this problem, we could have added bordering zip codes to include commuters or those traveling outside the immediate target cities, but we chose not to take this extra step because zip codes can be quite large, and expanding the geofence might have introduced additional problems we would then need to resolve.
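Qualtrics applied this geofence for us through GeoIP branch logic, but the underlying check can be sketched offline. The example below is illustrative only: it assumes a local copy of MaxMind’s free GeoLite2 City database and uses placeholder zip codes in place of the target cities’ full list.

```python
import geoip2.database
from geoip2.errors import AddressNotFoundError

# Placeholder zip codes standing in for the full list covering the 10 target cities
TARGET_ZIPS = {"10001", "10002", "60601", "60602"}

# Requires downloading MaxMind's free GeoLite2 City database separately
reader = geoip2.database.Reader("GeoLite2-City.mmdb")

def in_geofence(ip_address: str) -> bool:
    """Return True if the IP geolocates to the US and to a target zip code."""
    try:
        location = reader.city(ip_address)
    except AddressNotFoundError:
        return False  # treat unknown IPs as outside the geofence
    return (
        location.country.iso_code == "US"
        and location.postal.code in TARGET_ZIPS
    )

print(in_geofence("203.0.113.7"))  # documentation-range IP, expected False
```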

URL referral requirements

We advertised our survey through Facebook and Instagram and expected nearly all traffic to originate from those two platforms. During our first audit, however, we noticed that very few responses originated on either platform. Instead, many responses came from email domains or messaging apps like WhatsApp and Telegram. We suspected that one or more individuals had discovered our ad and shared the link via email or messaging apps.

To prevent this type of abuse, we programmed our survey to block anyone who clicked into the survey from a referring URL other than facebook.com or instagram.com. We did so by adding the embedded data element Referer to the Qualtrics survey flow, which worked for the web and the mobile app versions of these websites.

The downside to this approach was that it only allowed someone to take the survey if they first saw our ad. An eligible respondent who received the link from a family member or friend would have been prevented from taking the survey. Similarly, someone who saw the ad and saved the link for later would have been blocked when they tried to take the survey.
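The referrer check itself is simple to express in code. Below is a sketch of the same allowlist logic, assuming the referring URL has been captured (for example, in the Referer embedded data field):

```python
from urllib.parse import urlparse

ALLOWED_DOMAINS = ("facebook.com", "instagram.com")

def referrer_allowed(referer_url: str) -> bool:
    """Return True if the referring URL comes from an allowed platform."""
    host = urlparse(referer_url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)

print(referrer_allowed("https://m.facebook.com/"))    # True
print(referrer_allowed("https://web.whatsapp.com/"))  # False
print(referrer_allowed(""))                           # False (no referrer)
```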

Attention checks

Many surveys include a question designed to make sure the respondent is paying attention. We used a question that included the text “we would like you to select option B for this question.” Depending on the circumstances, survey analysts might choose to exclude from analysis respondents who chose any answer other than the one requested, on the assumption that they weren’t honestly reading and answering questions. The attention check question was included from the beginning, and the original plan was to use it for quality assurance. After our audit, however, we amended the survey flow so that anyone who didn’t answer this question correctly was stopped before reaching the contact information screen where respondents enter their information for the incentive.
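Applied after the fact, the attention check reduces to a simple filter. A sketch, assuming an export with an illustrative attention_check column recording the selected option:

```python
import pandas as pd

responses = pd.read_csv("survey_responses.csv")

# Keep only responses that selected option B, as the question instructed
passed = responses[responses["attention_check"] == "B"]
print(f"{len(responses) - len(passed)} responses failed the attention check")
```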

Manual approval of incentive payments

In addition to all of these safeguards, we manually reviewed each response before sending an incentive: instead of having gift cards delivered automatically once a respondent completed the survey, we approved each gift card only after reviewing the response. In these manual reviews, we considered whether the email address provided looked valid and how quickly the respondent completed the survey. We defined valid email addresses as those from well-known providers like Gmail or Outlook, local universities, or internet service providers. We considered email addresses from other countries or from websites known to provide “burner” email addresses as not valid. The average time to complete the survey was a little over three minutes, so respondents who completed it in under one minute were given additional scrutiny.
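These review heuristics can also be written down as a rough screening function. The sketch below is illustrative only: the trusted-domain set is a placeholder (the real review also accepted local universities and internet service providers and rejected known burner-email domains), and the one-minute cutoff mirrors the extra-scrutiny rule described above.

```python
# Illustrative allowlist; in practice the review also accepted local universities
# and internet service providers, and rejected known burner-email domains.
TRUSTED_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com"}
MIN_SECONDS = 60  # average completion time was a little over three minutes

def needs_extra_scrutiny(email: str, duration_seconds: float) -> bool:
    """Flag a response for closer review before approving its gift card."""
    domain = email.rsplit("@", 1)[-1].lower()
    suspicious_email = domain not in TRUSTED_DOMAINS
    too_fast = duration_seconds < MIN_SECONDS
    return suspicious_email or too_fast

print(needs_extra_scrutiny("respondent@gmail.com", 200))    # False
print(needs_extra_scrutiny("temp@example-burner.net", 45))  # True
```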

Final audit

After implementing all of these measures, we finished fielding the survey confident that nearly all responses were legitimate. It’s impossible to know with complete certainty, but the responses that made it past these measures all appeared authentic. The survey had an initial goal of 1,000 demographically representative responses in each city, but no city reached 1,000 responses, and we don’t have sufficient information to know how many authentic responses went unrecorded because of these strict measures.

With these safeguards, we made the choice to prioritize authentic responses over quantity. Moving forward, we may not always need all of these safeguards depending on the nature of the project and the perceived prevalence of fraudulent responses. Ultimately, fraudulent responses will continue to pose a challenge for survey projects that offer incentives and allow responses from the general public, but we hope the strategies outlined above can help alleviate the burden.

-Rob Pitingolo

Want to learn more? Sign up for the Data@Urban newsletter.
