Third Party Data Scraping from SeatSpy
For over a year, bots have been accessing and apparently scraping data from our site, and we have had to spend significant time and money trying to stop them. This is not only contrary to our Terms and Conditions, but also costs us money in cloud computing and development, and has previously adversely affected our users' experience.
The Operation
Last week, we saw this happening again and decided to set up a "sting" operation to catch the scraper. Rather than make any direct accusation, below we set out the facts and encourage the reader to explore the links and come to their own conclusion.
- First, we chose a route and cabin class that had not been searched for in many weeks and for which no users have alerts set up, so as not to affect our current users. The route we chose was Southampton to Nice in business class.
- We stopped collecting realtime data on this route/cabin.
- We then manually changed the data such that when the calendar results of a search on that route/cabin were displayed, the word "SEATSPY" would appear for both the outbound and return legs. If you conduct a search on seatspy.com for SOU-NCE in business (as of 14th August 2021), you will see the result (choose only the Business class cabin for the clearest view). A full image is included at the end of this article, but as an example, here is the "S" from the return leg:
- We then found that over a period of 24 hours or so, this text was gradually populated and displayed in the results on rewardflightfinder.com - if you go to that site and search for SOU-NCE in business, you will see the result. A full image is at the end of this article, but as an example, here is the same "S" now shown on the rewardflightfinder site (again, choose only Business for the clearest view):
- The following link will take you directly to these search results (uncheck Economy for this view) - have a look right now, as this may not be the case for long, once rewardflightfinder read this article!
https://rewardflightfinder.com/calendar?airlineSelected=BA_blue&airlineMembership=blue&aCode=BA&numberOfPassengers=1&tclass=Business&tValue=business&membership=blue&jType=return&dPlace=Southampton%20(SOU)&dId=SOU&aPlace=Nice%20(NCE)&aId=NCE&economy=false&premium=false&business=true&first=false
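For readers curious about how a marker like this can work in principle, here is a minimal sketch in Python. It is an illustration only and not the code used for the operation: the 5x5 glyph, the one-row-per-week layout, the start date and the function names are all assumptions made for the example.

```python
from datetime import date, timedelta

# Hypothetical 5x5 bitmap for the letter "S"; each string is one calendar row.
GLYPH_S = [
    "#####",
    "#....",
    "#####",
    "....#",
    "#####",
]


def glyph_to_dates(glyph, start):
    """Map every '#' cell to a date, treating each row as one week from `start`."""
    dates = set()
    for row, line in enumerate(glyph):
        for col, cell in enumerate(line):
            if cell == "#":
                dates.add(start + timedelta(days=7 * row + col))
    return dates


def overlap_ratio(planted, observed):
    """Fraction of the planted marker dates that also appear in another data set."""
    if not planted:
        return 0.0
    return len(planted & observed) / len(planted)


# Assumed start date, chosen only for the example.
planted = glyph_to_dates(GLYPH_S, date(2021, 9, 6))
```

Because the marker dates are artificial and do not come from the airline, a high `overlap_ratio` between the planted dates and the dates shown elsewhere would be hard to explain as coincidence.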
The Impact
The scraping of data from our site is specifically prohibited by our Terms and Conditions, a copy of which we sent to rewardflightfinder.com in July 2020.
The scraping of data has had a material effect on our users and has been time consuming and costly to mitigate:
- We had to temporarily require users to be registered in order to view search results, which inconvenienced our users by making them log in every time
- This requirement resulted in the scraper creating hundreds of fake accounts on our site, which affected our email reputation because emails to these addresses then bounced
- Our site performance has been degraded, and we had to pay extra cloud computing costs because of the huge amount of additional traffic generated by the scraping activity
- We have had to devote a great deal of development effort to attempt to stop the scraping, resulting in unexpected costs and a delay to our roadmap of new features
Frequency of Data Collection
It is worth noting that anyone scraping our data and using it to provide the same service is by definition providing a poorer quality service, with their alerts always going out later than our own. If they also only scrape data from us once every 24 hours, those alerts may be delayed by up to that period compared to our own, and the seats may be long gone by then.
As a comparison, at SeatSpy, we collect BA data at least once per hour (as at 14th August, our average time between collections for the past week is only 47 minutes).
There is a significant cost to our collecting data so regularly, which is not borne by any competitor who is simply scraping our site to get theirs.
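The delay argument can be put into rough numbers. The Python sketch below is illustrative only: it assumes a copy taken at a fixed interval and a change that is equally likely to appear at any moment within that interval, and the function name is made up for the example.

```python
from datetime import timedelta


def extra_alert_delay(scrape_interval):
    """Return (worst case, average) extra delay added by copying at a fixed interval.

    Assumes a change is equally likely to appear at any point between two
    copies, so the average wait is half the interval.
    """
    worst = scrape_interval
    average = scrape_interval / 2
    return worst, average


worst, average = extra_alert_delay(timedelta(hours=24))
```

Under these assumptions, copying once every 24 hours adds up to 24 hours and on average 12 hours on top of the original collection time.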
Screen Grabs
We encourage you to check out the websites and see the results for yourselves, but for convenience, we have included screen grabs of the results on seatspy.com and on rewardflightfinder.com respectively as at 16th August 2021.
The full resolution versions are available here:
- https://seatspywebsiteassets.blob.core.windows.net/blog/SeatSpy%20data%20being%20copied%20by%20competitors%20example%203.png
- https://seatspywebsiteassets.blob.core.windows.net/blog/SeatSpy%20data%20being%20copied%20by%20competitors%20example%204.png