For starters, what is Protect?
Protect is our long-standing content protection product. It was born about ten years ago out of the need of Andy (Andy Chatterley, MUSO CEO and Co-Founder) to protect his own music against piracy. Back then there was little support available for dealing with piracy, and the DMCA was still in its infancy. So we built Protect to search the web for infringing files and automatically send notices to get content removed.
And what does “data-driven” Protect mean?
As time passed, the piracy landscape moved on. We realised we were falling into the trap of playing whack-a-mole with the internet, where it’s pretty much impossible to remove every variant of a popular file. We ended up competing with other vendors to send the most takedowns for the “best coverage” - but by looking at the data we realised this was the wrong approach for our customers.
We’ve catalogued hundreds of thousands of piracy domains. When we looked at their traffic we observed a long-tail distribution - roughly speaking, the top 10% of sites accounted for 90% of the audience. So we decided to target that 10% - quality over quantity.
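To make that concrete, here’s a minimal sketch of that kind of top-decile analysis. The domains and traffic figures are made up purely for illustration; the real numbers come from MUSO’s catalogue.

```python
# Rank catalogued piracy domains by estimated traffic and keep the top decile.
from typing import Dict, List

def top_decile_by_traffic(domain_traffic: Dict[str, int]) -> List[str]:
    """Return the top 10% of domains by traffic, highest first."""
    ranked = sorted(domain_traffic, key=domain_traffic.get, reverse=True)
    cutoff = max(1, len(ranked) // 10)
    return ranked[:cutoff]

def traffic_share(domain_traffic: Dict[str, int], domains: List[str]) -> float:
    """Fraction of total traffic accounted for by the given domains."""
    total = sum(domain_traffic.values())
    return sum(domain_traffic[d] for d in domains) / total if total else 0.0

if __name__ == "__main__":
    # Illustrative figures only.
    sample = {"site-a.example": 50_000_000, "site-b.example": 4_500_000,
              "site-c.example": 300_000, "site-d.example": 120_000,
              "site-e.example": 80_000, "site-f.example": 40_000,
              "site-g.example": 25_000, "site-h.example": 15_000,
              "site-i.example": 10_000, "site-j.example": 5_000}
    top = top_decile_by_traffic(sample)
    print(top, f"{traffic_share(sample, top):.0%} of traffic")
```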
Ok, so what makes a piracy site rise to the top? Are there any themes?
Yeah, absolutely. These sites are typically the most professional and the most profitable - they have the best selection of high-quality, high-availability content for users to choose from. They are also very resilient - they will often cycle through lots of domains to avoid ISP blocking, and they have complex hosting setups. Sometimes they look so good, you question whether they might actually be legal!
And what did this new approach mean for MUSO’s engineering teams?
That resilience and complexity presented a lot of issues for our crawler technology. One good example is the use of CAPTCHAs - usually a checkbox, or a little puzzle that confirms you’re a human. Any site can implement them easily, often at no cost - so they’re an effective way for piracy sites to stay under the crawler’s radar.
To get around this tactic, we partnered with a specialist third-party service to develop a way to have a human complete the necessary puzzles as they were presented, allowing the crawler to then access the page and look for pirated content.
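In rough terms, the flow looks something like the sketch below. The function names are illustrative stand-ins, not our actual crawler or the partner’s API.

```python
import time
from typing import Optional

def fetch_page(url: str, captcha_token: Optional[str] = None) -> str:
    """Stand-in for the crawler's HTTP fetch; the token, if any, would be attached."""
    return "<html>captcha</html>" if captcha_token is None else "<html>content</html>"

def looks_like_captcha(html: str) -> bool:
    """Stand-in heuristic for spotting a challenge page instead of real content."""
    return "captcha" in html.lower()

def request_human_solution(url: str, html: str) -> str:
    """Stand-in for handing the puzzle to the specialist service and waiting
    for a human operator to return a solution token."""
    time.sleep(0.1)  # pretend to wait for the human
    return "solved-token"

def crawl(url: str) -> str:
    """Fetch a page; if a CAPTCHA is detected, have a human solve it and retry."""
    html = fetch_page(url)
    if looks_like_captcha(html):
        token = request_human_solution(url, html)
        html = fetch_page(url, captcha_token=token)  # retry with the solved challenge
    return html  # downstream matching then looks for pirated content in this page
```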
That’s really interesting! Where has MUSO had the biggest breakthroughs?
Some of our biggest breakthroughs came by looking outside of improvements to the crawler. We figured these sites are popular in part because they are so effective at blocking automated attempts to remove content.
So, we looked for other ways to gain access and found a way to use real piracy users and their journeys across the web as a source for finding infringing content. When a user successfully reached the content they were looking for and consumed it, we recognised that behaviour and subsequently removed the content so no one else could get to it.
It takes a user consuming the content for us to identify it, but it means we gain access to lots of top-ranking domains without the cost and complexity of using the crawler.
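Conceptually it works something like this sketch - the event fields and URLs are purely illustrative, not our actual data model.

```python
# Turn observed user journeys into takedown candidates: keep only journeys that
# ended in successful consumption of protected content, and de-duplicate the URLs.
from dataclasses import dataclass
from typing import Iterable, Set

@dataclass
class JourneyEvent:
    url: str             # page the user landed on
    title_matched: bool  # the content matched a protected title
    consumed: bool       # the user actually played or downloaded it

def takedown_candidates(events: Iterable[JourneyEvent]) -> Set[str]:
    """URLs where real users successfully reached and consumed protected content."""
    return {e.url for e in events if e.title_matched and e.consumed}

if __name__ == "__main__":
    events = [
        JourneyEvent("https://top-site.example/watch/123", True, True),
        JourneyEvent("https://top-site.example/search?q=title", True, False),
        JourneyEvent("https://unrelated.example/page", False, True),
    ]
    print(takedown_candidates(events))  # {'https://top-site.example/watch/123'}
```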
And where have you found challenges? What have you learned?
The thing we’ve learned in developing all these circumvention techniques is that technology has its limitations. We still need a degree of human input and intelligence - whether that’s solving the CAPTCHA puzzles or identifying and cataloguing the piracy sites themselves.
Our approach now blends human intelligence with the power of automation, applying a shared focus to the top-ranking sites to offer a better service to our customers.
And finally, how do you think the piracy web will change in the coming years?
The general consensus is that the ever-increasing number of streaming and video subscription platforms is going to drive up piracy, as consumers get tired of paying for yet another service. I also think the nature of the piracy audience is changing, because piracy doesn’t carry the same stigma it once did.