Machine Learning Scam Targets Google Search Results

Google Algorithms Targeted for Manipulating Search Results

The World Wide Web has reorganized itself around a lot of fancy terms like Artificial Intelligence, Machine Learning, and Predictions. Underlying these emerging trends, however, is a simple computing mechanism: the algorithm. Your search results, your preference settings, and the way your website activity is ‘remembered’ all boil down to algorithms that power AI systems. Marketers use this knowledge to the best of their ability to boost their organization’s reach. But the flip side of the coin is inevitable: cybercriminals have misused the same mechanism for their own twisted ends. A recent machine learning scam targeting Google Search Results has made the news.

Before we dive into the incident, let us first cover the basics of machine learning and its effect on Google Search Results.

1.1      What is Machine Learning?

Machine learning is a branch of AI that gives ‘intelligent’ systems the ability to learn and improve by themselves. Once fed with training data, machines automatically start picking up on information and making sense of it.

The learning process begins with sample data, such as examples, scenarios, or instruction sets, and the end goal is to expose hidden patterns and trends that help the machine make better decisions in the future.
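
To make "learning from sample data" concrete, here is a minimal sketch of one of the simplest learners, a nearest-centroid classifier: it averages the labeled examples for each class and labels new data by whichever average it is closest to. The two features (links per 100 words, share of repeated words) and the sample values are hypothetical, chosen only for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

def train_centroids(samples):
    """Learn one centroid (mean feature vector) per label from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the label whose learned centroid is closest to the new data point."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical sample data: ([outbound links per 100 words, share of repeated words], label)
samples = [
    ([1.0, 0.10], "normal"),
    ([2.0, 0.15], "normal"),
    ([9.0, 0.60], "spam"),
    ([8.0, 0.55], "spam"),
]
centroids = train_centroids(samples)
print(predict(centroids, [7.5, 0.50]))  # spam
print(predict(centroids, [1.5, 0.12]))  # normal
```

Nothing about "spam" is hard-coded here; the pattern comes entirely from the examples, which is the core idea of machine learning.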

1.2      How does ML factor into Search Results?

Google has officially declared itself a machine learning-first company. Google and other search engines use ML in one or more of the following ways:

1.2.1     Pattern Recognition

Search engines use machine learning to detect patterns that point to plagiarized content and potentially harmful spam traffic.
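
As an illustration of pattern detection for plagiarism checks, here is a minimal sketch of word shingling with Jaccard similarity, a well-known near-duplicate detection technique. It is not a description of Google's actual system; the shingle size `k=3`, the `0.5` threshold, and the sample sentences are assumptions for illustration.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word sequences ("shingles") in text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity: |intersection| / |union|, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_near_duplicate(text_a, text_b, threshold=0.5):
    """Flag two texts as near-duplicates when their shingle sets overlap enough (threshold is assumed)."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold

original = "machine learning helps search engines detect spam and duplicate pages"
copied = "machine learning helps search engines detect spam and copied pages"
unrelated = "podcasts can be transcribed with a speech to text app"

print(jaccard(shingles(original), shingles(copied)))  # 0.6
print(is_near_duplicate(original, unrelated))         # False
```

Swapping a single word still leaves most shingles intact, which is why lightly reworded copies remain detectable.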

1.2.2     Optimizing Search Results

Over time, new ranking signals keep coming into the picture, and they significantly affect how organizations should approach Search Engine Optimization (SEO). These ranking signals and their parameters are discovered and refined by feeding intelligent systems current and past website-performance data.
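
To show what "refining ranking signals from past performance data" can mean in practice, here is a minimal sketch that fits a logistic-regression model with plain gradient descent to a hypothetical history of pages, then ranks new pages by the learned score. The two signals (`speed_score`, `backlink_score`), the engagement label, and the training settings are all assumptions; real search engines use far more signals and do not publish their models.

```python
from math import exp

def sigmoid(z):
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))

def predict(weights, bias, signals):
    """Predicted probability that a page with these signals performs well."""
    return sigmoid(sum(w * s for w, s in zip(weights, signals)) + bias)

def train(samples, lr=0.5, epochs=2000):
    """Learn one weight per signal from (signals, label) pairs via batch gradient descent."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for signals, label in samples:
            error = predict(weights, bias, signals) - label  # log-loss gradient w.r.t. raw score
            for i in range(n):
                grad_w[i] += error * signals[i]
            grad_b += error
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias

# Hypothetical history: ([speed_score, backlink_score], 1 if users engaged with the page else 0)
history = [
    ([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.7, 0.6], 1),
    ([0.2, 0.3], 0), ([0.3, 0.1], 0), ([0.1, 0.2], 0),
]
weights, bias = train(history)

# Rank new pages by the learned model's score, highest first.
pages = {"a": [0.85, 0.7], "b": [0.2, 0.2], "c": [0.5, 0.5]}
ranking = sorted(pages, key=lambda name: predict(weights, bias, pages[name]), reverse=True)
print(ranking)  # ['a', 'c', 'b']
```

Retraining on fresh history shifts the weights, which is how the relative importance of signals can change over time without anyone rewriting the ranking rules by hand.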

1.2.3     Querying on Keywords

Each SEO keyword carries a weight based on how it performs across the Internet as a whole. Keyword effectiveness is commonly measured with a metric called CTR (Click-Through Rate): the number of clicks a result receives divided by the number of times it is shown. Search engines can ‘learn’ from CTR performance and tailor your search results accordingly.
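
The CTR arithmetic is simple, but raw CTR is unreliable for results that have been shown only a few times. The sketch below computes plain CTR and a smoothed CTR that shrinks toward a prior; the smoothing scheme, the prior values (`prior_ctr=0.05`, `prior_weight=100`), and the click log are assumptions for illustration, not Google's method.

```python
def ctr(clicks, impressions):
    """Click-through rate: clicks divided by impressions (0.0 if never shown)."""
    return clicks / impressions if impressions else 0.0

def smoothed_ctr(clicks, impressions, prior_ctr=0.05, prior_weight=100):
    """CTR shrunk toward an assumed prior so low-impression results are not over-trusted."""
    return (clicks + prior_ctr * prior_weight) / (impressions + prior_weight)

# Hypothetical click log: result URL -> (clicks, impressions)
log = {
    "example.com/guide": (120, 1000),
    "example.com/news": (3, 10),
    "example.com/old": (20, 1000),
}
for url, (clicks, impressions) in log.items():
    print(url, ctr(clicks, impressions), round(smoothed_ctr(clicks, impressions), 3))

ranked = sorted(log, key=lambda url: smoothed_ctr(*log[url]), reverse=True)
print(ranked)  # ['example.com/guide', 'example.com/news', 'example.com/old']
```

Without smoothing, the result with 3 clicks out of 10 impressions (CTR 0.3) would outrank the one with 120 out of 1,000 (CTR 0.12), even though there is far less evidence behind it.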

1.2.4     Proximity Ranking

Consider a cybersecurity website such as Logix. The word ‘cybersecurity’ is an obvious keyword. But what about ‘cyber-security’? Or ‘cybersec’, as it is sometimes abbreviated? Without ML, each of these similar keywords would have to be targeted explicitly for SEO; now, the engine can recognize their proximity to the original keyword.
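
The idea of keyword proximity can be illustrated with a plain string-similarity measure. This sketch normalizes each term (lowercase, letters and digits only) and compares character trigram sets with Jaccard similarity. It is a simplified stand-in chosen for illustration, not what Google uses; the function names and example terms are assumptions.

```python
import re

def normalize(term):
    """Lowercase and keep only letters and digits, so 'cyber-security' -> 'cybersecurity'."""
    return re.sub(r"[^a-z0-9]", "", term.lower())

def trigrams(term):
    """Set of overlapping 3-character substrings of the normalized term."""
    t = normalize(term)
    return {t[i:i + 3] for i in range(len(t) - 2)}

def similarity(a, b):
    """Jaccard similarity of trigram sets: 0.0 (unrelated) to 1.0 (identical)."""
    ga, gb = trigrams(a), trigrams(b)
    if not ga or not gb:  # terms shorter than 3 characters: fall back to exact match
        return float(normalize(a) == normalize(b))
    return len(ga & gb) / len(ga | gb)

for variant in ["cyber-security", "cybersec", "gardening"]:
    print(variant, round(similarity("cybersecurity", variant), 2))
# cyber-security 1.0
# cybersec 0.55
# gardening 0.0
```

This only captures surface spelling overlap. Modern engines are generally understood to learn relatedness from usage data (for example, with word embeddings), which also catches synonyms that share no letters.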

Several other Machine Learning techniques can also impact search results. However, the ones above were predominantly used in the incident we are about to study.

1.3      ML and Cyber Crime

Spammers are misusing Google’s Rich Results algorithms to get their own content ranked among the top results returned by the search engine. Aware of the technology behind query result optimization, the criminals have come up with a way to automatically create video content from web pages and vice versa. The same method can be used to generate text content from podcasts and podcast content from web pages.

1.3.1     Text to Audio

Google ranks newsworthy videos within the top 3 search results: when a news topic is current, Google promotes relevant videos to improve the relevance and quality of its results.
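
Google does not publish how it promotes timely content, but the effect described above can be sketched with a hypothetical freshness boost: each result's relevance is multiplied by a bonus that halves every `half_life_hours`. The formula, the `boost` and `half_life_hours` values, and the candidate data are all assumptions chosen for illustration.

```python
def fresh_score(relevance, age_hours, boost=1.0, half_life_hours=24.0):
    """Relevance times a freshness bonus that halves every half_life_hours (hypothetical formula)."""
    return relevance * (1.0 + boost * 0.5 ** (age_hours / half_life_hours))

# Hypothetical candidates for a trending news query: (name, base relevance, age in hours)
candidates = [
    ("article-a", 0.90, 24 * 30),
    ("article-b", 0.85, 24 * 10),
    ("article-c", 0.80, 24 * 60),
    ("video-new", 0.60, 2),
]
ranked = sorted(candidates, key=lambda c: fresh_score(c[1], c[2]), reverse=True)
print([name for name, _, _ in ranked[:3]])  # ['video-new', 'article-a', 'article-b']
```

On base relevance alone the new video would rank last; the freshness bonus lifts it to the top. This is exactly the kind of promotion that rewards whoever publishes a video on a trending topic first, regardless of whether its content is original.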

However, Google does not seem to check whether the audio content is original or simply an audio version of pre-existing text content. Creating such a version is easy: the body of an article is passed through text-to-speech software to produce a multimedia file. For the video, spammers simply reuse the content’s featured image.
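
The missing check described above could, in principle, look something like the following sketch: transcribe the uploaded audio (the speech-to-text step is assumed and not shown), then compare the transcript's word sequence against already-indexed articles using Python's standard `difflib.SequenceMatcher`. The function names, the `0.8` threshold, and the sample data are hypothetical; this is not a description of any check Google actually runs.

```python
from difflib import SequenceMatcher

def words(text):
    """Lowercase word list with surrounding punctuation stripped."""
    return [w.strip(".,!?;:'\"") for w in text.lower().split()]

def copy_ratio(transcript, article):
    """Similarity of the two word sequences, from 0.0 to 1.0."""
    return SequenceMatcher(None, words(transcript), words(article), autojunk=False).ratio()

def looks_like_tts_copy(transcript, indexed_articles, threshold=0.8):
    """Return the URL of the first indexed article the transcript nearly reproduces, else None."""
    for url, text in indexed_articles.items():
        if copy_ratio(transcript, text) >= threshold:
            return url
    return None

# Hypothetical index and transcript of an uploaded video's audio track
articles = {
    "example.com/ml-scam": "Spammers are misusing rich results to rank their own content.",
    "example.com/recipes": "Whisk the eggs and fold in the flour slowly.",
}
transcript = "spammers are misusing rich results to rank their own content"
print(looks_like_tts_copy(transcript, articles))  # example.com/ml-scam
```

Comparing word sequences rather than raw audio is what makes the format change irrelevant: a text-to-speech rendering transcribes back to nearly the same words as its source.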

1.3.2     The Reverse Process

Podcasts can also be converted to text, with content theft as the objective. Since the result is in a different format from the one the original podcast authors published, the thieves can sidestep legal issues around plagiarism. There are many ways to pull off this trick, including Google’s own free Gboard app, available on the Google Play Store.

Gboard has a built-in voice-typing (transcription) feature. All you have to do is open a text or notes app and tap the microphone while the podcast is playing.

If you’re wondering: yes, Google is aware of these scams. The impact of ML on search results is currently being discussed in SEO forums.

It will be interesting to see how Google and other algorithm-centric giants like Facebook and YouTube react. Will they change their ways and let the performance of their results take a hit? Or will they come up with new anti-spam features to block these types of fraudulent content? That remains to be seen.
