Traditional or basic SEO touches on some of the points below without going into depth. Advanced SEO, however, demands a deeper comprehension and investigation of how search engines work.
Below is a list of organic ranking signals that the Google search engine has patented.
Possible ranking signals have been widely debated in the SEO world in recent years, and although we will probably never know all of them with 100% certainty, we can at least corroborate some.
There is an important difference between the ranking signals Google actually uses and those that are merely speculated about.
The two are often confused, which leads people to "assume" false or untested ranking signals.
I know the subject well, and I have read hundreds of articles arguing about whether Google uses this or that signal.
The difference between those articles and this one is that the signals on this list are documented in patents published by Google itself.
What are organic search ranking signals?
Organic ranking signals are those that apply to the standard web results, leaving aside news results, local searches, video and image searches. Those result types very likely work with different ranking signals. It is also possible to see organic results mixed with non-organic results thanks to Google's universal search. For example, when we search for certain generic terms in Google, we can get a mixture of result types, such as:
- lists of books
This means that different types of ranking signals may be in play for any given query.
Below I will present the list of organic ranking signals that appear in Google patents.
1. Domain age and link acquisition rate
This Google patent describes how the search engine can treat the first crawl of a website, or the discovery of the first link pointing to it, as the "birth" of that site.
On domains and their age, we should add the following.
Early on, Google stated that it might treat domain renewal as a ranking signal: registering a domain for 5 or 10 years in advance could help distinguish a site from spammers.
However, this carries little weight in practice, because millions of legitimate websites simply renew their domains year after year.
The patent also discusses links and how they feed into ranking a web page. Keep in mind:
- The date on which the search engine indexes the page (document).
- The date on which the first link (backlink) to that page is detected.
- The number of links that page receives.
- The average rate at which that page acquires links, calculated over a given period.
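As a rough illustration of the link-rate idea (the function and data below are invented, not anything from the patent), the average acquisition rate is simply new links divided by the observation window:

```python
from datetime import date

def link_rate(link_dates, window_days):
    """Average number of new backlinks per day over the most recent window.
    Hypothetical helper: the patent describes the concept, not this code."""
    if not link_dates:
        return 0.0
    newest = max(link_dates)
    cutoff = newest.toordinal() - window_days
    # Count only links discovered inside the window.
    recent = [d for d in link_dates if d.toordinal() > cutoff]
    return len(recent) / window_days

# A page that gains one link per day throughout January.
dates = [date(2024, 1, day) for day in range(1, 31)]
rate = link_rate(dates, window_days=30)
```

A sudden spike in this rate, rather than a steady curve, is the kind of pattern the patent suggests could look unnatural.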
2. Use of keywords within a page
In this Google document, they explain how Google treats the presence of the user's search terms within a page as an important part of organic results.
We should still place our target keywords on a page and within its various meta tags.
However, it is not necessary to repeat a keyword a set number of times or hit a particular keyword density.
What we should do is use the keywords people type to find that page, together with contextual terms that add value to the content.
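As a sketch of auditing this (the `KeywordAudit` class is a hypothetical helper, not a Google tool), Python's standard `html.parser` can check whether a keyword appears in the tags that matter, instead of counting repetitions:

```python
from html.parser import HTMLParser

class KeywordAudit(HTMLParser):
    """Hypothetical audit helper: records which important tags contain
    the target keyword (presence, not repetition count)."""
    TAGS = {"title", "h1", "h2"}

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword.lower()
        self.current = None      # tag we are currently inside, if any
        self.found_in = set()    # tags where the keyword appeared

    def handle_starttag(self, tag, attrs):
        if tag in self.TAGS:
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current and self.keyword in data.lower():
            self.found_in.add(self.current)

page = "<title>Organic ranking signals</title><h1>Ranking signals in Google patents</h1>"
audit = KeywordAudit("ranking signals")
audit.feed(page)
```

Here the keyword shows up in both the title and the H1, which is the presence check that matters, rather than any density target.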
3. Related phrases within a page or document
In this patent, Google states that pages containing more related phrases rank higher than pages containing fewer related phrases.
Google can look at the queries a page is optimized for, find the top-ranking pages for those query terms, and see which meaningful complete phrases occur (or co-occur) most frequently on those highly ranked pages.
This patent says among many other things:
The documents that contain the phrases most related to the query phrases of Q will have the highest-valued related-phrase bit vectors, and these documents will be the highest-ranked documents in the search results.
You can use some of these related phrases as anchor text to help rank other pages.
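A toy sketch of the underlying idea, with an invented related-phrase list (a real system would mine these phrases from top-ranking documents): documents containing more phrases related to the query score higher:

```python
# Invented related-phrase inventory for illustration only.
RELATED = {"seo": ("search engine", "organic ranking", "backlinks")}

def related_phrase_score(doc, query):
    """Count how many phrases related to the query appear in the document."""
    text = doc.lower()
    return sum(1 for phrase in RELATED.get(query, ()) if phrase in text)

doc_a = "SEO depends on the search engine, organic ranking and backlinks."
doc_b = "SEO is a marketing discipline."
score_a = related_phrase_score(doc_a, "seo")  # all three related phrases present
score_b = related_phrase_score(doc_b, "seo")  # none present
```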
It also addresses duplicate documents for the same query:
For example, a news article produced by a news agency (a press release) can be replicated on a dozen or more individual newspaper websites.
Including all of these duplicate documents in response to a search query merely burdens the user with redundant information and does not respond to the query in a useful way.
Therefore, the presentation system 130 provides an additional capability 704 for identifying documents that are likely to be duplicates or near-duplicates of each other, and includes only one of them in the search results.
This makes it clear that if you create content on a subject that has already been covered, and it is redundant or barely different, Google will not show it.
What do you think now about sending out mass press releases?
4. Keywords in headings, lists and titles
Google describes keywords in headings and lists in terms of "semantic proximity" on the page.
For SEO purposes, it is worth knowing what relationship exists between a page heading containing the keyword and its relevance to search engines.
This Google patent shows how the search engine tries to locate and understand visual structures on a page that could be semantically significant, such as a list of items associated with a heading.
The patent also tells us about keywords in headings:
A basic technique for ranking search engine results is based on the degree to which the results match the search query.
For example, documents that contain all the terms of the search query, or several occurrences of those terms, can be considered more relevant than other documents, and the search engine can therefore rank them higher.
Other factors can also be considered, such as the proximity of the terms (also known as the distance between terms) in the document.
Term proximity in this context can be measured simply by counting the number of words that appear between the search terms in the document.
In documents such as web pages, however, which may contain complex formatting, the "proximity" of terms in the underlying HTML file may not correlate with their "closeness" when the document is visually displayed.
As a result, the performance of search engines that rank documents by the proximity of search terms in the underlying files may suffer.
For headings and titles, a term (keyword) in the title of a document can be considered close to any other term in the document, regardless of the word count between them.
Similarly, a term (keyword) that appears in a heading (H2, H3) can be considered very close to the terms that sit below that heading in the tree structure.
It is likely that Google uses the distance between words on a page as a ranking signal for how relevant that page is to the keywords in a query.
This concept of semantic closeness within structures such as headings and lists can shed light on on-page optimization.
Grouping keywords according to user intent can make sense.
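The heading rule above can be sketched as follows; the data structure and scoring are assumptions for illustration, not the patent's actual algorithm:

```python
def proximity(term_a, term_b, sections):
    """Distance between two terms. A heading term counts as distance 0 to
    every term in its section's body; otherwise use the word gap in the
    body; otherwise a large sentinel. Purely illustrative."""
    for heading, body in sections:
        if term_a in heading and term_b in body:
            return 0
        if term_a in body and term_b in body:
            return abs(body.index(term_a) - body.index(term_b))
    return 999

# One section: an H2 of "seo checklist" followed by body text.
sections = [(["seo", "checklist"], ["optimize", "every", "page", "title"])]
near = proximity("seo", "title", sections)      # heading-to-body: distance 0
far = proximity("optimize", "title", sections)  # word gap inside the body
```

The point of the sketch is the asymmetry: the heading term is "close" to everything beneath it, while two body terms are only as close as their word gap.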
5. Page speed
Google has announced on several occasions that it uses page speed as a ranking signal.
Its recent move to integrate Lighthouse is further proof.
In this patent on page speed, Google mentions things like:
Given two resources that are of similar relevance to a search query, a typical user may prefer to visit the resource that has the least load time.
The loading time of a resource may depend on the amount of content included in the resource. A resource that includes multiple embedded videos may have a longer loading time than a resource that does not include embedded images or videos.
In addition, a resource hosted on a web server in the United States can load faster on a user device in the United States than on a user device in the United Kingdom.
Improving the loading speed of a website is something we should not overlook. That includes file compression, CSS minification, server improvements, speed-optimized themes and more.
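As a quick illustration of why text compression matters, repetitive HTML compresses dramatically with gzip (the payload here is a toy example, not a benchmark):

```python
import gzip

# Toy payload: 200 copies of a small HTML fragment.
html = ("<div class='card'><p>Lorem ipsum dolor sit amet</p></div>\n" * 200).encode()
compressed = gzip.compress(html)
ratio = len(compressed) / len(html)  # repetitive markup compresses very well
```

On a real site this is usually handled by the web server (gzip or Brotli), but the effect on transfer size is the same.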
6. Viewing time of a page
This Google patent tells us that the system can rank pages higher if they are viewed for longer than other web pages.
A system that uses viewing times to rank search results can also reward content providers with greater site usage and user engagement by promoting content with longer viewing times.
In general, “watch time” refers to the total time a user spends watching a video.
However, viewing times can also be calculated and used to rank other types of content according to how long a user spends viewing it: for example, the time spent watching a video, viewing a particular web page, or listening to an audio file (a podcast).
Embedding video or audio files within the body of a page's content is something you should already be doing.
7. Context terms within a page
This Google patent discusses the different meanings a single word can have and how Google can interpret them.
A search engine must be able to tell when a one-word query carries one meaning or another.
Google acquires this machine learning in part through the Knowledge Graph.
You must make sure the search engine understands the context of the keywords in your content, so that the page is filed under the correct group of entities and can be served to the right users.
A search engine can respond to a user's query:
- by deriving contexts, both macro- and micro-contexts, from the query alone.
- from other queries by the same user.
- from the query combined with other information.
- from the results of that user's query.
- from other inputs about the user that provide context.
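A simplified, Lesk-style sketch of context-based disambiguation (the word senses and glosses below are invented): pick the sense whose description shares the most words with the surrounding context:

```python
# Invented sense inventory: each sense carries a small "gloss" of words.
SENSES = {
    "jaguar": {
        "animal": {"cat", "wild", "jungle", "predator"},
        "car": {"vehicle", "engine", "luxury", "brand"},
    }
}

def disambiguate(word, context_words):
    """Pick the sense whose gloss overlaps most with the context."""
    glosses = SENSES[word]
    return max(glosses, key=lambda sense: len(glosses[sense] & set(context_words)))

sense = disambiguate("jaguar", ["fast", "luxury", "vehicle"])
```

Google's actual approach draws on the Knowledge Graph and far richer signals, but the principle is the same: surrounding terms decide which entity a word refers to.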
8. Text quality signal from an n-gram language model
In this Google patent, the search engine can derive quality signals for web pages from language models built from those pages, by looking at the n-grams on a site's pages.
What are n-grams?
N-gram models are now widely used in probability, communication theory, computational linguistics (for example, statistical natural language processing), computational biology (for example, biological sequence analysis) and data compression.
The Ngram Viewer was initially based on the 2009 edition of Google Books Ngram Corpus.
Ultimately, it is about determining the quality of a web page's content.
For example, if a new website manages to create content at the same quality level as a site that already ranks well, it can obtain a higher quality score than other new websites.
An example of writing with a high degree of context and syntax appears when we talk about "web positioning".
Everyday users type the phrase "web positioning" to find content in this sector; within a document on the topic, however, an expert SEO writer may also include terms such as "organic search engine ranking" or even "optimizing a web page for search engines".
Writing high-quality content remains a priority for earning good marks from the search engine, and that also requires a solid linguistic base.
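A toy bigram model in the spirit of the patent (the corpus is invented, and a real model would be trained on a huge corpus with smoothed probabilities rather than raw counts): text whose word pairs resemble the training data scores higher than shuffled gibberish:

```python
from collections import Counter

# Invented miniature training corpus, for illustration only.
corpus = "the search engine ranks the page and the search engine crawls the page".split()
bigrams = Counter(zip(corpus, corpus[1:]))

def lm_score(text):
    """Sum of observed bigram counts: higher means more corpus-like text."""
    words = text.lower().split()
    return sum(bigrams.get(pair, 0) for pair in zip(words, words[1:]))

fluent = lm_score("the search engine ranks the page")
gibberish = lm_score("page the ranks engine search the")
```

The same words in a scrambled order score zero, which is exactly the distinction the next section relies on.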
9. Gibberish content hurts ranking
Google has been running n-gram analysis on books and documents for years, and can tell whether content reads as quality language or as gibberish.
Through this learning, the search engine can detect low-quality content and prevent the site from ranking well.
This Google patent offers more information about gibberish content.
10. Results from authority pages
In this patent, we find relevant information on ranking authority pages for certain types of queries where the search engine may find no pages with sufficient authority or reliability.
The system may include an authoritative search result (a web page with authority), for example, when the scores of a first set of initial search results are low, or when the query itself indicates that the user is seeking resources from an authoritative site.
We can quickly see examples of authority pages being included when we run a local query such as "[local business] in [city]" or similar.
Here we find that Google often includes an authority page in the results, such as a Yellow Pages-style directory, because in many cases the pages in that sector have no authority online.
To address this, it is worth working on E-A-T (the expertise, authoritativeness and trustworthiness of your content, per Google's quality guidelines).
11. Penalties for suspicious rank-boosting activity
This Google patent discusses possible penalties for websites that engage in suspicious actions intended to deceive the search engine.
The World Wide Web (“web”) contains a large amount of information that changes constantly.
Rank-modifying spam techniques, such as index spam and link spam, cover a set of techniques by which information providers attempt to trick a search engine into ranking their information (or their clients' information) at or near the top of the search results list.
Some of the techniques used by rank-modifying spammers include keyword stuffing, invisible text, tiny text, page redirects, META tag stuffing, and link-based manipulation.
It also explains the subject of links in more depth:
Link-based manipulation may include creating or manipulating a first document, or a set of first documents, to include a link or series of links to a second document in an attempt to increase the rank of the second document.
Some existing search engines determine the rank of a document based on the number or quality of the links that point to the document.
A link farm is an example of a link-based manipulation technique.
You already know that trying to climb the rankings through links alone carries risk.
12. Popularity scores for event pages
We are all familiar with Meetup-style event pages, right?
Is it possible that these types of pages rank higher if they include very popular or relevant events for the audience?
This Google patent deals with this ranking signal for event pages.
A search engine can identify the most popular events for a type of query and rank them highly to satisfy users' searches.
13. A link's PageRank is based on the probability that someone clicks it
In this patent, we can see how a "weight" is assigned to a link based on the probability that it will be clicked, its anchor text, and the words before and after it.
For example, the model generation unit 410 can generate a rule indicating that a link located under the heading “More featured stories” on the website cnn.com has a high probability of being selected.
In addition, the model generation unit 410 can generate a rule indicating that a link associated with a destination URL containing the word “domainpark” has a low probability of being selected.
Weights can be generated for links based on the model (act 540). The weight of a link can be a function of the rules applicable to feature data associated with the link.
The weight of a link can reflect the probability that the link will be selected.
The user behavior data is associated with a subset or class of users. In this case, the weights assigned to the links can be adapted to the user class.
As this patent shows, Google can assign a score to each of the links on a page.
Where the link sits, and the probability that a user clicks it, together with its anchor text and surrounding phrase, determine a positive or negative score.
Perhaps placing links in footers offers little value to the user, or to Google.
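The patent's rule-based weighting can be sketched with hypothetical rules (the baseline and adjustment values below are invented for illustration):

```python
def link_weight(link):
    """Hypothetical click-probability weight for a link; the rule values
    are invented, in the spirit of the patent's model generation unit."""
    weight = 0.5                                  # neutral baseline
    if link.get("section") == "featured":
        weight += 0.3                             # prominent, often clicked
    if link.get("section") == "footer":
        weight -= 0.3                             # rarely clicked
    if "domainpark" in link.get("url", ""):
        weight -= 0.4                             # the patent's low-probability example
    return max(0.0, min(1.0, weight))

featured = link_weight({"section": "featured", "url": "https://cnn.com/story"})
footer = link_weight({"section": "footer", "url": "https://example.com/domainpark"})
```

Under rules like these, a featured-story link carries real weight while a footer link to a parked domain carries essentially none.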
14. CTR as a ranking signal
Much has been written in SEO about CTR and whether it is a stronger or weaker ranking signal.
There is still a lot of confusion here, since this patent states that "users who conduct searches are often the best judges of relevance, so if they select a particular search result, it is likely to be relevant, or at least more relevant than the alternatives."
One study suggests it takes more than 100 clicks (probably closer to 500) to make a difference.
However, Gary Illyes himself said on Twitter that clicks are "too noisy" to take into account for rankings.
Though the controversy continues, I believe that the combination of click volume and the time users spend on or browsing a page may be relevant to achieving a better ranking.
15. Website quality levels
If the number of reference queries is high relative to the number of associated queries, the website's quality level is higher.
Conversely, a low proportion of reference queries can lead to a lower ranking.
Reference queries are those that typically include the name of the brand, company or person.
This is what this patent explains.
If Google gives you a low quality score, your ranking will suffer.
This specification describes how a system can determine a score for a site, for example a website, as seen by a search engine, which represents a measure of quality for the site.
The site quality score for a particular site can be determined by computing a ratio whose numerator represents user interest in the site as reflected in queries directed at the site, and whose denominator represents user interest in the resources found on the site as answers to queries of all types (generic).
The site quality score for a site can be used as a signal to rank resources, or to rank search results that identify resources.
The way to obtain a high quality score, and have it show in the organic results, is to answer questions and solve problems in your sector.
The greater your audience's interest in the content you publish, the more individual (branded) searches there will be, and the higher Google will score you.
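The numerator/denominator idea from the patent reduces to a simple ratio; the query counts below are made up:

```python
def site_quality(reference_queries, associated_queries):
    """Ratio of branded ("reference") queries to queries that merely
    surface the site's resources. Counts here are invented."""
    if associated_queries == 0:
        return 0.0
    return reference_queries / associated_queries

strong_brand = site_quality(reference_queries=800, associated_queries=1000)
weak_brand = site_quality(reference_queries=50, associated_queries=1000)
```

A site people search for by name scores far better than one that only appears as an answer to generic queries.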
16. People with the same name
We have all seen cases where a query about a person turns up very similar people who can confuse the search engine.
How does Google treat this disambiguation of people?
This patent addresses this situation, which occurs a great deal on the web.
Different contexts of a person's name usually refer to different people who share that name, and/or to disjoint aspects of the same person.
Clusters are generated from the lists of context terms, and the data is stored as name-context data.
Grouping resources under an unambiguous name facilitates data extraction and other processing techniques that can satisfy users' information needs.
This means that a search engine can disambiguate people with the same name by clustering them into different groups; the key here is context.
If you are going to write about, or mention, a person whose name is shared by several people, you should include labels (or surrounding text) that tell the search engine who you are talking about.
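A minimal sketch of clustering mentions of an ambiguous name by shared context terms (a real system would use far richer features; the mention data is invented):

```python
def cluster_mentions(mentions):
    """Greedy clustering: a mention joins the first cluster it shares a
    context term with, otherwise it starts a new one. Illustrative only."""
    clusters = []
    for terms in mentions:
        for cluster in clusters:
            if cluster["terms"] & terms:  # any shared context term
                cluster["terms"] |= terms
                cluster["size"] += 1
                break
        else:
            clusters.append({"terms": set(terms), "size": 1})
    return clusters

# Three mentions of the same name, with invented context terms.
mentions = [
    {"physicist", "relativity"},
    {"relativity", "nobel"},
    {"footballer", "goal"},
]
clusters = cluster_mentions(mentions)
```

The two science-flavored mentions fall into one group and the sports mention into another, which is exactly the role the surrounding text plays for the search engine.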
17. Effectiveness and affinity through social networks and applications
This Google patent analyzes some rather different search ranking signals, such as social networks and applications (apps).
Google can detect whether we continually use a given application to search for a song, a place and so on, and whether we are connected to those things through a social network.
So if we run a query on that topic, or search for a new song, for example, it can return results from that social network or application.
This is called affinity.
The patent also tells us that:
For example, when two web pages provide access to songs from the same album, but one of the pages provides additional details about the writing of the album or about how the band that wrote it was formed, the page with additional details can be ranked higher than the page with fewer details.
Can this have anything to do with social SEO?
18. Quotes and opinions
This Google patent indicates that Google can identify who said what through various cues, such as the words "you said" or "said", colons, or quotation marks.
Systems and methods are provided for generating a ranking order of the identified quotations, where the ranking order is based on quotation scores.
In addition, systems and methods are provided for transmitting information in order to display the selected quotations on a display device.
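One of the cues the patent mentions, a speaker followed by "said" and a quoted span, can be sketched with a simple regular expression (this pattern is illustrative, not the patent's actual method):

```python
import re

# Illustrative pattern: a capitalized name, the word "said", then a
# double-quoted span. Real systems handle many more cue patterns.
PATTERN = re.compile(
    r'(?P<speaker>[A-Z][a-z]+(?: [A-Z][a-z]+)*) said[,:]? "(?P<quote>[^"]+)"'
)

text = ('Matt Cutts said: "Focus on the user." '
        'Later, John Mueller said "Patents are not promises."')
quotes = [(m.group("speaker"), m.group("quote")) for m in PATTERN.finditer(text)]
```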
It is interesting that one of the first processes the patent describes focuses on identifying quotations, and the information associated with them, and storing both in a database.
The subject entities may include, for example, an author, a person, a place, a topic, an item or thing, and/or an event associated with the query.
For example, the query "opinions about Mandela" may include the subject entity "Mandela", which is a person.
The patent also shows how these different ways of scoring relevance for a topic could be applied to opinions:
The relevance ratings for the subject entity "Breaking Bad" can be based on content items published from the release of the first episode of the television program "Breaking Bad" until one year after the release of the last episode.
Once again, using structured data markup may be a good idea.
This article on organic ranking signals is not finished yet; I will update it shortly.
I think there is a lot of information here backed by Google's own patents, although it is also true that the search engine itself has said it does not necessarily follow what a patent describes.
Still, I believe that digging into these documents can help you learn more about SEO.