Once upon a time there were two nerds at Stanford working on their PhDs.
(Now that I think about it, there were probably a lot more than two nerds at Stanford.) Two of the nerds at Stanford were not satisfied with the current options for searching online, so they attempted to develop a better way.
Being long-time academics, they eventually decided to take the way academic papers were organized and apply that to webpages. A quick and fairly objective way to judge the quality of an academic paper is to see how many times other academic papers have cited it. This concept was easy to replicate online because the original purpose of the Internet was to share academic resources between universities.
The citations manifested themselves as hyperlinks once they went online. One of the nerds came up with an algorithm for calculating these values on a global scale, and they both lived happily ever after. Of course, these two nerds were Larry Page and Sergey Brin, the founders of Google, and the algorithm that Larry invented that day was what eventually became PageRank. Long story short, Google ended up becoming a big deal and now the two founders rent an airstrip from NASA so they have somewhere to land their private jets. (Think I am kidding? See http://www.netpaths.net/google-plane/.)
Relevance, Speed, and Scalability
Hypothetically, the most relevant search engine would have a team of experts on every subject in the entire world—a staff large enough to read, study, and evaluate every document published on the web so they could return the most accurate results for each query submitted by users.
The fastest search engine, on the other hand, would crawl a new URL the very second it’s published and introduce it into the general index immediately, available to appear in query results only seconds after it goes live. The challenge for Google and all other engines is to find the balance between those two scenarios: to combine rapid crawling and indexing with a relevance algorithm that can be instantly applied to new content. In other words, they’re trying to build scalable relevance. With very few exceptions, Google is uninterested in hand-removing (or hand-promoting) specific content. Instead, its model is built around identifying characteristics in web content that indicate the content is especially relevant or irrelevant, so that content all across the web with those same characteristics can be similarly promoted or demoted. This book frequently discusses the benefits of content created with the user in mind.
To some hardcore SEOs, Google’s “think about the user” mantra is corny; they’d much prefer to know a secret line of code or server technique that bypasses the intent of creating engaging content. While it may be corny, Google’s focus on relevant, user-focused content really is the key to its algorithm of scalable relevance. Google is constantly trying to find ways to reward content that truly answers users’ questions and to minimize or filter out content built for content’s sake. While this book discusses techniques for making your content visible and accessible to engines, remember that means talking about content constructed with users in mind—content designed to be innovative and helpful and to serve the query intent of human users. It might be corny, but it’s effective.
That fateful day, the Google Guys capitalized on the mysterious power of links. Although a webmaster can easily manipulate everything (word choice, keyword placement, internal links, and so on) on his or her own website, it is much more difficult to influence inbound links. This natural link profile acts as an extremely good metric for identifying legitimately popular pages.
NOTE Google’s PageRank was actually named after its creator, Larry Page. Originally, the algorithm was named BackRub for its emphasis on backlinks. The name was later changed to PageRank, a nod both to Larry Page’s last name and to the algorithm’s ability to rank pages. The original paper describing PageRank, “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” by Sergey Brin and Larry Page, is still available online. If you are interested in reading it, it is available on Stanford’s website at
It is highly technical, and I have used it on more than one occasion as a sleep aid. It’s worth noting that the original PageRank as described in this paper is only a tiny part of Google’s modern-day search algorithm.
As modern search engines evolved, they started to take into account the link profile of both a given page and its domain. They found that the relationship between these two indicators was itself a very useful metric for ranking webpages.
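To make the idea behind PageRank concrete, here is a minimal power-iteration sketch. It is purely illustrative—the function name, the toy three-page web, and the 0.85 damping factor (the value used in the original paper) are my own choices, and the real Google algorithm is vastly more elaborate.

```python
# Minimal PageRank sketch: pages "vote" for each other via links, and each
# iteration redistributes every page's score across its outbound links.

DAMPING = 0.85  # probability the "random surfer" follows a link


def pagerank(links, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score...
        new_rank = {p: (1.0 - DAMPING) / len(pages) for p in pages}
        # ...and passes the rest of its score to the pages it links to.
        for page, outbound in links.items():
            if not outbound:
                continue
            share = DAMPING * rank[page] / len(outbound)
            for target in outbound:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: pages A and B both link to C, and C links back to A.
web = {"A": ["C"], "B": ["C"], "C": ["A"]}
scores = pagerank(web)
print(max(scores, key=scores.get))  # -> C
```

Because C attracts links from two pages while B attracts none, C ends up with the highest score—the same intuition as counting citations of an academic paper.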
Text Is the Currency of the Internet
Relevancy is a measure of how closely related two items are. Luckily for Google and Microsoft, modern-day computers are quite good at calculating this measurement for text. By my estimation, Google owns and operates well over a million servers. The electricity to power these servers is likely one of Google’s larger operating expenses. This energy limitation has helped shape modern search engines by putting text analysis at the forefront of search. Quite simply, it takes less computing power and is much simpler programmatically to determine relevancy between a text query and a text document than it is between a text query and an image or video file. This is the reason why text results are so much more prominent in search results than videos and images.
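To see why text-to-text relevancy is computationally cheap, consider a bag-of-words cosine similarity—a toy stand-in for the far richer models real engines use, with example documents invented for illustration:

```python
# Scoring a text query against a text document can reduce to counting
# shared words -- cheap enough to run at enormous scale.

import math
from collections import Counter


def cosine_similarity(query, document):
    q = Counter(query.lower().split())
    d = Counter(document.lower().split())
    dot = sum(q[w] * d[w] for w in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0


doc_a = "whiteboard markers come in many colors"
doc_b = "the history of fountain pens"
query = "whiteboard markers"

# The document sharing words with the query scores higher.
print(cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b))  # -> True
```

Comparing a text query to an image or video offers no such shortcut, which is exactly the asymmetry the paragraph above describes.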
The search engines must use their analysis of content as their primary indication of relevancy for determining rankings for a given search query. For SEOs, this means the content on a given page is essential for manipulating—that is, earning—rankings. In the old days of AltaVista and other early search engines, SEOs would simply write “Jessica Simpson” hundreds of times on a page to make it rank #1 for that query. What could be more relevant for the query “Jessica Simpson” than a page that says “Jessica Simpson” 100 times? (Clever SEOs will realize the answer is a page that says “Jessica Simpson” 101 times.) This metric, called keyword density, was quickly manipulated, and the search engines of the time diluted its power over rankings until it became almost useless. Similar dilution has happened to the keywords meta tag, some kinds of internal links, and H1 tags.
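Keyword density is simple enough to compute in a few lines—which is precisely why it was so easy to game. This sketch (with a made-up example page) shows the metric as it is usually defined: the fraction of a page’s words taken up by occurrences of the target phrase.

```python
# Keyword density: the crude, now largely obsolete relevance metric
# described above.

import re


def keyword_density(text, phrase):
    words = re.findall(r"[a-z']+", text.lower())
    phrase_words = phrase.lower().split()
    n = len(phrase_words)
    # Count every position where the full phrase appears in sequence.
    hits = sum(words[i:i + n] == phrase_words
               for i in range(len(words) - n + 1))
    return hits * n / len(words) if words else 0.0


page = ("Jessica Simpson news, Jessica Simpson photos, "
        "and more about Jessica Simpson.")
print(round(keyword_density(page, "Jessica Simpson"), 2))  # -> 0.55
```

A spammer could push this number arbitrarily high just by repeating the phrase, which is why engines stopped trusting it.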
As search engines matured, they started identifying more metrics for determining rankings. One that stood out among the rest was link relevancy. The difference between link relevancy and link popularity (discussed in the previous section) is that link relevancy does not take into account the power of the link. Instead, it relies on a natural phenomenon that occurs when people link out to other content. Let me give you an example of how it works. Say I own a blog where I write about whiteboard markers.
(Yes, I did just look around my office for an example to use, and yes, there are actually people who blog about whiteboard markers. I checked.) Ever inclined to learn more about my passion for these magical writing utensils, I spend part of my day reading online what other people have to say about whiteboard markers. On my hypothetical online reading journey, I find an article about the psychological effects of marker color choice.
Excited, I go back to my website to blog about the article so (both of) my friends can read about it. Now here is the critical takeaway. When I write the blog post and link to the article, I get to choose the anchor text. I could choose something like “click here,” but more likely I choose something that is relevant to the article. In this case I choose “psychological effects of marker color choice.” Someone else who links to the article might use the link anchor text “marker color choice and the effect on the brain.”
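Aggregated across many links, those independently chosen anchor texts become a topical signal. Here is a toy sketch of the idea, using the anchor texts from the example above plus a hypothetical “click here” link:

```python
# Different sites link to the same article with different anchor texts;
# the words those anchors share hint at what the page is about.

from collections import Counter

inbound_anchors = [
    "psychological effects of marker color choice",
    "marker color choice and the effect on the brain",
    "click here",
]

term_counts = Counter(
    word
    for anchor in inbound_anchors
    for word in anchor.lower().split()
)

# "marker", "color", and "choice" each appear twice -- a stronger topical
# signal than a one-off word like "click".
print(term_counts["marker"], term_counts["click"])  # -> 2 1
```

Because no single webmaster controls what everyone else writes in their anchor text, this signal is far harder to fake than on-page keyword repetition.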