Search engines are something many people use every day, whether to find the best pizza nearby, the day's news, or even the best remote desktop software, yet few people understand how they work. After a simple query is typed into the search box, a search engine can return a vast number of results within a fraction of a second. In this article, we will explore the technology that powers modern search engines.
What is a Search Engine?
A search engine consists of two main components: a database of indexed links and files, and a series of algorithms that search for and return query results. Google’s search engine is the most widely used in the world; its database consists of many trillions of links to web pages, and its algorithms take hundreds of factors into account to calculate the most relevant search results.
How do Search Engines Work?
The search engine’s algorithms control three main functions: web crawling, indexing and ranking. Search engines send out web crawlers to locate and review website content. This content is revisited regularly to detect changes and to determine the type of data displayed on the websites. Important information from the websites is extracted and parsed. A scheduler then determines, using a number of different factors, how often each website needs to be crawled again.
Once a website has been reviewed or ‘crawled’, it is indexed in the search engine’s database. During the indexing process, the search engine uses various factors to determine how and where a particular web page may be displayed in response to an end user’s search query.
It must be noted that not all search engines work in this way. Google, Bing and DuckDuckGo are examples of search engines that crawl and analyze websites across the web and return search results based on their relevance to the user’s query. Site-specific search engines such as those on Wikipedia, Amazon and YouTube only return results from their own content, without crawling and analyzing pages elsewhere on the web.
The search engine’s scheduler assesses the relative importance of new and already reviewed URLs. It then uses various algorithms to decide when to crawl new URLs and how often to re-crawl known URLs, depending on their perceived importance.
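The scheduling behaviour described above can be sketched with a priority queue. This is a minimal illustration, not how any real search engine schedules crawls: the `CrawlScheduler` class, the `importance` score, the 24-hour base interval and the example URLs are all assumptions made for the example.

```python
import heapq


class CrawlScheduler:
    """Toy scheduler: pages with higher importance are re-crawled sooner."""

    def __init__(self, base_interval=24.0):
        # Hours between crawls for a page with importance 1.0 (assumed value).
        self.base_interval = base_interval
        # Min-heap of (next_crawl_time, url, importance).
        self._queue = []

    def add(self, url, importance, now=0.0):
        # A more important page gets a shorter re-crawl interval.
        interval = self.base_interval / max(importance, 0.1)
        heapq.heappush(self._queue, (now + interval, url, importance))

    def next_due(self):
        """Pop the URL that is due first and schedule its next crawl."""
        due_time, url, importance = heapq.heappop(self._queue)
        self.add(url, importance, now=due_time)
        return due_time, url


scheduler = CrawlScheduler()
scheduler.add("https://news.example/", importance=8.0)  # due every 3 hours
scheduler.add("https://blog.example/", importance=1.0)  # due every 24 hours
order = [scheduler.next_due() for _ in range(3)]
print(order)
```

In this sketch the high-importance page is crawled three times before the low-importance page is visited once.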
- Web Crawling
Web crawlers are automated software programs that search the internet for new website URLs and analyze the content contained on the pages.
The parser extracts links from the page, along with other key information such as content type. It then sends the extracted URLs to the scheduler and the extracted data on for indexing.
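As a rough illustration of the parsing step, the sketch below uses Python's standard `html.parser` module to pull the links and the title out of a page. The `LinkParser` class, the sample HTML and the example URLs are assumptions for demonstration; a production crawler would also handle fetching, robots rules and malformed markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects absolute link URLs and the page title from an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


page = (
    "<html><head><title>Pizza Guide</title></head><body>"
    '<a href="/menu">Menu</a> '
    '<a href="https://other.example/about">About</a>'
    "</body></html>"
)
parser = LinkParser("https://pizza.example/")
parser.feed(page)
print(parser.title, parser.links)
```

The collected links are what would be handed back for future crawling, while the title and other extracted data would go on to indexing.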
- Indexing
Indexing is the process in which parsed information from crawled pages is added to a searchable database. The database can be compared to a vast digital library of many trillions of web pages.
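One common way to build such a database is an inverted index, which maps each word to the set of pages that contain it. The sketch below is a simplified illustration under that assumption; the function names and sample pages are hypothetical, and the article does not state which data structures any particular search engine uses.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


pages = {
    "https://a.example/": "Best pizza in town",
    "https://b.example/": "Remote desktop software reviews",
    "https://c.example/": "Pizza delivery software",
}
index = build_index(pages)
print(search(index, "pizza"))
print(search(index, "pizza software"))
```

Because lookups go from word to pages rather than scanning every page, the cost of a query depends on the number of query words, not on the size of the whole collection.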
How Does Google Work?
Google is the most widely used search engine, accounting for more than 90% of all search queries and handling approximately 3.5 billion searches every day. Since Google is such a dominant force among search engines, we will focus specifically on what sets Google’s search engine apart from others.
Google’s search engine works in the way described above, but it also includes other important features:
- Language Models
Google has invested considerable resources in language models so that it can interpret the user’s intentions and correct spelling or grammar mistakes. For example, if the search query is “chinees restront near me”, Google is able to understand that the intended query is “Chinese restaurants near me” and return the correct results.
Google’s search engine can also understand synonyms and different phrases which have the same meaning. For example, the search engine will understand that “how to lose weight” has the same meaning as “tips to become thin”.
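The spelling correction mentioned in this section can be approximated, very crudely, by matching each query word against a list of known words. The sketch below uses `difflib.get_close_matches` from Python's standard library as a stand-in; it is string similarity, not a language model, and the tiny `VOCABULARY` list and the 0.6 cutoff are assumptions for the example.

```python
import difflib

# Hypothetical list of correctly spelled words the engine already knows.
VOCABULARY = ["chinese", "restaurant", "near", "me", "pizza", "sushi"]


def correct_query(query, vocabulary=VOCABULARY):
    """Replace each word with its closest vocabulary match, if one exists."""
    corrected = []
    for word in query.lower().split():
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.6)
        # Keep the original word when nothing in the vocabulary is close enough.
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


fixed = correct_query("chinees restront near me")
print(fixed)
```

A real system would also weigh how common each candidate word is and what the surrounding words suggest, which simple string similarity cannot do.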
- Ranking Algorithms
Google uses many different algorithms to determine which indexed results are the most relevant to the search query.
How Do Google’s Ranking Algorithms Work?
Google is well known for its secrecy regarding the exact details of its search engine operations, and there are undoubtedly many complicated algorithms in use, but some important factors have been made public.
- Topical Relevance
When a web page matches the keywords of the search query, especially in prominent positions like headings, this is a sign that the web page is most likely relevant to the query. While this generally is a good way to check for a web page’s relevance, Google also checks for other keywords that are commonly linked to the search query’s keywords.
For example, if your search query was “how to pass a driver’s license test”, Google would check not only for the words in the query itself, but also for related words like “driving, exam, road, street, seatbelt”.
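The idea of weighting keyword matches by position and also counting related terms can be sketched as a simple scoring function. The weights, the `relevance_score` function and the sample pages below are assumptions chosen for illustration; Google's actual relevance calculation is not public.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z]+", text.lower())


def relevance_score(query, heading, body, related_terms,
                    heading_weight=3.0, related_weight=0.5):
    """Score a page: heading matches count most, related terms count least."""
    query_words = set(tokenize(query))
    score = 0.0
    for word in tokenize(heading):
        if word in query_words:
            score += heading_weight
    for word in tokenize(body):
        if word in query_words:
            score += 1.0
        elif word in related_terms:
            score += related_weight
    return score


related = {"driving", "exam", "road", "street", "seatbelt"}
query = "pass license test"

score_a = relevance_score(
    query,
    heading="How to pass your license test",
    body="Practice on the road before the exam and always wear a seatbelt",
    related_terms=related,
)
score_b = relevance_score(
    query,
    heading="Software test automation",
    body="How to pass arguments to a test runner",
    related_terms=related,
)
print(score_a, score_b)
```

The second page contains some of the query words, but the first page scores higher because the matches appear in its heading and its body mentions related driving terms.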
- Search Intent
Google has researched what kinds of results users expect to see when entering search queries. If you were to search for “PlayStation 5 unboxing”, Google understands that you intend to see a video and will rank videos matching the query much higher in the results.
- Freshness
It is important to note that some web pages are updated far more regularly than others, so Google determines which pages require constant review and which can safely be reviewed less frequently.
For example, news websites feature updates to breaking stories several times per day, which means that Google will need to re-index these pages very frequently.
Other search queries may require some degree of freshness, but not to the same degree as news articles. The results for a search such as “best films of 2019” are unlikely to change frequently.
Finally, there are pages containing information that is unlikely to change at all, such as “calculate Fahrenheit to Celsius”, and so these pages are a much lower priority for re-indexing.
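One simple way to model this behaviour is to shorten a page's re-crawl interval whenever a visit finds new content and lengthen it whenever nothing has changed. The function below is a hypothetical sketch of that idea; the halving and doubling rule and the one-hour and thirty-day limits are assumptions, not documented Google behaviour.

```python
def next_interval(current_hours, changed, min_hours=1.0, max_hours=24.0 * 30):
    """Halve the interval if the page changed, double it otherwise."""
    new_hours = current_hours / 2 if changed else current_hours * 2
    # Keep the interval between one hour and thirty days.
    return max(min_hours, min(max_hours, new_hours))


news_interval = 24.0    # a page that has changed on every visit
static_interval = 24.0  # a page that has never changed

for _ in range(6):
    news_interval = next_interval(news_interval, changed=True)
    static_interval = next_interval(static_interval, changed=False)

print(news_interval, static_interval)
```

After six visits the frequently changing page is checked hourly, while the unchanging page has drifted out to the thirty-day maximum.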
- Content Quality
In order to differentiate between high and low quality content, Google uses a system known as E-A-T (Expertise, Authoritativeness and Trustworthiness).
If you were to search for “how to write a song”, you would probably prefer results from a professional or expert in this field, such as Beyonce or Michael Jackson, who could be trusted to give good advice in this area. Google uses backlinks to help judge whether a page is trustworthy. Backlinks can be compared to ‘votes’ from other websites. When someone links to a page, they’re vouching for that piece of content and recommending it to their readers.
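The ‘votes’ analogy can be made concrete with a simplified PageRank-style calculation, in which each page repeatedly passes a share of its score to the pages it links to. This is a textbook sketch introduced here for illustration, not the ranking Google runs today; the page names, the damping factor of 0.85 and the 20 iterations are assumptions.

```python
def link_scores(links, damping=0.85, iterations=20):
    """Simplified PageRank: links maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's vote evenly among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its vote to all pages.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


links = {
    "blog-a": ["guide"],
    "blog-b": ["guide"],
    "forum": ["guide", "blog-a"],
    "guide": [],
}
scores = link_scores(links)
best = max(scores, key=scores.get)
print(best, scores)
```

The page with the most incoming links ends up with the highest score, and a link from a page that itself has a high score is worth more than a link from an obscure one.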
- Page Loading Speed
Google understands that people will grow impatient if a web page takes longer than usual to load, and so it ranks faster-loading pages higher in its results.
- Mobile-Friendliness
It has been determined that about 65% of search queries made with Google come from mobile devices, and so it makes sense for Google to rank mobile-friendly websites higher in its results. Google first used mobile-friendliness as a ranking factor in 2015, but in 2019 this was taken even further: Google stated that it will “predominantly use the mobile version of the content for indexing and ranking” across all devices.
- Personalization
Google places a lot of emphasis on personalizing the results it provides, to ensure that they are more relevant to the user. For example, if you were searching for “best sushi restaurant”, Google will not retrieve results for restaurants from all over the world but rather ones that are near your current location.
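Location-based filtering of this kind can be sketched by computing the distance between the user and each result and keeping only the nearby ones. The example below uses the haversine formula for great-circle distance; the restaurant names, coordinates and the 10 km radius are made up for illustration, and how Google actually combines location with other signals is not described in this article.

```python
import math


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearby(results, user_lat, user_lon, max_km=10.0):
    """Return the names of results within max_km, nearest first."""
    with_distance = [
        (distance_km(user_lat, user_lon, r["lat"], r["lon"]), r["name"])
        for r in results
    ]
    return [name for dist, name in sorted(with_distance) if dist <= max_km]


restaurants = [
    {"name": "Sushi Tokyo", "lat": 35.6762, "lon": 139.6503},
    {"name": "Sushi Soho", "lat": 51.5136, "lon": -0.1365},
    {"name": "Sushi Camden", "lat": 51.5390, "lon": -0.1426},
]
# Hypothetical user located in central London.
local = nearby(restaurants, user_lat=51.5074, user_lon=-0.1278)
print(local)
```

Only the two London restaurants are returned, ordered by distance, while the one in Tokyo is filtered out.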
Language is also important for personalization: Google will display results in the dominant language of a particular country unless the search query specifically requests results in a different language.