Adding robots meta tags to Hugo blogposts

23 Jul 2021

The “deep” web

The deep web is the part of the World Wide Web whose contents are not indexed by standard web search-engines.

Put this way, the deep web sounds far less mysterious than the name suggests. Search engines use web crawlers to index the web: a crawler follows URLs, records the content of each page, and stores it in a way that allows fast retrieval at search time. The pages these crawlers cannot reach, or are asked not to index, are part of the deep web. The “dark web” is a small subset of the deep web that is only reachable through special software such as Tor; it is the part usually associated with illicit activity.

How do you make your web page a part of the deep web? The easiest way is to simply not share its URL publicly. You can also place a robots.txt file in the root directory of the site, which tells crawlers which paths they may visit. Crawlers can choose to ignore it completely; only well-behaved search engines are expected to respect it.
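To make the robots.txt idea concrete, here is a minimal sketch of how a well-behaved crawler might consult the file before fetching a page, using Python's standard urllib.robotparser module. The robots.txt content, the ExampleBot user agent, the example.com URLs and the is_allowed helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask every crawler to stay out of /drafts/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A path under /drafts/ is disallowed, everything else is allowed.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/drafts/secret/"))  # False
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/posts/hello/"))    # True
```

Nothing forces a crawler to run a check like this, which is why robots.txt is only a request and not access control.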

We can also specify this on particular pages using meta tags in the head section of the HTML.

<meta name="robots" content="noindex, nofollow" />

There are two instructions to the web crawlers here. noindex tells the crawler not to index this particular web page and nofollow tells the crawler not to follow the links on the web page for indexing.
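As a rough illustration of how a crawler could read these two instructions, the sketch below pulls the directives out of a page with Python's standard html.parser module. The RobotsMetaParser class and robots_directives function are hypothetical names, and real crawlers handle many more cases than this.

```python
from html.parser import HTMLParser
from typing import Set


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # The content attribute is a comma-separated list of directives.
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html: str) -> Set[str]:
    """Return the set of robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow" /></head></html>'
    directives = robots_directives(page)
    print("may index:", "noindex" not in directives)
    print("may follow links:", "nofollow" not in directives)
```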

I tried adding this to one of the pages of this blog. If you inspect the source of this blogpost, you will see the meta tag above.

After adding the tag, I searched for some of the words that appear in the content of the blogpost.

The content was not found! On the other hand, searching for words from another blogpost that does not have the meta tag does return results.

Adding it to Hugo

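To add it to Hugo, we can modify themes/<theme>/layouts/partials/header.html (or whichever partial renders the <head> section of the page). Copying that file to layouts/partials/ in the site root and editing the copy typically has the same effect without touching the theme itself, since Hugo prefers the site's own layouts over the theme's.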

{{- /* Emit the robots meta tag only when the page sets a non-empty "robots" value in its front matter. */ -}}
{{- with .Params.robots -}}
	<meta name="robots" content="{{ . }}" />
{{- end -}}

This looks for a robots parameter in the front matter of the markdown file and, if it is set, writes its value into the tag. The front matter will look something like this:

---
title: 
description: 
date:
tags: []
robots: "noindex, nofollow"
---
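The template's decision can be mimicked in a few lines of Python, which may help when reasoning about what ends up in the page. This is only a sketch of the logic, not how Hugo works internally: render_robots_meta is a hypothetical helper, the params dict stands in for the page's front matter, and treating an empty value the same as a missing one is an assumption of this sketch.

```python
from html import escape


def render_robots_meta(params: dict) -> str:
    """Return the robots meta tag for a page, or "" if no value is set."""
    value = params.get("robots")
    if not value:
        # No robots entry in the front matter: emit nothing, so crawlers
        # fall back to their default behavior for this page.
        return ""
    # Escape the value so it is safe inside an HTML attribute.
    return '<meta name="robots" content="{}" />'.format(escape(str(value), quote=True))


if __name__ == "__main__":
    hidden_post = {"title": "Hidden post", "robots": "noindex, nofollow"}
    public_post = {"title": "Public post"}
    print(repr(render_robots_meta(hidden_post)))
    print(repr(render_robots_meta(public_post)))
```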

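Bonus: we can create an archetype file so that we don’t have to add this manually every time. Hugo picks the archetype whose file name matches the content section, so posts created under posts/ need archetypes/posts.md (archetypes/post.md only applies to a post/ section, or when passing --kind post).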

---
title: ""
description: ""
date: {{ .Date }}
tags: []
robots: "noindex, nofollow"
---

Now, if we create the file using hugo new posts/sample.md, Hugo automatically adds the above content (with the date filled in) to the new file. This page was also created in the same way :)
