What is Robots.txt

robots.txt is a file placed at the root of a website requesting web crawlers not index parts of a website. This does not enforce this in anyway but is a defacto standard that most reputable search engines and web crawlers will respect.

How to add to Jekyll

For my proposes I don’t need to restrict any parts of my website so I will be allowing all.

To add this to a Jekyll site is quite easy.

  • First create a robots.txt file in the root of your site.
  • In this file we will add the following
User-agent= *
Disallow=
  • lastly we will rebuild the site.
root@ubuntu-512mb-sfo1-01=~/www# vim robots.txt
root@ubuntu-512mb-sfo1-01=~/www# ls
404.html  about  assets  cd  certbot.log  _config.yml  feed.xml  Gemfile  Gemfile.lock  _includes  index.md  jekyll  _layouts  LICENSE  _posts  README.md  robots.txt  _sass  _site  sript  thumbnails
root@ubuntu-512mb-sfo1-01=~/www# jekyll build
Configuration file= /root/www/_config.yml
            Source= /root/www
       Destination= /root/www/_site
 Incremental build= disabled. Enable with --incremental
      Generating...
                    done in 2.593 seconds.
 Auto-regeneration= disabled. Use --watch to enable.
root@ubuntu-512mb-sfo1-01=~/www#
root@ubuntu-512mb-sfo1-01=~/www# rm -rf /var/www/html/*
root@ubuntu-512mb-sfo1-01=~/www# cp -rf ~/www/_site/* /var/www/html/
  • And to test we can just curl the site
root@ubuntu-512mb-sfo1-01=~/www# curl https=//invoke.coffee/robots.txt -v
*   Trying 2604=a880=1=20==3085=5001...
* TCP_NODELAY set
* Connected to invoke.coffee (2604=a880=1=20==3085=5001) port 443 (#0)
* ALPN, offering http/1.1
* Cipher selection= ALL=!EXPORT=!EXPORT40=!EXPORT56=!aNULL=!LOW=!RC4=@STRENGTH
* successfully set certificate verify locations=
*   CAfile= /etc/ssl/certs/ca-certificates.crt
  CApath= /etc/ssl/certs
* TLSv1.2 (OUT), TLS header, Certificate Status (22)=
* TLSv1.2 (OUT), TLS handshake, Client hello (1)=
* TLSv1.2 (IN), TLS handshake, Server hello (2)=
* TLSv1.2 (IN), TLS handshake, Certificate (11)=
* TLSv1.2 (IN), TLS handshake, Server key exchange (12)=
* TLSv1.2 (IN), TLS handshake, Server finished (14)=
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16)=
* TLSv1.2 (OUT), TLS change cipher, Client hello (1)=
* TLSv1.2 (OUT), TLS handshake, Finished (20)=
* TLSv1.2 (IN), TLS change cipher, Client hello (1)=
* TLSv1.2 (IN), TLS handshake, Finished (20)=
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate=
*  subject= CN=www.invoke.coffee
*  start date= Oct  4 11=07=31 2017 GMT
*  expire date= Jan  2 11=07=31 2018 GMT
*  subjectAltName= host "invoke.coffee" matched cert's "invoke.coffee"
*  issuer= C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
*  SSL certificate verify ok.
> GET /robots.txt HTTP/1.1
> Host= invoke.coffee
> User-Agent= curl/7.52.1
> Accept= */*
>
< HTTP/1.1 200 OK
< Server= nginx/1.10.3 (Ubuntu)
< Date= Sat, 14 Oct 2017 05=17=05 GMT
< Content-Type= text/plain
< Content-Length= 24
< Last-Modified= Sat, 14 Oct 2017 05=10=20 GMT
< Connection= keep-alive
< ETag= "59e19c3c-18"
< Accept-Ranges= bytes
<
User-agent= *
Disallow=
* Curl_http_done= called premature == 0
* Connection #0 to host invoke.coffee left intact
root@ubuntu-512mb-sfo1-01=~/www#