robots.txt
What is robots.txt?
robots.txt is a plain-text file placed at the root of a website that asks web crawlers not to crawl (fetch) certain parts of the site. Nothing enforces it. It is a de facto standard, the Robots Exclusion Protocol, that most reputable search engines and web crawlers choose to respect, while badly behaved bots can simply ignore it.
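To make the "request, not enforcement" point concrete, here is a simplified sketch of how a well-behaved crawler might decide whether it may fetch a path. It is an illustration under stated assumptions, not any real crawler's code: is_disallowed is a hypothetical helper, only the User-agent: * group is read, and Allow lines, wildcard patterns and per-crawler precedence from the full standard are ignored. A path counts as disallowed when it begins with a non-empty Disallow value.

```shell
#!/usr/bin/env bash
# Simplified, hypothetical sketch of how a well-behaved crawler might read robots.txt.
# Only the "User-agent: *" group is considered: a path counts as disallowed when it
# starts with any non-empty Disallow value. Allow lines, wildcard patterns and
# per-crawler precedence from the full standard are deliberately ignored.

# is_disallowed FILE PATH  (hypothetical helper, not a real tool)
# Exit status 0 if PATH is disallowed for "*", 1 if the crawler may fetch it.
is_disallowed() {
  local file="$1" path="$2"
  awk -v path="$path" '
    {
      sub(/\r$/, "")                 # tolerate CRLF line endings
      sub("#.*", "")                 # drop comments
      colon = index($0, ":")
      if (colon == 0) next           # blank or malformed line
      field = tolower(substr($0, 1, colon - 1))
      value = substr($0, colon + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", field)
      gsub(/^[ \t]+|[ \t]+$/, "", value)
    }
    field == "user-agent" {
      # A User-agent line that follows rule lines starts a new group
      if (seen_rule) { star = 0; seen_rule = 0 }
      if (value == "*") star = 1
      next
    }
    field == "disallow" || field == "allow" {
      seen_rule = 1
      # An empty Disallow value blocks nothing, which is how "allow all" works
      if (field == "disallow" && star && value != "" && index(path, value) == 1) blocked = 1
    }
    END { exit (blocked ? 0 : 1) }
  ' "$file"
}
```

For example, against a file containing User-agent: * and Disallow: /drafts/ (a made-up path), is_disallowed robots.txt /drafts/post.html succeeds and is_disallowed robots.txt /about/ fails. With an empty Disallow: line, every path is allowed.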
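How to add it to Jekyll
For my purposes I don't need to restrict any part of my website, so I will allow all crawlers everywhere.
Adding this to a Jekyll site is quite easy.
- First, create a robots.txt file in the root of your Jekyll source directory (here ~/www). A file without front matter is copied into _site unchanged when the site is built.
- In this file, add the following. User-agent: * addresses every crawler, and an empty Disallow value disallows nothing: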
User-agent: *
Disallow:
- Lastly, rebuild the site and copy the generated _site directory to the web server's document root (/var/www/html here):
root@ubuntu-512mb-sfo1-01:~/www# vim robots.txt
root@ubuntu-512mb-sfo1-01:~/www# ls
404.html about assets cd certbot.log _config.yml feed.xml Gemfile Gemfile.lock _includes index.md jekyll _layouts LICENSE _posts README.md robots.txt _sass _site sript thumbnails
root@ubuntu-512mb-sfo1-01:~/www# jekyll build
Configuration file: /root/www/_config.yml
Source: /root/www
Destination: /root/www/_site
Incremental build: disabled. Enable with --incremental
Generating...
done in 2.593 seconds.
Auto-regeneration: disabled. Use --watch to enable.
root@ubuntu-512mb-sfo1-01:~/www#
root@ubuntu-512mb-sfo1-01:~/www# rm -rf /var/www/html/*
root@ubuntu-512mb-sfo1-01:~/www# cp -rf ~/www/_site/* /var/www/html/
- To test, request the file with curl. The -v flag also prints the TLS handshake and the HTTP request and response headers:
root@ubuntu-512mb-sfo1-01:~/www# curl https://invoke.coffee/robots.txt -v
* Trying 2604:a880:1:20::3085:5001...
* TCP_NODELAY set
* Connected to invoke.coffee (2604:a880:1:20::3085:5001) port 443 (#0)
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: CN=www.invoke.coffee
* start date: Oct 4 11:07:31 2017 GMT
* expire date: Jan 2 11:07:31 2018 GMT
* subjectAltName: host "invoke.coffee" matched cert's "invoke.coffee"
* issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
* SSL certificate verify ok.
> GET /robots.txt HTTP/1.1
> Host: invoke.coffee
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.10.3 (Ubuntu)
< Date: Sat, 14 Oct 2017 05:17:05 GMT
< Content-Type: text/plain
< Content-Length: 24
< Last-Modified: Sat, 14 Oct 2017 05:10:20 GMT
< Connection: keep-alive
< ETag: "59e19c3c-18"
< Accept-Ranges: bytes
<
User-agent: *
Disallow:
* Curl_http_done: called premature == 0
* Connection #0 to host invoke.coffee left intact
root@ubuntu-512mb-sfo1-01:~/www#
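The copy and curl check above can be wrapped in two small shell functions so they are repeatable after every rebuild. This is a sketch, not the author's actual deploy script: deploy_site and check_served are hypothetical names, the defaults simply reuse the paths and URL from this post, and it assumes jekyll build has already been run. Instead of printing the whole exchange like curl -v, check_served compares the served body with the local file byte for byte.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that script the deploy-and-check steps from this post.
# Assumptions: `jekyll build` has already been run; the default paths
# (~/www/_site and /var/www/html) and the URL are the ones shown in the post.

# deploy_site [SITE_DIR] [WEB_ROOT]
# Copies the built site into the web root, then confirms robots.txt arrived intact.
deploy_site() {
  local site="${1:-$HOME/www/_site}" webroot="${2:-/var/www/html}"
  # Stop before deleting anything if the build output looks incomplete
  if [ ! -f "$site/robots.txt" ]; then
    echo "deploy_site: $site/robots.txt not found; run jekyll build first" >&2
    return 1
  fi
  rm -rf "$webroot"/*              # same clean-out as in the post
  cp -rf "$site"/* "$webroot"/     # same copy as in the post
  cmp -s "$site/robots.txt" "$webroot/robots.txt"   # byte-for-byte comparison
}

# check_served [URL] [FILE]
# Fetches robots.txt and compares it with the local copy.
# -f makes curl fail on HTTP errors such as 404; -sS hides the progress meter
# but still reports errors. The pipeline returns cmp's status, so a failed
# download normally shows up as a mismatch.
check_served() {
  local url="${1:-https://invoke.coffee/robots.txt}" file="${2:-$HOME/www/_site/robots.txt}"
  curl -fsS "$url" | cmp -s - "$file"
}
```

After jekyll build, running deploy_site && check_served exits with status 0 only if both the web root and the live URL serve the same robots.txt as _site. As in the post, there is a short window between rm and cp where the web root is empty; a tool such as rsync with --delete avoids that window, at the cost of another dependency.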