Earth Notes: On Website Technicals (2020-11)

Updated 2020-11-23 22:13 GMT.
Tech updates: work storage, Let's Encrypt auto-renew, lazy wins, slow https switch, AMP https only, soft canonical, Apple touch, Apache stop, ad sub...
Continuing to slowly shift functions from the old RPi to the new one this month, amongst other things...

2020-11-20: Ads Subtracted

I'm tweaking to reduce blank spaces and pointless page weight for pages where Google won't show ads because of low traffic.

I've simplified the logic for desktop to be the same as AMP, ie to insert ad code only on pages with at least a specified popularity by page hits.
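The popularity test can be sketched roughly as below. This is a minimal illustration only: the threshold value and the names `MIN_HITS_FOR_ADS`, `should_carry_ads` and `pages_with_ads` are assumptions for the example, not the values or names used in the real build scripts.

```python
# Hypothetical threshold: the real cut-off is not stated in the text.
MIN_HITS_FOR_ADS = 100


def should_carry_ads(page_hits: int, min_hits: int = MIN_HITS_FOR_ADS) -> bool:
    """Return True if a page is popular enough to have ad code inserted."""
    return page_hits >= min_hits


def pages_with_ads(hits_by_page: dict, min_hits: int = MIN_HITS_FOR_ADS) -> list:
    """Return the sorted names of pages that would carry ad code."""
    return sorted(
        page for page, hits in hits_by_page.items()
        if should_carry_ads(hits, min_hits)
    )


if __name__ == "__main__":
    # Made-up hit counts, purely for illustration.
    sample = {"index.html": 950, "popular.html": 100, "obscure.html": 3}
    print(pages_with_ads(sample))
```

The same test applied to both desktop and AMP keeps the two builds consistent.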

My expectation is that only a small fraction of (desktop and AMP) pages will now carry ad code, but effective site-wide RPM and earnings should be only slightly reduced.

AMP and desktop pages with ads are periodically rebuilt to purge ad weight from those that are no longer popular enough.

After a full rebuild, the count of (eg) desktop pages carrying ads fell to under 70 from more than 210, out of nearly 300 candidate pages.

2020-11-17: Old Apache Stop

I have turned off the Apache2 instance running on the old RPi2, since it is not now running any material static site.

# /etc/init.d/apache2 stop
# update-rc.d apache2 disable

A quick attempt to contact one of the residual services now hangs/fails, correctly.

After a reboot Apache is still not responding, correctly.

Note that netstat does show the servlet-based listeners still.
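A check like the "quick attempt to contact" above could be scripted along these lines. This is a sketch under assumptions: the function name `port_is_open` and the example host name and port are hypothetical, and a failure or timeout is simply treated as "not listening".

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    A refused connection or a timeout (a 'hang') both count as closed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical host name for the old machine; substitute the real one.
    print(port_is_open("old-rpi.example", 80))
```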

2020-11-16: Apple Touch Icons

I get the occasional blast of requests from an Apple device like so:

www.earth.org.uk:80 "GET /apple-touch-icon-120x120-precomposed.png HTTP/1.1"
www.earth.org.uk:80 "GET /apple-touch-icon-120x120.png HTTP/1.1"
www.earth.org.uk:80 "GET /apple-touch-icon.png HTTP/1.1"

Sometimes a 152x152 icon is requested.

This happens (I think) when EOU is added to an i-device's homescreen.

Using ManyTools' Apple-touch-icon generator, I created a set of icons. I then svn cped the 60x60, 120x120 and 152x152 versions (generated as apple-touch-icon-iphone-60x60.png, apple-touch-icon-iphone-retina-120x120.png and apple-touch-icon-ipad-retina-152x152.png) down to the desktop root under the names below, where they are usable by default. Using svn cp avoids multiple copies of the pixels in the repo.

apple-touch-icon.png
apple-touch-icon-120x120.png
apple-touch-icon-152x152.png

Before putting them in the repo I reduced their weight with zopflipng -m -m.

Possibly a visit to tinypng.com first would have been even better!

I may copy them to the AMP root too, though probably not to the lite/m-dot to avoid incurring extra bandwidth (and storage) costs for users, for just a little bit of eye-candy.

2020-11-15: WWW Soft Canonical

Without adding any overhead (eg extra headers) to normal connections, but to gently redirect spiders to the WWW https versions of most files, I've inserted the following early config for www.earth.org.uk.

# Redirect most Referer-less http accesses to https.
# Aim to gently redirect spiders to canonical https for most content.
# Avoid redirecting (top-level) HTML files that contain own rel=canonical,
# so users directly choosing http can stay on http.
# Avoid breaking the LE ACME challenge.
# Use a 302 (temporary) redirect, for now.
RewriteEngine on
RewriteCond %{HTTPS} off
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{REQUEST_URI} !^/\.well-known/
RewriteCond %{REQUEST_URI} !^/[^/]*\.html$
RewriteCond %{REQUEST_URI} !^$
RewriteCond %{REQUEST_URI} !^/$
RewriteRule ^/(.*)$ https://www.earth.org.uk/$1 [L,R=302]
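To make the combined effect of those conditions easier to see, here is a rough Python model of the decision. It is only a sketch: the function name `soft_canonical_redirect` is made up, the regular expressions are copied from the config above, and it ignores details such as query strings that Apache handles itself.

```python
import re


def soft_canonical_redirect(https: bool, referer: str, uri: str):
    """Return the https URL to redirect to, or None to serve the request as-is.

    Mirrors the RewriteCond chain: all conditions must hold for a redirect.
    """
    if https:
        return None  # Already https.
    if referer != "":
        return None  # Has a Referer: probably a human following a link.
    if re.search(r"^/\.well-known/", uri):
        return None  # Do not break the LE ACME challenge.
    if re.search(r"^/[^/]*\.html$", uri):
        return None  # Top-level HTML carries its own rel=canonical.
    if uri in ("", "/"):
        return None  # Home page is left alone too.
    match = re.match(r"^/(.*)$", uri)
    if match is None:
        return None
    return "https://www.earth.org.uk/" + match.group(1)


if __name__ == "__main__":
    for path in ("/img/a.png", "/index.html", "/sub/page.html", "/"):
        print(path, "->", soft_canonical_redirect(False, "", path))
```

Note that only top-level HTML is exempt: an HTML file in a subdirectory is still redirected.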

Currently there are slightly more http than https WWW requests, by ~10%.

2020-11-12: AMP HTTPS Only

Today I am making amp.EOU https only, eg by redirecting http to https.

Since AMP users are already paying the overhead of the AMP JavaScript, etc, the bandwidth- and latency-saving aspects of http are likely less important to them. They have probably already been pushed to the AMP page via https-based search and the AMP cache.

Making the http option disappear should save a little crawl bandwidth by spiders. About one quarter of AMP page hits are currently http.

Then I will be down from serving 6 variants of each page — www, m, amp for each of http and https — to 5. It's a start.
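The variant arithmetic is simple enough to spell out. A tiny sketch, with the tuple names `HOSTS` and `SCHEMES` invented for the illustration:

```python
from itertools import product

HOSTS = ("www", "m", "amp")
SCHEMES = ("http", "https")

# Every scheme/host combination: 2 x 3 = 6 variants of each page.
all_variants = set(product(SCHEMES, HOSTS))

# Dropping plain-http AMP leaves 5.
served_variants = all_variants - {("http", "amp")}

if __name__ == "__main__":
    print(len(all_variants), len(served_variants))
```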

There's a little wrinkle to avoid interfering with Let's Encrypt auto-renew.

RewriteEngine on
RewriteCond %{HTTPS} off
RewriteCond %{REQUEST_URI} !^/\.well-known/
RewriteRule ^/(.*)$ https://amp.earth.org.uk/$1 [L,R=301]

I also aim to add a couple of other tweaks, eg that make my security rating in WebPageTest better than the current "F"!

The biggest complaint from WebPageTest is fixed by adding the header Strict-Transport-Security: max-age=31536000 which should force a browser to use https for amp.EOU for a year. This raises the WPT security rating for the home page from "F" to "E".
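As a sanity check on that max-age value, here is a small sketch that pulls the directive out of a header value and converts it to a duration. The function name `hsts_max_age` is an assumption for the example, and the parsing is deliberately simplistic.

```python
from datetime import timedelta


def hsts_max_age(header_value: str) -> timedelta:
    """Extract max-age from a Strict-Transport-Security header value."""
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            # The value may optionally be quoted.
            return timedelta(seconds=int(value.strip().strip('"')))
    raise ValueError("no max-age directive found")


if __name__ == "__main__":
    # 31536000 seconds = 365 days x 24 hours x 3600 seconds.
    print(hsts_max_age("max-age=31536000").days)
```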

Adding the header X-Frame-Options: DENY, which I already use for the desktop site, improves the security score to "D", but seems to stop images loading in Firefox (though not Chrome Canary). The header is apparently effectively obsoleted by the Content-Security-Policy header though. Given all that, it's not staying!

I note that https://www.theguardian.com/uk includes these headers:

  • x-frame-options: SAMEORIGIN
  • content-security-policy: default-src https:; script-src https: 'unsafe-inline' 'unsafe-eval' blob: 'unsafe-inline'; frame-src https: data:; style-src https: 'unsafe-inline'; img-src https: data: blob:; media-src https: data: blob:; font-src https: data:; connect-src https: wss:; child-src https: blob:; object-src 'none'; base-uri 'none'
  • referrer-policy: no-referrer-when-downgrade
  • strict-transport-security: max-age=31536000; includeSubDomains; preload
  • x-content-type-options: nosniff
  • x-xss-protection: 1; mode=block

It seems as if x-xss-protection is also only for older browsers, and I should concentrate my efforts on crafting content-security-policy.

Part may be Referrer-Policy: origin-when-cross-origin, or the equivalent via content-security-policy.

Another part may be script-src https://cdn.ampproject.org:* to let the AMP scripts run, though that may not let Google ads run.

I do use a little inline CSS to keep the header and CRP (Critical Rendering Path) small, which implies something like style-src 'unsafe-inline' which weakens the whole mechanism. Maybe I should wean myself off local CSS in all critical cases instead.
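To see how those pieces might fit together, here is a sketch that assembles a candidate header value from its parts. It is not a tested or recommended policy: the function name `build_csp` is made up, the `default-src https:` entry is borrowed from the Guardian example above as an assumption, and whether ads would still run under it is exactly the open question.

```python
def build_csp(directives: dict) -> str:
    """Join directive names and their source lists into one CSP header value."""
    return "; ".join(
        name + " " + " ".join(sources)
        for name, sources in directives.items()
    )


# Candidate pieces discussed in the text; an untested combination.
CANDIDATE = {
    "default-src": ["https:"],  # Assumption, borrowed from the example above.
    "script-src": ["https://cdn.ampproject.org:*"],
    "style-src": ["'unsafe-inline'"],  # Needed while inline CSS remains.
}

if __name__ == "__main__":
    print("Content-Security-Policy: " + build_csp(CANDIDATE))
```

Dropping the inline CSS would let the `style-src 'unsafe-inline'` entry be removed.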

2020-11-11: Slow Switchover

GSC coverage graph for https sitemap.xml.
GSC coverage graph for http (not https) sitemap.xml; note the drop to zero 'valid' pages at switchover of canonicals to https, and the slow recovery.

2020-11-05: Lazy Wins

Lazy loading seems to win in two ways. Reducing bandwidth is the obvious one, but it also reduces initial visible page rendering time, even when no bandwidth is saved.

So, for example, on the home page, Chrome doesn't avoid loading any images because even the ones below the fold are not far enough below. But it seems that by letting Chrome concentrate on the important bits above the fold, initial layout is faster. Firefox manages to save bandwidth too by avoiding loading several images for the initial view. Those images will never be loaded if the visitor doesn't scroll down; even if they do, load on the server is spread out.

Here are three simple scenarios, all from WebPageTest instances in London, all over HTTP/2 (ie one TCP connection) to https://www.earth.org.uk.

Note that the bandwidth limits weren't identical across all runs, but higher bandwidth doesn't beat the advantages of lazy loading!

Chrome without Lazy Loading

For this run all loading=lazy attributes were manually removed from the HTML.
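The removal was done by hand here, but it could be scripted for repeat tests. A rough sketch, assuming simple well-formed markup; the function name `strip_lazy` is hypothetical, and a plain regex like this could in principle also hit matching text outside a tag.

```python
import re

# Matches loading=lazy, loading="lazy" or loading='lazy' plus leading space.
_LAZY_ATTR = re.compile(r"""\s+loading=(["']?)lazy\1""", re.IGNORECASE)


def strip_lazy(html: str) -> str:
    """Return html with all loading=lazy attributes removed."""
    return _LAZY_ATTR.sub("", html)


if __name__ == "__main__":
    print(strip_lazy('<img src="a.png" loading="lazy" alt="">'))
```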

Chrome not lazy, 55kB total download, speed index 500ms.

Chrome with Lazy Loading

Chrome lazy, 51kB total download, speed index 400ms.

Firefox with Lazy Loading

Firefox lazy, 21kB total download, speed index 400ms.
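The relative savings from those three runs can be worked out as below. This is only back-of-envelope arithmetic on the figures quoted above (the helper name `percent_saved` is made up), and the Firefox figure compares across browsers, so treat it loosely.

```python
def percent_saved(baseline: float, value: float) -> float:
    """Return the percentage reduction of value relative to baseline."""
    return 100.0 * (baseline - value) / baseline


if __name__ == "__main__":
    # Chrome lazy vs Chrome not lazy: download size and speed index.
    print(round(percent_saved(55, 51), 1))    # 7.3 (% fewer kB)
    print(round(percent_saved(500, 400), 1))  # 20.0 (% lower speed index)
    # Firefox lazy vs Chrome not lazy: download size.
    print(round(percent_saved(55, 21), 1))    # 61.8 (% fewer kB)
```

So Chrome gets a fifth off the speed index for only a small bandwidth saving, which fits the idea that the win is partly about prioritisation.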

2020-11-02: Let's Encrypt Auto-renew Un-snagging

Amongst ignorable email complaints from Let's Encrypt I received a worrying one that implied that the actual EOU TLS certs were going to expire.

Looking in the Apache logs I could see redirects and errors during the http-01 challenge for the amp. and m. sites. The two sites fairly aggressively redirect to www. anything that doesn't look like a top-level HTML page (or script or favicon, etc). That breaks the GET /.well-known/acme-challenge/.... So I put in special-case fixes to not rewrite/redirect any such requests.

Having done that, renewal succeeded by manually running:

% sudo certbot renew

With a fair wind behind, auto-renewal should "just work"!
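Rather than waiting for a warning email, the remaining life of a cert could be checked from its notAfter date. A minimal sketch: the function name `days_until_expiry` and the sample date are made up, and fetching the cert itself is left out. It relies on the standard library's `ssl.cert_time_to_seconds()`.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return days from now until the cert's notAfter time.

    not_after uses the certificate format, eg "Feb  1 00:00:00 2021 GMT".
    """
    expiry_seconds = ssl.cert_time_to_seconds(not_after)
    return (expiry_seconds - now.timestamp()) / 86400.0


if __name__ == "__main__":
    # Made-up expiry date, purely for illustration.
    now = datetime.now(timezone.utc)
    print(days_until_expiry("Feb  1 00:00:00 2021 GMT", now))
```

A negative result means that the cert has already expired.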

2020-11-01: More Work Storage

I've added a few more tweaks, eg to stop almost all DAILY and WEEKLY periodic updates when battery is LOW or below.

A few more tweaks mean that almost no periodic page rebuilds will happen unless the sun is out. Nor will changing the build scripts force a rebuild in the absence of sunshine and a decent state of battery charge.
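The gating described in this section might look something like the sketch below. All names here (`Battery`, `periodic_update_allowed`, `rebuild_allowed`) and the set of battery states are assumptions for illustration; only "LOW or below" comes from the text, and "decent" charge is assumed to mean anything above LOW.

```python
from enum import IntEnum


class Battery(IntEnum):
    """Hypothetical battery states, ordered from worst to best."""
    VLOW = 0
    LOW = 1
    OK = 2
    HIGH = 3


def periodic_update_allowed(battery: Battery) -> bool:
    """DAILY/WEEKLY updates are skipped when battery is LOW or below."""
    return battery > Battery.LOW


def rebuild_allowed(battery: Battery, sun_is_out: bool,
                    scripts_changed: bool = False) -> bool:
    """Rebuilds need sunshine and decent charge.

    scripts_changed is deliberately ignored: a changed build script
    does not force a rebuild on its own.
    """
    return sun_is_out and battery > Battery.LOW


if __name__ == "__main__":
    print(rebuild_allowed(Battery.OK, sun_is_out=False, scripts_changed=True))
```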