Earth Notes: On Website Technicals (2019/04)
Updated 2019-04-21 08:20 GMT
2019/04/19: IMG Defers Ad
I just implemented an item from my to-do list:
- Postpone next auto-inserted ad for each float/body IMG inserted, to make it less likely that a bunch of floats will pile up ahead of an ad and show up
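A minimal sketch of how such a deferral might work, assuming the page body is a simple list of blocks and that an ad is normally inserted after a fixed number of paragraphs. The function name `place_ads`, the `"P"`/`"IMG"`/`"AD"` markers and the `interval` parameter are all hypothetical, not the site's actual build logic.

```python
def place_ads(blocks, interval=4):
    """Return a copy of blocks with "AD" markers inserted.

    blocks: list of strings, each "P" (paragraph) or "IMG" (float/body image).
    interval: hypothetical number of paragraphs between auto-inserted ads.
    """
    out = []
    countdown = interval
    for block in blocks:
        out.append(block)
        if block == "IMG":
            # Each inserted image pushes the next ad back by one paragraph.
            countdown += 1
        else:
            countdown -= 1
        if countdown <= 0:
            out.append("AD")
            countdown = interval
    return out
```

With `interval=4`, a run of four plain paragraphs gets an ad straight after it, while an image among them delays that ad by one further paragraph.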
2019/04/18: More Precise datePublished
All but ~20 pages have now had their datePublished updated,
mostly just to add a trailing time, but sometimes also to correct the date.
There are cases where the SVN repository date is not a good reflection
of when the information on the page was first published, eg
because a single huge page was split up into many. (Sometimes the
creation date of a new page into which information was moved has
been accepted, but the
temporalCoverage set to reflect
a range including the original date.)
Another case is where a page was created a day or three ahead of time to save a rush on the day, but that page was not actually published (ie embargoed) until a more logical time.
In such cases where the SVN timestamp is unhelpful, the manually-selected plain date has been left in place.
The output of the tool I created to cross-check as I last ran it was:
WARNING: datePublished .electricity-storage-whole-household.html is 2010-12-20, svn is 2017-05-20T10:23:44Z...
WARNING: datePublished .index.html is 2007-05-25, svn is 2007-07-18T09:52:33Z...
WARNING: datePublished .note-on-site-technicals-20.html is 2019-01-01, svn is 2018-12-27T12:48:38Z...
WARNING: datePublished .note-on-site-technicals-3.html is 2017-08-01, svn is 2017-07-31T19:22:13Z...
WARNING: datePublished .off-grid-stats-historical-200909.html is 2009-09-11, svn is 2018-10-06T18:48:45Z...
WARNING: datePublished .off-grid-stats-historical.html is 2007-11-08, svn is 2018-10-06T18:09:39Z...
WARNING: datePublished .saving-electricity-2008.html is 2008-01-01, svn is 2017-08-20T14:20:48Z...
WARNING: datePublished .saving-electricity-2009.html is 2009-01-01, svn is 2017-08-20T14:09:14Z...
WARNING: datePublished .saving-electricity-2010.html is 2010-01-01, svn is 2017-08-20T13:51:28Z...
WARNING: datePublished .saving-electricity-2011.html is 2011-01-01, svn is 2017-08-20T13:09:22Z...
WARNING: datePublished .saving-electricity-2012.html is 2012-01-01, svn is 2017-08-20T12:54:01Z...
WARNING: datePublished .saving-electricity-2013.html is 2013-01-01, svn is 2017-08-20T12:13:14Z...
WARNING: datePublished .saving-electricity-2014.html is 2014-01-01, svn is 2017-08-20T11:00:31Z...
WARNING: datePublished .saving-electricity-2015.html is 2015-01-01, svn is 2017-08-20T10:29:47Z...
WARNING: datePublished .saving-electricity-2016.html is 2016-01-01, svn is 2017-08-19T15:23:50Z...
WARNING: datePublished .saving-electricity-2017.html is 2017-01-01, svn is 2017-08-18T18:39:50Z...
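A check of this kind could be sketched as below. This is not the actual tool: the function name `check_date_published` is hypothetical, and the rule used (warn whenever the calendar-date part of datePublished differs from that of the first SVN commit) is only an assumption that happens to be consistent with the warnings shown.

```python
def check_date_published(page, date_published, svn_first_commit):
    """Return a warning string if the two dates disagree, else None.

    date_published: page metadata value, a plain date or a full UTC timestamp.
    svn_first_commit: ISO 8601 UTC timestamp of the page's first commit.
    """
    # Assumed rule: compare only the leading YYYY-MM-DD of each value.
    if date_published[:10] == svn_first_commit[:10]:
        return None
    return (f"WARNING: datePublished {page} is {date_published}, "
            f"svn is {svn_first_commit}")
```

A page whose datePublished already carries a time on the same day as its first commit would pass silently under this rule.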
Note that the whole site appears to have been imported into SVN 2007-07-18T09:52:33Z, at that point consisting of the following files/dirs:
.LED-lighting.html .index.html .low-power-laptop.html .saving-electricity.html .solar-PV-pilot-summer-2007.html .work .work/wrap_art.sh 650Wp-1kWhPerDay-sim-tn.gif 650Wp-1kWhPerDay-sim.gif CFL-12V.jpg CFL-desk-lamp.jpg LED-bulb-5W.jpg LED-light-3W.jpg battery-and-controller.jpg battery-voltage-monitor-12V-13V-thresholds-1-full.gif battery-voltage-monitor-12V-13V-thresholds-1.gif compost-bin.jpg laptop-12V-mains-fallback-schema-1-full.gif laptop-12V-mains-fallback-schema-1.gif laptop-12V-mains-fallback-schema-2-full.gif laptop-12V-mains-fallback-schema-2.gif laptop-12V-mains-fallback-schema-2.ps makefile q.jpg solar-PV-system-June-2007-and-planned-expansion-1-full.gif solar-PV-system-June-2007-and-planned-expansion-1.gif solar-cells.jpg solar-panel-on-wall.jpg sparks.jpg xephi_small_logo.png
Timestamps from SVN for anything imported at that point are misleading. Clues in the text, and assuming that a page is at least one day older than the oldest capture in the Wayback Machine, help with these.
Note that this date inference is needed for the home page
index.html as it predates the repository.
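The Wayback Machine assumption gives an upper bound on the publication date, which can be written down as a tiny sketch. The function name and the capture date in the usage note are hypothetical; finding the oldest capture is left as a manual step.

```python
from datetime import date, timedelta


def latest_plausible_publication(oldest_capture):
    """Latest date a page could have been published, assuming it is
    at least one day older than its oldest Wayback Machine capture."""
    return oldest_capture - timedelta(days=1)
```

For example, an oldest capture dated 2007-06-01 would imply publication no later than 2007-05-31; clues in the page text can then narrow that further.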
2019/04/16: More Precise dateModified
I've extended page date metadata to hours and minutes (UTC) for
dateModified, which is now the latest commit date and time of the
repository source file rather than the file timestamp.
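As a rough sketch of the formatting step, assuming the commit time arrives as an ISO 8601 UTC string of the kind SVN reports, it can be cut down to hours and minutes like this. The helper name `to_minute_utc` is hypothetical.

```python
from datetime import datetime, timezone


def to_minute_utc(svn_timestamp):
    """Convert eg '2017-05-20T10:23:44Z' to '2017-05-20T10:23Z'."""
    # Before Python 3.11, fromisoformat() rejects a trailing 'Z',
    # so swap it for an explicit UTC offset first.
    dt = datetime.fromisoformat(svn_timestamp.replace("Z", "+00:00"))
    # Normalise to UTC and drop seconds and any fractional part.
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
```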
Note that other places, such as the sitemaps, may still use the file
timestamp, as it is quick to get and a reasonable guide for a search
engine of when content has changed. And timestamps such as HTTP
Last-Modified will come from the file timestamps of the
plain or compressed version of the file as requested by the client.
Rather than leave it free-floating, I have now attached the 'EOU' info
as sourceOrganization to the page/Article.
I've also allowed
datePublished to include a (UTC) full time,
where I am able to provide it, eg from inspection of SVN repository logs.
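The resulting shape of the metadata might look roughly like the sketch below, built as a Python dict and serialised as JSON-LD. The helper name and the sample values are placeholders, and the real pages may well use a different syntax or carry more properties.

```python
import json


def article_metadata(headline, date_published, date_modified):
    """Build a schema.org Article with the organization attached to it."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        # Full (UTC) time where known, else a plain YYYY-MM-DD date.
        "datePublished": date_published,
        "dateModified": date_modified,
        # Attached to the Article rather than left as a separate item.
        "sourceOrganization": {"@type": "Organization", "name": "EOU"},
    }


def to_json_ld(metadata):
    """Serialise the metadata for embedding in a page."""
    return json.dumps(metadata, indent=2)
```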
Finally, to make the (last updated) date easy for a user to find, it is now shown per the Google News guidelines:
Date and time should be positioned between the headline and the article text.
2019/04/15: The Joy of Schedule
As I posted as a new issue on GitHub schemaorg/schemaorg:
In my page:
I talk about the joy of planting, growing and eating pumpkins.
Somewhere in there I'd like to mark up that this fun is to be had April to September every year. Maybe I could jam one of the ISO 8601 repeating times into temporal or temporalCoverage, maybe for the Article or an embedded Thing or Event representing the growing of pumpkins.
What would be the right thing to do here? So far I really can't see what it would be!
I was pointed to the existence of
Schedule so I shall see how I might make those work!
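One possible reading of how a Schedule could express "April to September every year" is sketched below, using the schema.org Schedule properties repeatFrequency and byMonth. The helper name `yearly_season` is hypothetical, and this has not been checked against any validator or against what search engines actually accept.

```python
def yearly_season(first_month, last_month):
    """Build a schema.org Schedule repeating yearly over a run of months.

    Months are numbered 1 (January) to 12 (December), inclusive.
    """
    return {
        "@type": "Schedule",
        "repeatFrequency": "P1Y",  # ISO 8601 duration: once a year
        "byMonth": list(range(first_month, last_month + 1)),
    }


# Pumpkin season, attached to an Event via eventSchedule.
pumpkin_growing = {
    "@type": "Event",
    "name": "Growing pumpkins",
    "eventSchedule": yearly_season(4, 9),
}
```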
2019/04/10: Fixed copyrightYear
Since structured/meta data has been on EOU, I have had the
copyrightYear be the whole site's first year, ie 2007.
I have now fixed it to be the year that the individual page, ie
CreativeWork, was first published, which is more true
to the definition, and more granular.
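Given a page's datePublished, the per-page value falls out directly. A minimal sketch, assuming datePublished always starts with a four-digit year (the function name is hypothetical):

```python
def copyright_year(date_published):
    """Derive copyrightYear from a datePublished such as
    '2010-12-20' or '2019-04-18T09:00Z'."""
    return int(date_published[:4])
```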
2019/04/07: Indexing Bumpy
It's still really unclear to me what the notions of "valid" and "indexed" mean in various places in GSC, such as the AMP and Mobile Usability 'Enhancements' vs coverage by sitemap... GSC seems unwilling to report much above 50% of my AMP pages as being "Valid" (green) even though it reports no problems, and has 100% of the canonical (desktop) pages as "Valid" in the main sitemap.
(Also, my network connection has been very flaky for about 24h, so I'm expecting some complaints from GSC about that in due course...)
2019/04/04: Speak Moar Liter
I am for now stripping out
speakable meta-data from lite pages,
since no one is going to be using it for a while, and Google only cares
about markup and content parity between desktop and AMP it seems.
~80 bytes lopped off each 'lite' page.
There are other marginal metadata elements that I could strip out for lite (m-dot) also, if I had the urge!
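A sketch of what such stripping could look like, assuming the page metadata is held as a flat dict before being written out. The function name and the `marginal_keys` parameter are hypothetical, and only top-level properties are handled.

```python
def strip_for_lite(metadata, marginal_keys=("speakable",)):
    """Return a copy of the metadata without the marginal properties.

    Only top-level keys are removed; nested items are left untouched.
    """
    return {k: v for k, v in metadata.items() if k not in marginal_keys}
```

Further marginal properties could be dropped from lite pages just by extending `marginal_keys`.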