Data Analytics: one year later...
Completely coincidentally, I've had a reason to revisit the data I'm capturing precisely 12 months after my last post on the same subject. It's been an interesting journey that I thought was worth delving into.


It's a stormy day here today, but this is what my readout looks like now after a year of incremental tweaks:
Getting the data out of the controller has been a pain in the arse - sure, I can plug my laptop into the RS485 <-> USB converter and drag out a history from the logs, or sit there watching the readouts in the clunky app they provide, but it's not... useful. I plugged a RasPi in to do it for me, but with dodgy drivers that required recompiling every time I updated the software, it burned hours to maintain. And that was on top of the i-can't-remember-how-many-hours-over-weeks it took to take some PHP code from GitHub and adapt it to spit the readings into something I could log and read out from. I could go into it, but I finally got to use some of the things they taught me way back in my undergrad CompSci course, and honestly I thought I'd drunk those memories clear long ago. It was a pig-wh*** of a thing and let's leave it at that.
So here's how it works:
- The hardware I've cobbled together consists of a RasPi Zero W sitting on top of the box with an RS485<->TTL UART module soldered to the headers, which in turn has been soldered to two leads stripped from a network cable. I ghetto-ised a DC-in/out LiPo UPS unit I hacked and installed as a hedge against the main batteries running low, or the controller throwing a hissy fit. The same unit also powers an ESP8265-based wifi relay unit which controls the on/off switch for Tom's inverter, and a line set up to provide a battery connection to bypass the controller when it throws one of those aforementioned hissy fits. More on that later.
- Every minute a cron job runs which opens a connection to the controller, then (because the controller is as temperamental as a sleep-deprived teenager) keeps asking every second until it gets an answer. A failed read only happens roughly once every hundred reads or so, but that was enough that dropped data was causing me issues. In testing I never saw it require more than a dozen attempts, and in case you were wondering: yes, I handled the loop so that it DOES close the connection before the next attempt starts.
- The PHP script then grabs the 4 raw readings I want to track (out of the >100 spat out by Herr Controller), runs a few calculations to create 6 data fields that I find useful, and wraps them up in a bit of formatting and metadata, which it then handballs gracefully over to an InfluxDB running on my server-that-isn't-a-server (another piece of hackery that's a story for another day when we're at the pub and preferably after your second round and mine's a stout thanks).
- Influx is cool - rather than SQL, where you define your schema when you create it and pray you got it right because that's what you're stuck with, Influx will take any correctly formatted data you want and Deal With It, whilst adding a handy bit of metadata for you: the time that data (or set of data, called a Point) arrived, with nanosecond precision (if you're into that sort of thing). It's one of those "hours to learn, a lifetime to master" things, but it's well documented, low-resource, simple enough to get started with, and is REST-compliant; it'll nom data from an HTTP message if you want, just set up a bit of security on your firewall and go.
- The engine behind the pretty graphs is a nice bit of FOSS called Grafana. It's database-agnostic, in that it will talk to anything it has an API built for (and being FOSS, if it doesn't have one for your project you can write your own, otherwise rethink your life choices). My current default view gives me the last 24 hours, but I've set up the graphs and aggregates to scale along with the global dashboard timescale. The High Scores won't change, but selecting a week, a month, or an hour will recalculate the Money Saved field or the Power Consumed (since they're just sums of the relevant field values that match the start- and end-times of the time range), and set Current Battery State to the last value shown. It's cool. It's pretty. You can watch it spool out in realtime if you know that the username and password are the same and set to the sort of thing an idiot would call a "guest" account login.
Alright, so that's cool and all, but it's also a metric fuckton of effort and words just so you can see your solar output all pretty over the net. What's the point, and why should we care and not just pat you on the head saying "Good nerd! Who's a good technomage? You are! You're SUCH A GOOD little technomage, oh yes you are," I can imagine you muttering?
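For anyone who wants the collection steps above in concrete terms, here's a minimal Python sketch of the "keep asking until it answers" loop and the hand-off format. It is not the actual script (that one is PHP): `read_once` is a hypothetical stand-in for "open the connection, query the controller, close the connection", the cap of 12 attempts is my assumption based on the "never more than a dozen" observation, and the measurement and field names are made up. The output is InfluxDB line protocol, which is what you would send to the database's write endpoint.

```python
import time
from typing import Callable, Mapping, Optional


def read_with_retry(read_once: Callable[[], Optional[dict]],
                    max_attempts: int = 12,
                    delay_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Call read_once() until it returns data, or give up after max_attempts.

    read_once is hypothetical: it should open the connection, query the
    controller, ALWAYS close the connection again, and return None on failure.
    """
    for attempt in range(1, max_attempts + 1):
        reading = read_once()
        if reading is not None:
            return reading
        if attempt < max_attempts:
            sleep(delay_s)  # wait a second before asking again
    return None  # give up; the next cron run gets another go


def to_line_protocol(measurement: str,
                     fields: Mapping[str, float],
                     timestamp_ns: Optional[int] = None) -> str:
    """Format one Point as line protocol: 'measurement key=val,key=val [ts]'."""
    body = ",".join(f"{key}={float(value)}" for key, value in fields.items())
    line = f"{measurement} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"  # omit it and Influx stamps arrival time
    return line


if __name__ == "__main__":
    # Simulate a controller that sulks twice before answering.
    answers = iter([None, None, {"battery_v": 12.8, "pv_w": 95.0}])
    reading = read_with_retry(lambda: next(answers), sleep=lambda s: None)
    print(to_line_protocol("solar", reading))
    # prints: solar battery_v=12.8,pv_w=95.0
```

Leaving the timestamp off matches the behaviour described above, where Influx records the arrival time for you.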
So here's where it gets good.
- The key bit is back up in #3 where I mention turning 4 raw readings into 6 data fields. Some of what Grafana shows is unadulterated, raw feed. Battery Voltage, for example. I want to know that. I also pull the Power directly from Herr Controller - it works that out for me so I might as well use it raw. It will also tell me the current (Amps) on each of the circuits, but I don't want it for anything and... why store that when I can calculate it on the fly programmatically if I decide I need it? And since the Battery Voltage in my case will always match the Load Voltage to within a reasonable tolerance, why record both? Storage is cheap these days, right?
I'm old. I remember when you booted a computer on one floppy disk (you know, that Save Icon you'd always wondered about?), then had to eject and stick one in with the program you wanted to run. I remember when 56kbps dialup modems were SO FAST, then when 8Mbps ADSL was AMAZING (now anything less than 100Mbps Fibre makes you a peasant). I remember buying a hard drive and thinking how amazing it was to be able to get 40GB of space for under $10/GB, let alone $10/TB. Storage matters; I might only be writing:
(6 readings x 64 bits) + (6 x 3 bits for metadata) + (1 x 64 bits for the timestamp) = 466 bits, or about 58 bytes
BUT IT ALL ADDS UP:
58 bytes/point x 1 point/minute x 60 minutes/hour x 24 hours/day x 365 days/year = roughly 29MB/year forever, or until I get bored of the idea and set it all on fire.
SO
If a reasonably reliable 1TB hard drive costs $100 (just take my word on it, it's close enough), and to make sure I've got redundancy I need two of them in a RAID 1, every year I'm storing
(2 x $100 / 931GB formatted capacity) x 0.029GB = about $0.006 worth of data each year.
OK, it's not going to break the bank, but when the point of Project Tom Waits was to reduce waste in a world that's quickly dying, and a penny saved is a penny that didn't have to be dug up somewhere near Karratha or Dubbo, I'm going to play the Tight-fisted Wog Card.
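If you'd rather run that sum than trust my napkin, here's a rough Python sketch. It makes two assumptions you should check against your own setup: one Point per minute (the cron interval described earlier) and that the 466 figure is in bits, not bytes. The function name and its parameters are mine, purely for illustration, and the result swings wildly with those two inputs.

```python
from typing import Tuple


def yearly_storage_cost(bits_per_point: int,
                        points_per_minute: float,
                        drive_price: float,
                        drives: int,
                        usable_gib: float) -> Tuple[float, float]:
    """Return (bytes written per year, dollar value of that disk space)."""
    bytes_per_point = bits_per_point / 8
    points_per_year = points_per_minute * 60 * 24 * 365
    bytes_per_year = bytes_per_point * points_per_year
    # Price of one GiB of mirrored storage: both drives hold the same data.
    dollars_per_gib = drives * drive_price / usable_gib
    return bytes_per_year, dollars_per_gib * bytes_per_year / 2**30


if __name__ == "__main__":
    bits = 6 * 64 + 6 * 3 + 1 * 64  # readings + metadata + timestamp = 466
    per_year, dollars = yearly_storage_cost(
        bits_per_point=bits,
        points_per_minute=1,   # assumption: one cron run per minute
        drive_price=100.0,
        drives=2,              # RAID 1 mirror
        usable_gib=931.0,
    )
    print(f"{per_year / 1e6:.1f} MB/year, ${dollars:.4f}/year")
```

Change `points_per_minute` to 60 to see what logging every second would cost instead.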
MEANWHILE, back to the discussion I was in the middle of when I forgot this wasn't one of my MBA assignments...
- (or 6.5.) There's some information I want to know, or keep track of, which Herr Controller doesn't know, or doesn't know it knows. Things like "how many Watt-hours (Wh) did I use since the last reading," or "what would that power be worth if I was buying it off the grid?" It doesn't know how the readings are going to get used, so it doesn't guess and just spews data at you if you ask it nicely enough. I have to work that out for myself, so the extra fields I calculate before flinging it over to Influx are:
Watt-hours Generated: Watts output x 1 hour = Watt-hours / 60 minutes in an hour = Watt-hours generated in that minute.
Watt-hours Consumed: As above
Battery absorption: Herr Controller can't tell whether juice flowing into the battery and load circuits is charging the battery, or being siphoned off by the load, so I need to calculate a guesstimate (in that it doesn't factor for resistance losses) by
Watts output - Watts consumed = "absorbed" by the battery
Cost: I wanted to be able to determine whether all this was "worth doing" (beyond the Learning Is Fun! factor), and what sort of metric might we use to calculate something's relative worth? That's it: C.R.E.A.M. (dolla dolla bill, y'all). Working this out is as simple as finding out the price you pay for each Kilowatt-hour (kWh) of electricity (referred to as a "unit" on most power bills) and multiplying it by the number of kWh I use (not generated, because it's only worth something when it's used). It got a bit more tricky when I remembered that during the day my grid-connected panels mean I'm generally a net-exporter, so they PAY ME for power I send their way, but at less than a third the price they charge me for the same thing in return (bastards). Problem is I haven't set up a monitor of my Smart Meter, so I don't know for sure whether I'm importing or exporting. I make an educated guess by assuming that if Tom is generating more than I'm using, I'm exporting, so use the export tariff rate, otherwise use the import tariff. That looks like:
Watt-hours Consumed x (appropriate tariff/kWh / 1000)
This value, the cost of the power used in the last minute, usually between 0.006c and 0.04c (yes, cents), gets written in along with the rest.
- (or 6.9) OK, SO MANY WORDS. GET TO THE FUCKING POINT!
Right, so when you look at the dashboard there's a lot happening behind the scenes. It gets worse.
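To pin down what "behind the scenes" means, here's the per-minute arithmetic for the calculated fields as a Python sketch. Treat it as an illustration, not my actual PHP: the function and field names are invented, and the tariffs in the example (30c import, 9c export per kWh) are placeholders I picked to show the shape of the sum, not what's on my bill.

```python
def derive_fields(pv_watts: float,
                  load_watts: float,
                  import_tariff: float,
                  export_tariff: float,
                  interval_min: float = 1.0) -> dict:
    """Turn raw power readings into per-interval derived fields.

    Tariffs are in cents per kWh; cost comes back in cents.
    """
    # Watts held for one interval, expressed in Watt-hours.
    wh_generated = pv_watts * interval_min / 60
    wh_consumed = load_watts * interval_min / 60

    # Guesstimate only: ignores resistance losses.
    battery_absorbed_w = pv_watts - load_watts

    # Educated guess: generating more than I'm using means I'm exporting.
    exporting = pv_watts > load_watts
    tariff = export_tariff if exporting else import_tariff

    # Wh -> kWh, then price it.
    cost_cents = wh_consumed * tariff / 1000

    return {
        "wh_generated": wh_generated,
        "wh_consumed": wh_consumed,
        "battery_absorbed_w": battery_absorbed_w,
        "cost_cents": cost_cents,
    }


if __name__ == "__main__":
    # Hypothetical minute: 120W coming in, 60W going out.
    print(derive_fields(120.0, 60.0, import_tariff=30.0, export_tariff=9.0))
```

Summing `cost_cents` over whatever time range the dashboard is showing is what produces the Money Saved figure.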
I can make a lot of decisions just by glancing at Grafana - here's what it looks like now a few hours later:
I can see I've generated sod-all. Only 389Wh have gone into the batteries, and they started the day in a fairly discharged (although not deeply) state. I know I'll need to fall back on secondary sources (LiPo power banks, or... gods-above-and-below... The Grid). This is inefficient tho - great, I can make a decision based on the data, but me having to make it means being here, switching things off and on, switching circuits over. I want to use as little of The Grid as possible without destroying some really expensive chemistry, which means carefully nursing every Watt I can. I want stuff to happen without me having to do it, because I am essentially a flawed and poorly encoded EMF-ghost inhabiting a barely-sustainable meat suit and also I got shit to do.
Now enter the other dramatis personae of my long-winded story. Let's start with what I like to call "the orchestrator": Home Assistant is another FOSS product, built as a commercially-supported, community-driven project to make all the various IoT systems out there play nicely with each other. It was driven by the increasing fragmentation of the IoT market, and the rise of one walled garden after another where products would only work through a proprietary app, often connected to the cloud (aka Some Other Bastard's Server), preventing one manufacturer's devices from working cleanly with another's. I'm sick of having Yet Another Fucking App, and I want to own my stuff without worrying about Some Other Bastard being able to stop it from working just b'coz. As well as controlling IoT devices, it also allows you to Automate Actions based on Triggers and Conditions. Want your kitchen lights to turn on, I dunno, 45min before sunset? Gotcha covered. Maybe you want your bedroom rollershutters to open up in summer to let the cool air in, but close again 30min before dawn so you don't get blinded? Sorted. My HA Dashboard looks exactly like this:
There's a lot going on, but the red box shows a mini-version of what you can see in Grafana - it's pulled from the same database. Now I have a numerical sensor I can use as a Trigger. When Tom's inverter turned off at 2:08AM last night it was because the Voltage reading dropped to the level I set as a "lower bound" and HA sent a command to the Smart Relay I set up in Toshi Station (where Tom lives, because it's where I keep the power converters) saying "switch off now".
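HA handles that Trigger-and-Action pairing itself, so I never had to write code for it. Purely to show the logic, here's a Python sketch of the same decision. Everything specific in it is an assumption: the 11.8V lower bound and the IP address are placeholders, and the function names are made up. The one real thing is the URL shape, which is Tasmota's HTTP command format (the relays run Tasmota, more on that further down).

```python
from typing import Optional
from urllib.parse import quote, urlencode


def inverter_command(battery_volts: float, lower_bound: float) -> Optional[str]:
    """Return the relay command to send, or None if nothing needs doing."""
    # Cut the inverter once the voltage has dropped to the lower bound.
    if battery_volts <= lower_bound:
        return "Power Off"
    return None


def tasmota_url(host: str, command: str) -> str:
    """Build a Tasmota HTTP command URL, e.g. .../cm?cmnd=Power%20Off."""
    return f"http://{host}/cm?" + urlencode({"cmnd": command}, quote_via=quote)


if __name__ == "__main__":
    command = inverter_command(battery_volts=11.7, lower_bound=11.8)
    if command is not None:
        # Only builds the URL; fetching it (e.g. with urllib.request.urlopen)
        # is what would actually flip the relay.
        print(tasmota_url("192.168.1.50", command))
        # prints: http://192.168.1.50/cm?cmnd=Power%20Off
```

Note there's no "turn it back on" rule here; a real setup would want a separate, higher threshold for that so the relay doesn't flap around the cut-off point.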
Over months and years I've been watching the data, drawing inferences, testing methods, and fine-tuning the Automations to make things more convenient, and push that "Total money saved" value as high as possible. I have systems that will switch on or off depending on whether I have surplus power to use - secondary power banks for charging phones or laptops and so on that only charge when Tom's batteries are full, and then get used when the sun isn't shining. My TV and Receiver are connected to their own outlet, and HA can see if my Media Centre PC turns on so will turn those on as well so I don't have to reach for the remotes, and if it goes to sleep will kill power to the TV until I wiggle the mouse again to wake it up. All of this is enabled by our other friend, the Espressif ESP SoC:
This little guy sits between the power and the thing I want to control and takes orders over wifi. Most of the toggles on my HA Dashboard relate to these or something like it. They're hugely expandable too, so I have a couple with thermometers and other sensors attached as well. To get around the walled garden they've been reprogrammed with another FOSS project called Tasmota that opens up a lot of customisation and is on friendly terms with HA. There's a lot of hackery going on here - just look at the mess on my workbench to get an idea for the number of half-finished and half-baked projects I'm working on at any given time. It's a learning experience - some things work, some things get a bit fucked up and the magic smoke comes out, but you don't get to be a technomage without electrocuting a few eggs, or accidentally reverse-polarising the odd battery.
So there you have it.
I can't say that Project Tom Waits is "over" - it'll never be really over unless I one day scream "FARCKTHISI'MFARCKINGOVERIT!". I also don't ever expect it to pay off the money I spent building, and re-building, and re-re-re-building it, but now I know the answer to the question of "whether or not I could?". The question of "whether or not YOU should"... let's just say that if you asked I'd give you a haggard look, point you to the pile of burned out components, failed experiments, and credit card receipts and say "fuck no," then email you a link to this blog and a parts list on AliExpress. Ultimately it's the data that informs whether it was worth it or not, and the empirical evidence is why my next breath would be "but I'm glad I did."