The Crux

Platforms Worth Building On

Optus' CEO has resigned after fronting Parliament to explain the big outage from 8 November.

Cisco is renaming product categories to something less daft while Microsoft puts Copilot into everything.

Excellent daily tech newsletter The Sizzle is running a free trial promotion that gives money to charity.

Critical infrastructure breaches in Denmark show the basics are still important. Patch your stuff!

A bunch of Kubernetes news, the chipwars roll on and Nvidia has a new accelerator card.

Python packages are leaking credentials, some from companies that should be better.

Things to note

Australian telco Optus fronted Parliament to explain its massive outage on 8 November 2023. This writeup from iTNews is the best one I’ve seen so far, and provides enough details to figure out what probably happened. Here’s a thread on Mastodon outlining what I think happened. I still want Optus to publish a proper technical PIR so we can all learn to be better, but I won’t hold my breath. The company is still terrified we’ll learn the truth about last year’s massive data breach. That will have to happen under a new CEO, though, as Kelly Bayer Rosmarin has resigned.

Cisco reported quarterly revenue is up 8% YoY from USD $13.6B to $14.6B. Net income is also up 5.5% YoY from $3B to $3.2B. The Line was sad, though, because Cisco said it’d only make $12.6-12.8B next quarter when analysts thought it’d be $14.2B.

More interesting is that Cisco will rename the product categories that it uses for financial reporting. Gone are “Secure, Agile Networks”, “Internet for the Future” and “Optimized Application Experiences”. Instead we get the more boring, but actually useful, “Networking”, “Security”, “Collaboration” and “Observability”. In its presentation, Cisco didn’t explicitly say where Splunk will be, but I’d guess Security since Observability contains AppDynamics and ThousandEyes.

Meanwhile security researcher SektorCERT published a report on breaches of Denmark’s critical infrastructure. Unpatched Zyxel firewalls were probably the way in. It’s a good writeup. Some details are a bit disturbing, such as that some organisations “had deliberately opted out of the [firewall] updates as there was a cost from the supplier to install them”.

I wrote about some Kubernetes ecosystem companies to watch based on my time at KubeCon. Kubernetes is ‘just’ a workload orchestrator and you need a lot of other components to make it work well. That’s why there’s a large and growing ecosystem of products—commercial and open source—that would like to help you do that. If you haven’t explored Kubernetes yet, now’s a good time to start.

You may also enjoy these KubeCon related articles I wrote:

- Resistance is Futile, Kubernetes Will Assimilate You
- The Serious Money Has Arrived At KubeCon
- AI Speculation Dominates Cloud Native Conference

The Australian government has decided to trial Microsoft 365 Copilot for six months. This should help the APS lie to Cabinet about the need for legislative changes at scale. It might also help Microsoft take regulatory capture to new heights. Especially since Microsoft has rebranded Bing Chat as Microsoft Copilot.

Microsoft has leaned hard into making Copilot a brand across its entire product line, putting spicy-autocomplete into everything like it’s high-fructose corn syrup. Microsoft is just a bit too desperate to shoehorn OpenAI into everything, in my opinion. Given the economics of AI/SALAMIs/BDSMs it looks like trying to brute force success while preparing to spread the blame and bury the costs of failure after the hype peak has passed. And avoid getting sued to death by Disney. It’s unsurprising that Microsoft would grab Sam Altman when OpenAI’s board threatened all this money by firing him.

Nvidia will have a new H200 accelerator card out next year, an update to the H100 card that is almost impossible to buy. It’ll have 141GB of VRAM compared to 80GB on the H100 and A100 cards, and a bit more memory bandwidth, but otherwise isn’t much different. According to the AnandTech article, the H200 will likely be in even shorter supply.

Related: Chinese imports of chip manufacturing kit jumped 94% YoY, with a 6x jump in imports of lithography equipment from the Netherlands. This suggests that Chinese companies were concerned tech embargoes would be effective, and acted quickly to get gear before the restrictions came into force. Now, to be effective, the restrictions will have to last a while as the newly purchased equipment ages.

Alibaba has blamed the chipwars in calling off its planned spin-off of its Cloud Intelligence Group. Alibaba isn’t doing as well as rival Tencent, and its value has dropped significantly with shares down from a peak of USD $304.69 in October 2020 to about a quarter of that at $77.60 in November 2023, for a market capitalisation of $197.63B.

There’s a surprising flaw in SSH that can compromise the private key, but it’s pretty rare and hard to use. Most implementations have countermeasures that provide additional protection, but some updates will likely be beneficial. Another reminder that secure comms is really hard and it’s extremely unhelpful when politicians and cops keep trying to make it worse.

New York has decided it’ll have to force hospitals in the state to take security seriously. NY will soon require them to develop and test incident response plans, test the security of software from vendors, and use MFA in some places. All stuff they should have already been doing. Something for other states and countries to consider, given the atrocious track record of the health sector on information security.

I concur with this OpEd at DataBreaches.net that breach disclosures hide too much. Right now we have neither the effective deterrent of consequences for failure nor transparency about what happened so we can protect ourselves. The incentives need to change.

US federal regulators don’t know how often driverless cars hit pedestrians. It’s theoretically possible that driverless cars are safer than human-driven cars, but without good data they might just as easily be more dangerous. We don’t know. Apparently nobody thought to ask that question before running live human trials on the general public. That was certainly one of the available choices.

Intel has fixed a CPU bug, dubbed Reptar, that allows VMs to crash the hypervisor. A big deal for cloud providers in particular. The full list of affected processors is here.

GitGuardian reckons 3,000 out of the 450,000 projects on PyPI have at least one credential exposed in the code. Some are from “very large companies that have robust security teams”. Also “we discovered at least 15 incidents where the publisher was unaware they had made their project public” which is… quite a thing, no? Watching the range of facial expressions made by the head of Legal as they got told this would have been quite entertaining.
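If you want a feel for how this kind of leak gets spotted, here’s a minimal sketch of a regex-based secret scan in Python. It is not GitGuardian’s method, which isn’t described here. The pattern names, the three patterns and the scan_text helper are all my own assumptions for illustration, and real scanners use far more detectors plus validation to cut false positives.

```python
import re

# Hypothetical, illustrative detectors only. Real tools ship hundreds of these.
PATTERNS = {
    # AWS access key IDs start with a known prefix followed by 16 characters.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # PEM header of a private key pasted into source.
    "private_key_block": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    # A quoted literal of 8+ characters assigned to a suspicious-looking name.
    "hardcoded_secret": re.compile(
        r"""(?:password|passwd|secret|api_key|token)\s*=\s*["'][^"']{8,}["']""",
        re.IGNORECASE,
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every line that matches a detector."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
    sample = (
        "import os\n"
        'AWS_KEY = "AKIAIOSFODNN7EXAMPLE"\n'
        'password = "hunter2hunter2"\n'
        'name = "demo"\n'
    )
    for lineno, name in scan_text(sample):
        print(f"line {lineno}: possible {name}")
```

Running something like this over a package before you upload it is cheap. It won’t catch everything, and it will flag the odd test fixture, but that beats finding out from someone else’s report.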

The excellent daily tech newsletter The Sizzle is doing a charity subscription promo. Sign up for a free trial with this link and $1 will be donated to CAFS, a charity that helps children and families in need. Do some good and get free stuff. What’s not to like?

Something fun to counteract all the Torment Nexus news: terrible metaphors and similes. Allegedly by students, but some of them are a bit too clever so I doubt it. My favourite is “John and Mary had never met. They were like two hummingbirds who had also never met.”

Longer reads

The New Yorker covers cops misusing facial surveillance

We did try to warn you this would happen. I call it StasiTech for a reason.

As a counterbalance, here’s a nice graphic story on LLMs from the New Yorker

Elements of Platforms

We’re currently planning our research agenda for next year at PivotNine, and I’ve just about settled on leaning hard into platform engineering.

Platform engineering has been our focus this year and watching the term settle into regular use has been interesting. My fear that lazy marketers would ruin it for everyone, again, has not been realised. Yet. Platform engineering seems to be working well as a collective term for shared infrastructure built and maintained as a product to be consumed by developers and others.

Platform engineering is both new and old. A cynic might say it’s just a new label on an old product, like slapping a new hat on Malibu Stacy. That’s not entirely wrong, as elements of good old-fashioned systems administration are at the core of operating a platform. Yet I see new elements as well.

Cloud provided an alternative to the terrible experience of internal shared services IT that most enterprises inflicted on everyone. Monopoly rents gave way to shadow IT financed with individual credit cards. Cells of resistance rebelled against the imperial regime, gradually proving that a new way was not only possible, but preferable. Now that new way is being absorbed back into the imperial core.

The early attempts at assimilation were a mixed blessing. Agile and Scrum replaced time-consuming waterfall project plans as tech people everywhere completely missed the point of the original paper. DevOps lost its original intended meaning of close collaboration between teams and became something companies tried to buy from management consultants. Not everything was a waste of time, and the smarter people have learned good lessons from these activities.

I think the time is right for a kind of consolidation. The new can be blended with the old to create something that is a mix of both. A platform is something to build on. A way to start higher, standing on the shoulders of giants, rather than starting from first principles in the dirt far below. A platform should be solid, built from quality materials using known-good techniques. It shouldn’t collapse in the first stiff breeze, killing dozens of innocent bystanders.

Organisations need help to build this modern platform. It’s not a simple construction made with hand tools and wood. The platform engineering I’ve been tracking is a complex arrangement of moving parts, each of them a complex undertaking in its own right. It is not pure technology, but a socio-technological system that combines human abilities, frailties and concerns with rapidly changing technologies. It is less like a simple birdhouse and more like an epic production of Wagner’s Ring Cycle.

One does not start learning the trumpet or French horn with Ride of the Valkyries.

I think organisations need a guide for what platform engineering is about. Something to help them improve gradually, not something that requires a massive rip-and-replace exercise that totally transforms everything they’re doing. Something that shows them what to expect while letting them pick the pieces that make sense to do right now. Something that helps lay good foundations for the future while keeping the lights on today.

While I have some ideas about what that might look like, I’d be interested in hearing from readers about their own experiences. What has worked, and what hasn’t? What do you think is a terrible idea to be avoided at all costs? What should people definitely do?

With luck, this will be a fun thing to work on for some time. Together we can build something useful. Something those who come after us can build on with pride.