Without backup plans, global IT outages will happen again

REUTERS/Hasnoor Hussain/File Photo
AirAsia passengers wait to be checked in manually at Kuala Lumpur International Airport’s Terminal 2, after a global IT system outage, in Sepang, Malaysia, today.

LONDON >> Elements of today’s global IT outage, which grounded planes and hit services from banking to healthcare, have occurred before. Until more contingencies are built into networks and organizations put better backup plans in place, it will happen again.

Today’s outage was caused by an update that U.S. cybersecurity firm CrowdStrike pushed to its clients early this morning. The update conflicted with Microsoft’s Windows operating system, rendering devices around the world inoperable.

CrowdStrike has one of the largest shares of the highly competitive cybersecurity market that provides such tools, leading some industry analysts to question whether control over such operationally critical software should remain in the hands of just a handful of companies.

But the outage has also raised concerns among experts that many organizations are not well-prepared to implement contingency plans when a single point of failure such as an IT system, or a piece of software within it, goes down.

At the same time, more solvable digital disasters are also looming on the horizon. Perhaps the biggest global IT challenge since the Millennium Bug, the “2038 Problem,” is just under 14 years away, and this time the world is far more dependent on computers.

“It’s easy to jump at the idea that this is disastrous and therefore suggest there must be a more diverse market and, in an ideal world, that’s what we’d have,” said Ciaran Martin, former head of Britain’s National Cyber Security Centre (NCSC), part of the country’s GCHQ intelligence agency.

“We’re actually good at managing the safety aspects of tech when it comes to cars, trains, planes, and machines. What we’re bad at is then providing services,” he added.

“Look at what happened to the London health system a few weeks ago – they were hacked, and that led to loads of canceled operations, which is physically dangerous,” he said, referring to a recent ransomware incident that affected Britain’s National Health Service (NHS).

Organizations need to look around their IT systems, Martin said, and ensure there are enough failsafes and redundancies in those systems to stay operational in the event of an outage.

Today’s outage happened amid a perfect storm, with both Microsoft and CrowdStrike owning huge shares of a market which relies on both of their products.

“I’m sure the regulators globally are looking at this. There is limited competition globally for operating systems, for example, and also for the large-scale cybersecurity products like the ones CrowdStrike provides,” said Nigel Phair, a cybersecurity professor at Australia’s Monash University.

Today’s outage hit airlines particularly hard, as many scrambled to check in and board passengers who relied upon digital tickets to fly. Some travelers posted photos on social media of hand-written boarding cards provided by airline staff. Others were only able to fly if they had printed out their ticket.

“I think it’s very important for organizations of all shapes and sizes to really look at their risk management and look at an all-hazards approach,” Phair said.

EPOCHALYPSE NOW

Today’s outage will not be the last time the world is reminded of its dependency on computers and IT products for basic services to function. In about 14 years’ time, the world will face the “2038 Problem,” a time-based computer issue similar to the Millennium Bug.

The Millennium Bug, or “Y2K,” happened because early computers saved expensive memory space by storing only the last two digits of the year. As a result, many systems were unable to distinguish between the years 1900 and 2000, leading to critical errors.
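The two-digit shortcut can be illustrated with a short sketch. The function names and the example years below are hypothetical, chosen only to show how arithmetic on two-digit years goes wrong once the century rolls over.

```python
def years_elapsed_two_digit(start_yy: int, end_yy: int) -> int:
    """Elapsed years when only the last two digits of each year are stored."""
    return end_yy - start_yy


def years_elapsed_four_digit(start_year: int, end_year: int) -> int:
    """Elapsed years when the full four-digit year is stored."""
    return end_year - start_year


# Hypothetical record: something dated 1985, checked in the year 2000.
# Stored as two digits, 1985 becomes 85 and 2000 becomes 0.
two_digit_result = years_elapsed_two_digit(85, 0)
four_digit_result = years_elapsed_four_digit(1985, 2000)

print(two_digit_result)   # negative: the system believes time ran backwards
print(four_digit_result)  # the intended answer
```

With two digits, the year 2000 looks identical to 1900, so the elapsed time comes out negative instead of 15 years.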

The cost of mitigating the problem in the years before 2000 ran up a global bill of hundreds of billions of dollars.

The 2038 problem, or “Epochalypse”, which begins at 0314 GMT on Jan. 19, 2038, is, in essence, the same problem.

Many computers count the passage of time by measuring the number of seconds since midnight on Jan. 1, 1970, also known as the “Epoch.”

Those seconds are stored as a finite sequence of zeroes and ones, or “bits,” but for many computers, the number that can be held in those bits reaches its maximum value in 2038.
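The rollover can be sketched in a few lines, assuming the common case in which the count of seconds is held in a signed 32-bit integer (the article does not specify the width; that is an assumption here). The helper names are hypothetical, and the wraparound is simulated, since Python's own integers do not overflow.

```python
import struct
from datetime import datetime, timedelta, timezone

# The "Epoch": midnight UTC on Jan. 1, 1970.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest value a signed 32-bit integer can hold (assumed counter width).
INT32_MAX = 2**31 - 1


def to_utc(seconds: int) -> datetime:
    """Convert a count of seconds since the Epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


def wrap_int32(value: int) -> int:
    """Simulate storing a value in a signed 32-bit field (two's complement)."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


last_ok = to_utc(INT32_MAX)           # last moment the counter can represent
wrapped = wrap_int32(INT32_MAX + 1)   # one second later, the counter wraps
after_wrap = to_utc(wrapped)          # the date an affected system would see

print(last_ok.isoformat())     # 2038-01-19T03:14:07+00:00
print(wrapped)                 # -2147483648
print(after_wrap.isoformat())  # 1901-12-13T20:45:52+00:00
```

One second after 03:14:07 GMT on Jan. 19, 2038, such a counter flips to its most negative value, which an affected system would read as a date in December 1901. Systems that store the count in 64 bits do not hit this limit.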

“We currently have a situation where there’s huge global disruption because we cannot cope administratively,” said Ciaran Martin, the former NCSC head.

“We can cope in terms of safety, but we can’t cope in terms of service provision when key networks go down.”
