What the Failure of Boeing’s Starliner Space Capsule Really Means
Thomas Insights, by James Nissenbaum
Original Article
Posted By: Pete Stone, 1/9/2020 11:46:25 AM
Boeing’s new spacecraft, the CST-100 Starliner, experienced a partial failure during its first test flight to the International Space Station (ISS) on December 20, 2019.
Originally, the Starliner was supposed to launch from Earth and dock with the ISS to prove that the capsule could safely carry humans to the station. However, an error in the mission timer aboard the capsule meant that the spacecraft fell short in its orbit and could not rendezvous with the station as intended.
Boeing said that had there been a crew on board, they would not have been in danger. More so, a crew on
This is division of Thomas Publishing
Reply 1 - Posted by:
PlayItAgain 1/9/2020 12:01:45 PM (No. 283342)
"However, the nature of the failure implies that there are further systemic problems with the software architecture itself."
This is true. Any one of a number of mission simulations should have detected the error that produced this failure: using the wrong timer.
The author of this article has us following a red herring of software error checking, which has nothing to do with using the wrong timer. In that sense, the author has failed us.
This was a systems architecture problem, not a software problem. That might seem a tedious detail, but addressing a problem with software development won't solve the root cause here.
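To make the "mission simulations should have caught it" point concrete, here is a minimal sketch of the kind of cross-check a simulation could run: compare each candidate timer against an independent reference and refuse any that disagree. The source names, the values, and the 5-second tolerance are all hypothetical illustrations, not anything from Boeing's actual design.

```python
def met_is_plausible(met_s, reference_s, tolerance_s=5.0):
    """Return True if a mission elapsed time agrees with an independent reference."""
    return abs(met_s - reference_s) <= tolerance_s


def select_met(candidates, reference_s, tolerance_s=5.0):
    """Pick the first candidate timer that agrees with the reference.

    candidates: dict mapping a (hypothetical) source name to its reading in seconds.
    Raises ValueError if no source agrees, so a simulation run fails loudly.
    """
    for name, value in candidates.items():
        if met_is_plausible(value, reference_s, tolerance_s):
            return name, value
    raise ValueError("no timer source agrees with the independent reference")


if __name__ == "__main__":
    # Illustrative numbers only: one source is hours off, the other is close.
    readings = {"booster_clock": 40500.0, "liftoff_sensor": 902.0}
    print(select_met(readings, reference_s=900.0))
```

A check like this lives at the system level, between components, which is the distinction being drawn above: it tests which timer is being trusted, not whether any one piece of code handles its own errors.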
I've dealt with Boeing as a customer on many occasions, and as a customer, they are a pain in the self-righteous arse! I'm sad to see this, but I can't help but shake my head.
3 people like this.
Reply 2 - Posted by:
DVC 1/9/2020 12:01:47 PM (No. 283343)
Excellent explanation.
Managing software writers is often like herding cats, and there needs to be extensive and EXPENSIVE testing of all possible pathways through the software before you depend upon it.
In some ways, with software, less is more. Fewer features means fewer possibilities of software failure or confusion of the men and women at the interface. One of Airbus' huge failings, IMO, is that their automation is so dense and impenetrable that frequently, when there is a crash, it is because the pilots couldn't figure out what in the hell the aircraft was trying to do. There are too many modes and "helpers" in the autopilot, so the aircraft "did weird, unexpected stuff" because the pilot thought he had set the system to do one thing but had in error set AP mode 22 instead of AP mode 21, and that mode does something unexpected. Had there been only 4 modes to the autopilot, all would be remembered clearly. But software folks are notorious for "feature proliferation" (because we can, and wouldn't it be cool if...), even if not really necessary.
2 people like this.
Reply 3 - Posted by:
bad-hair 1/9/2020 12:31:56 PM (No. 283398)
A certain aircraft manufacturer needs to get its SH together and fast.
2 people like this.
Reply 4 - Posted by:
MattMusson 1/9/2020 12:46:20 PM (No. 283431)
Go ahead. Offshore all your software testing. You will save a lot of money.
Unless the software is actually important.
7 people like this.
Reply 5 - Posted by:
MattMusson 1/9/2020 12:46:42 PM (No. 283433)
Go ahead. Offshore all your software testing. You will save a lot of money.
NOT!
3 people like this.
Reply 6 - Posted by:
Proud Texan 1/9/2020 1:51:45 PM (No. 283521)
I think poster #2 is correct about "less being more". With all the unneeded options inserted into software, it is sometimes difficult to figure out what is going wrong in time to correct it, if it can be figured out at all. This will contribute not only to problems with planes flying but also to self-driving cars crashing, and it reaches into areas you would not expect, such as farm equipment. The GPS that will drive you off a cliff is used to control way too many things, and control can easily be lost if the human is too far removed from it. Less can help the human; more can destroy the human.
1 person likes this.
Reply 7 - Posted by:
watashiyo 1/9/2020 4:21:49 PM (No. 283678)
From a guy who knows absolutely nothing about computers or aerospace, but understands "life skills": I think we are starting to experience a gradual breakdown of everything that was great about America. The failure of public school systems, PC addictions, unethical capitalism, and affirmative action.
Those issues eventually will filter through every aspect of life in a different form, drawing consequences of life-altering negative results. Not technical but middle-class thinking.
3 people like this.
Reply 8 - Posted by:
Mushroom 1/9/2020 4:45:04 PM (No. 283691)
I agree with #1 and #2. I believe the root cause is sloppy programmers. They have become so used to being able to upload a patch that management doesn't think twice about pushing a shoddy product. It spills over to these functions that don't have time to wait for a patch/fix. Look at the "news" services today. If you put it in print, it had best be correct! They don't even edit past running a spell checker anymore. Now the story gets pushed out with all the spin and speculation, then slowly morphs to what the 'settled' truth is.
You cannot write code and then not test the absolute garbage out of it! We need to stop making the customer the beta tester!
Yeah, I know, I am all over the place, but hey, there are no editors anymore.
0 people like this.
Reply 9 - Posted by:
toddh 1/10/2020 11:01:38 AM (No. 284400)
"Errors halt the entire program immediately when they occur." Says someone who doesn't remember 1201 & 1202 on Apollo 11. The AGC's design is an awesome example of error handling and high-reliability design. The rest of the software can error all over the place, just as long as the "fly the spacecraft" levels continue. Mission control verified that 1201 & 1202 aren't part of the "fly the spacecraft" levels, and so History was made a few seconds later.
"If the code running a space probe errors, you cannot reset it." Why not? I'm a big fan of the front panel switch wired straight to the CPU's reset pin, but if a hundred million miles of wire is impractical, put a little detector just after the UART that looks for a "reset" signal from the radio, something like a second of nothing but ones. Also, in modern high-reliability systems, the main CPU is required to update a "heartbeat" signal, which a management processor (like Intel's Management Engine or IBM's Service Processor) monitors to make sure the main CPU is still running and take appropriate action, such as a system reset, if it is not.
0 people like this.
Comments:
As Nissenbaum points out, this was a software failure. The fatal 737 Max crashes were caused by software failures. This is troubling to say the least. It suggests to me that Boeing's programmers are getting unpardonably sloppy.