Root Out Bugs & Glitches

Reproducible Bugs are Easy to Fix

The biggest delay in getting most electronic products into volume production is getting the last few irreproducible crashes, lockups, and glitches out of the software.

Using our advanced MULTI development environment in a proper software development process, most bugs that can be reproduced in the laboratory can be fixed in under one hour; nearly all remaining reproducible bugs can be fixed in less than a day; and it is very rare for a reproducible bug to require more than a week to fix with the most advanced debugging tools.

Irreproducible Bugs are Very Hard to Fix

The intermittent random crashes, lockups, and glitches that only occur at customer sites are nearly impossible to fix. Since the developers can't reproduce these problems in the laboratory, they must guess what the problem is. After weeks of randomly rooting around in the software, they may find and fix a similar but unrelated problem. Thinking that they have solved the problem, they ship a new version to the customer. But when the customer, after waiting patiently for weeks, discovers that the new version is no better, the customer goes ballistic. As the early customers grow increasingly furious, volume production must be delayed.

Finally, developers are sent into the field to fix the problems. But bugs that only manifest themselves after days or weeks require a long time to pin down. It is especially difficult to fix bugs that manifest themselves in a different way every time. The seemingly interminable process of tracking down these intermittent failures consumes 20% to 50% of most software development schedules.

Don't Put Irreproducible Bugs in Your Product

Yogi Berra might ask, "If it's so painful to remove the irreproducible bugs from your software, why put them in there in the first place?" The reason that a bug is irreproducible is that the software behaves differently when the customer runs it than when the developer runs it. If your software always behaved the same, it couldn't have any irreproducible bugs (only reproducible bugs). Green Hills Software's decade of research into the causes of inconsistent software behavior has resulted in the development of a comprehensive range of products and services that prevent irreproducible random software failures, reducing your time to volume production by up to 50%.

Compile Time Error Checking  

Most programming languages specify that certain operations have undefined behavior: if you perform one of them, anything might happen. If a programmer accidentally does one of these things, the program will behave bizarrely and probably differently every time. This is the source of many irreproducible bugs. Green Hills Software compilers detect many of these irreproducible bugs even before the program is tested.
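As a minimal illustration in C, consider the kind of mistake described above. The function names below are hypothetical and are not taken from any Green Hills product. The buggy version is shown only inside a comment: it reads an uninitialized variable and one element past the end of the array, both of which are undefined behavior in C. Many compilers can warn about the uninitialized variable at compile time; whether a particular compiler also flags the off-by-one loop bound is an assumption that depends on the tool and its settings.

```c
#include <stddef.h>

/*
 * BUGGY version, kept in a comment so this block stays well defined:
 *
 *     int sum_buggy(const int *v, size_t n) {
 *         int total;                       // never initialized
 *         for (size_t i = 0; i <= n; i++)  // also reads v[n], past the end
 *             total += v[i];
 *         return total;                    // result may differ on every run
 *     }
 */

/* Corrected version: total starts at zero and the loop stops at n - 1. */
int sum_fixed(const int *v, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

The buggy version may appear to work on the developer's desk, because the stack and the memory after the array happen to hold harmless values there, and then fail at a customer site where they do not.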

Runtime Error Checking

Our MULTI debugger, optimizing compilers, and runtime libraries include Runtime Error Checking capabilities that automatically detect any undefined behavior that occurs while the program is running. MULTI immediately notifies the developer of the bug by halting the program in the debugger at the source code line that contains the error. This makes it a simple matter to find and fix the bug. Using other embedded software development environments, such a bug would have gone undetected during development, and no economically feasible amount of testing could have uncovered it. It would have taken days, weeks, or even months of frustrating debugging, while customers were fuming and volume shipments were delayed, to fix the problem. Soon after these capabilities were first introduced into MULTI, MULTI became the world's most popular embedded debugger.
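The general idea of a runtime check can be sketched in a few lines of C. This is only an illustration under stated assumptions, not a description of how MULTI implements Runtime Error Checking: the helper `checked_store_at` and the macro `CHECKED_STORE` are hypothetical names, and the sketch reports the source location and returns false where a debugger would halt the program.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: store value at buf[idx] only if idx is in range.
 * On a violation it reports the source line of the offending store and
 * returns false instead of silently corrupting neighboring memory. */
static bool checked_store_at(int *buf, size_t len, size_t idx, int value,
                             const char *file, int line)
{
    if (idx >= len) {
        fprintf(stderr,
                "%s:%d: store to index %zu is outside a buffer of %zu elements\n",
                file, line, idx, len);
        return false;
    }
    buf[idx] = value;
    return true;
}

/* Hypothetical macro: records the file and line of the call site. */
#define CHECKED_STORE(buf, len, idx, value) \
    checked_store_at((buf), (len), (idx), (value), __FILE__, __LINE__)
```

With a four-element buffer, `CHECKED_STORE(buf, 4, 2, 7)` performs the store and returns true, while `CHECKED_STORE(buf, 4, 4, 9)` leaves memory untouched, prints the file and line of the call, and returns false. The error is caught where it happens rather than surfacing later as an unrelated failure.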

Field Debugging Makes Bugs Reproducible

Sometimes developers can't reproduce an otherwise reproducible bug in the laboratory because the customer did not tell them everything that they did. Green Hills Software's field debugging capability allows your developers to debug your product while it is running at a customer's site from the computers on the developers' desks, where they have access to all of the most advanced debugging tools. They can record, upload, and examine everything that the customer did with the product instead of relying on an incoherent report from an irate customer. Field debugging enables them to reproduce the bug in the laboratory, making it easy to fix.

Primitive RTOSes Turn Common Errors into Nasty Irreproducible Bugs

Your product's propensity to randomly crash, lock up, or glitch depends primarily on the RTOS you select. Primitive real-time operating systems from the 1980s, such as VxWorks and pSOS, can be subtly corrupted by common programming errors anywhere in the code. The corruption will slowly spread through the software, causing it to fail an indeterminate amount of time after the actual error occurred. Each time the software is run, it will fail in a different and mysterious way.

When your developers make the same mistake while using INTEGRITY, INTEGRITY will immediately notify them of the bug by halting the program in the debugger at the source code line that contains the error, making it a simple matter to fix the bug. Using VxWorks or QNX, such a bug would have gone undetected during development, and no economically feasible amount of testing could have uncovered it. It would have taken days, weeks, or even months of frustrating debugging, while customers were fuming and volume shipments were delayed, to fix the bug.

INTEGRITY Makes Bugs Reproducible

The fundamental design objective of our revolutionary INTEGRITY RTOS is to ensure that each component of a large software program will behave the same when it is being debugged alone on a developer's desk as when it is being tested by quality control after being integrated with all of the other components in the product, and the same as when it is running in the product at the customer's site. When the software works correctly in the lab, it will work correctly in the field. If the software fails in the field, it will fail identically in the lab, enabling the software developers to fix the bug quickly.

Response time is the amount of time that it takes for the software to switch from performing one task to another in response to an external event (e.g. from calculating the optimal fuel mixture to firing the spark plug of an engine running at 8000 RPM). Primitive RTOSes, including VxWorks and Linux, occasionally disable interrupts to perform housekeeping functions. During this time, the software is unable to respond to external events, drastically increasing the worst-case response time. These operating systems also use semaphores and reentrancy inside the RTOS, which can drastically extend the response time in rare circumstances. It is unlikely that the obscure combination of events that causes the worst-case response times will ever be observed by developers in the lab. But when millions of units operate for thousands of hours (e.g. 1 quintillion clock cycles), events that happen only once in a billion times actually occur billions of times. And each time they do, the system responds much more slowly than normal, which may cause an irreproducible crash, lockup, or glitch.

A few years ago, Red Hat published a study of this effect in Linux: "Linux Scheduler Latency," March 2002. It determined that on a particular computer the average response time to an event was 60 microseconds, but once in a million times the worst-case response time was over 4000 times longer! After they performed major surgery on a standard Linux to build a "real time" Linux, they proudly announced that the worst-case now took only 30 times longer than average!
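The figures quoted above can be checked with a little arithmetic, sketched here in C. The function names are hypothetical, and the calculation makes two simplifying assumptions that are not in the original text: every clock cycle counts as one opportunity for the rare event, and the quoted factor of 4000 is treated as exact rather than as a lower bound.

```c
#include <stdint.h>

/* Expected number of occurrences of an event that happens once in
 * `one_in` opportunities, given `opportunities` chances in total. */
uint64_t expected_occurrences(uint64_t opportunities, uint64_t one_in)
{
    return opportunities / one_in;
}

/* Duration of one crankshaft revolution, in microseconds. */
double revolution_us(double rpm)
{
    return 60.0e6 / rpm;  /* 60 seconds per minute, 1e6 microseconds per second */
}

/* Worst-case response time, given the average and a slowdown factor. */
double worst_case_us(double average_us, double factor)
{
    return average_us * factor;
}
```

Under those assumptions, `expected_occurrences(1000000000000000000ULL, 1000000000ULL)` gives 1,000,000,000, so a one-in-a-billion event occurs about a billion times across 1 quintillion cycles. `revolution_us(8000.0)` gives 7500 microseconds per revolution, and `worst_case_us(60.0, 4000.0)` gives 240,000 microseconds. A response delayed that long would span 32 full revolutions of an engine running at 8000 RPM. With the factor of 30, the worst case is 1800 microseconds, or roughly a quarter of a revolution.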

INTEGRITY has a revolutionary design for consistently fast responsiveness: INTEGRITY never disables interrupts in system calls, and it never uses semaphores or reentrancy in the operating system. If a Wind River or Linux salesman assures you that this is impossible, it only shows just how far behind us they are in operating system technology. The INTEGRITY design consistently attains the lowest worst-case response time and context switch time irrespective of the other software running in the system. This ensures that if the software runs fast enough in the lab, it will run fast enough in the field.

© 1996-2017 Green Hills Software