Electrical transients often affect a computer's primary memory, or random access memory (RAM), causing transient bit errors while a program is executing. Because transients flip bits at random, even software that is free of design errors often produces wrong results when executed in an electrically noisy industrial environment; faults in either program control flow or data can corrupt the output. This communication aims to model an online testing technique capable of detecting multiple bit errors at various RAM locations during program execution, in order to achieve reliable, fault-tolerant computing. The proposed unconventional software technique is based on the fail-stop failure model. It also takes the necessary recovery actions immediately after an error is detected, so as to stop error propagation and thus eliminate ambiguous results caused by transients. Conventional error codes (for example, parity bits, checksums, and Hamming codes) can detect multiple errors but cannot repair all of them. Moreover, these codes are implemented in hardware, because their software implementations suffer from high overhead in both time and space redundancy.
The Proposed Work:
The proposed software-implemented technique injects the code of the "No Operation" (NOP) instruction at various locations inside the application program and then checks at run time for possible corruption of these NOP codes, in order to validate the code's immunity to transients. The more NOP codes injected, the higher the fault coverage. If a NOP code is found corrupted, the adjacent application code is likely to be corrupted as well. The affected code is recovered by reloading the application, or by copying it back from a master copy, and the application is then re-executed. In some cases the corrupted code can also be recovered using three images of the application, where triple memory redundancy (TMR) can be afforded. This technique is a low-cost route to transient fault tolerance: its time and space redundancy is modest, and on a reasonably fast machine the small extra execution time is easily absorbed. The technique is also useful for locating faults, because the locations at which the NOP codes are injected are known. The choice of the NOP code is guided by the fact that it is only one byte long and that executing it does not change the processor status word (PSW). It also provides a delay of one machine cycle, which helps to outlast the presence of transients. At run time we can additionally compare two PSWs (one read before a NOP code and one after it) in order to verify the transient immunity of the processing environment.
Goutam Kumar Saha ([email protected] or [email protected]) has been working as a computer scientist for the last seventeen years. He has worked in various renowned research organizations, namely at LRDE, Defence Research & Development Organisation (DRDO), Bangalore, and at the Electronics Research & Development Centre of India (ER&DCI), Calcutta. At present he is working at the Centre for Development of Advanced Computing (CDAC), Kolkata, as a Scientist-F. He has authored many research papers on fault-tolerant computing and natural language engineering. He is a senior member of IEEE (USA), ACM, and the Computer Society of India (CSI), and a Fellow of IETE, MSPI (New Delhi), and IMS (Goa). He has received various grants and awards from reputed national and international institutions. He is a referee for the CSI Journal, IJCPOL, the AMSE Journal (France/Spain), and IEEE Potentials Magazine, and an associate editor of ACM Ubiquity.