The scientific method has been the most successful contributor to systematic progress in the history of human endeavour. One of the key elements of the method is that if the result cannot be reproduced, it is discarded. Models are then developed consistent with non-discarded work to see if they can make further predictions, which can be tested.
This has not been the case for scientific computation – which has been taking on an increasingly important role in science over the last few decades.
The main author of this article, Les Hatton, co-authored an opinion piece in February in the British journal Nature, calling for consistent regulation of the release of source programs by researchers:
“Although it is now accepted that data should be made available on request, the current regulations regarding the availability of software are inconsistent. We argue that, with some exceptions, anything less than the release of source programs is intolerable for results that depend on computation. The vagaries of hardware, software and natural language will always ensure that exact reproducibility remains uncertain, but withholding code increases the chances that efforts to reproduce results will fail,” they wrote.
One part of the puzzle is measuring the approximate defect density of software. It can be measured, for example, as the total number of defects ever found divided by the approximate number of lines of code involved. Typically, this measurement will be somewhere in the range 0.1–10 defects per thousand lines of code.
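As a rough illustration of the defect density measurement described above, here is a minimal Python sketch. The function name and the example figures (250 defects in 100,000 lines) are hypothetical and chosen only to land inside the typical range; they are not taken from any real project.

```python
def defect_density_per_kloc(defects_found: int, lines_of_code: int) -> float:
    """Return defects per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects_found / (lines_of_code / 1000)


# Hypothetical package: 250 defects ever found in 100,000 lines of code.
density = defect_density_per_kloc(250, 100_000)
print(f"{density:.2f} defects per thousand lines of code")
```

Note that the number says nothing about how serious any individual defect is, only how many have been found so far.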
Even though researchers have made some progress in measuring the density of defects, there has been little progress in quantifying the effects of those defects on the computational results. A really terrific piece of code, with 0.1 defects per thousand lines of code, could have a really serious defect, while a fairly awful piece of code with 10 defects per thousand lines of code could turn out to be quite accurate.
I (Hatton) have worked for 40 years in meteorology, seismology, and computing, and most of the software I’ve used has been corrupted to some extent by such defects – no matter how earnestly the programmers performed their feats of testing. The defects, when they eventually surface, always seem to come as a big surprise.
The defects themselves arise from many causes, including: a requirement might not be understood correctly; the physics could be wrong; there could be a simple typographical error in the code, such as a + instead of a - in a formula; the programmer may rely on a subtle feature of a programming language which is not defined properly, such as the value of an uninitialized variable; there may be numerical instabilities such as overflow, underflow or rounding errors; or there may be basic logic errors in the code. The list is very long. All are essentially human in one form or another but are exacerbated by the complexity of programming languages, the complexity of algorithms, and the sheer size of the computations.
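To make one of these causes concrete, the following small Python sketch shows a rounding error of the kind mentioned above. It is an illustration only and is not drawn from any of the software discussed in this article. It adds 0.1 ten times with a plain loop and compares the result with `math.fsum`, a standard-library function that compensates for the lost low-order bits.

```python
import math

# Ten copies of 0.1 should, on paper, add up to exactly 1.0.
values = [0.1] * 10

# Plain accumulation: each addition is rounded to the nearest double.
naive_total = 0.0
for v in values:
    naive_total += v

# Compensated summation from the standard library.
exact_total = math.fsum(values)

print(naive_total == 1.0)  # False: the rounding errors have accumulated
print(exact_total == 1.0)  # True
```

The discrepancy here is tiny, but in long chains of computation such errors can grow, and nothing in the program warns the user that it has happened.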
As an example, here is a reconstruction of a vertical slice through a North Sea gas field from seismic data. It looks very convincing to a geologist, and producing it required large amounts of computation on high-performance computers using software that was reliable and well tested by a highly responsible company. It was considered state of the art.
However, we then performed a reproducibility experiment, which took three years, comparing this result with those from eight other companies. Each company's software implemented the same algorithms in the same programming language and processed the same input data, but was coded independently. The results are shown in the collage.
Individually, they all look very convincing but they are significantly different to a geologist, even though they are supposed to be the same. It turned out that these differences are entirely due to latent software defects that have lain hidden, for years in some cases, before being flushed out by this reproducibility experiment. In this case, the latent defects included the well-known ‘off-by-one’ array indexing problem, uninitialized variables, sign errors in wave-propagation algorithms, simple logic problems whereby unexpected program paths were followed, and incorrectly calculated geometries.
The cumulative effect of these defects meant that we could only reproduce the results to one or two significant figures rather than the six inherent in 32-bit floating point computations. There was no prior warning to the programmers or the end-users that this could be the case. Defects of this size greatly undermine the accuracy of this process, which needs at least three significant figures for these data, and can easily compromise the placing of an extremely expensive drilling rig. Without this comparison, the defects responsible might never have been unearthed – they had already evaded comprehensive test suites and years of production use.
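One common way to express how closely two independently computed values agree is to convert their relative difference into an approximate count of significant figures. The Python sketch below uses that convention; it is an assumption for illustration and not necessarily the measure used in the experiment, and the sample values are hypothetical.

```python
import math


def agreeing_sig_figs(a: float, b: float) -> float:
    """Approximate number of significant figures to which a and b agree."""
    if a == b:
        return math.inf  # identical values agree to every figure
    scale = max(abs(a), abs(b))  # non-zero, because a != b
    relative_difference = abs(a - b) / scale
    return -math.log10(relative_difference)


# Two hypothetical outputs for the same quantity from independent codes.
print(round(agreeing_sig_figs(1234.5, 1251.0), 1))    # about 1.9 figures
print(round(agreeing_sig_figs(1.000000, 1.000001), 1))  # about 6.0 figures
```

The first pair agrees to fewer than two figures, which is the level of disagreement described above; the second agrees to roughly the six figures that 32-bit floating point arithmetic can offer.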
The methods used to develop the software haven't really changed much since the experiment was done in the 1990s. What has changed, however, is the volume of software used in science and the volume of data processed. Where we once had megaflops and megabytes, we now have petaflops and petabytes.
In the past, one approach to this problem was to develop a great code, and then not allow any changes to be made. This approach was taken by NASA in the 1970s, when scientists there developed its most defect-free code for the space shuttle program.
“Software can never be considered error-free; the problem is to determine when it's reliable enough to fly with,” said Hugh Blair-Smith, who was part of the team that worked on the Space Shuttle software as part of MIT's Instrumentation Lab, and author of Journey to the Moon: The History of the Apollo Guidance Computer.
The shuttle code had an estimated defect rate of 0.11 per 1,000 lines of code. But this solution was an expensive venture. At the time, NASA reputedly paid IBM programmers $500 million to debug 500,000 lines of code.
“Most software at work today is developed by methods very different from what we did then. Instead of having every line of code in the machine known and controlled by a small group of people all working together, every small module of modern software rests on APIs to a dozen or so layers of infrastructure code modules created by hundreds of organizations employing myriads of people. While the integrators of these pyramids benefit from very detailed specifications for each of the bricks used at their level, and create similarly detailed specifications of how their pyramids behave for the benefit of the next layer up, the opportunities for obscure problems are many orders of magnitude greater than anything we saw then ... as anybody looking at a frozen screen knows!”
This approach is not possible at the ATLAS experiment, which is one of four particle detectors running on the Large Hadron Collider at CERN. It has about five million lines of code. And the software is constantly evolving, with improvements, clean-up, and bug fixing.
“Last year, around 300 people worked on our code. This could be a student changing one line or an expert updating thousands of lines,” said David Rousseau, a former coordinator of offline software for data reconstruction, analysis, and simulation.
They have tried organized code review in the past, said Rousseau, but with little success due to the small number of true software experts. There were also some false starts trying to run the software on different platforms (for example, Linux versus Mac operating systems), which can reveal different defects in the code.
Instead, ATLAS uses a kind of semi-open source software; the code can be accessed by everybody in the ATLAS collaboration, which is 3,000 people. They have user tutorials to help researchers without extensive coding experience. And, lastly, they have a series of consistency checks.
“Our system runs this automatic comparison every night and morning to catch any bugs before they’re introduced into the work chain,” said Rousseau. A developer is asked to announce when his or her update might result in a change to the scientific results. The next day, the team checks the results against the developer's claim to see if the change is real.
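The kind of check described above can be sketched in a few lines of Python. This is not the ATLAS system; the function name, the quantity names and the tolerance are all hypothetical. The sketch compares a new set of results against a reference set and reports two things: quantities that changed without being announced, and announced changes that did not actually appear.

```python
import math


def check_nightly(reference, candidate, announced=frozenset(), rel_tol=1e-6):
    """Compare candidate results with a reference set.

    Returns the quantities that changed unexpectedly and the announced
    changes that were not observed.
    """
    changed = {
        name
        for name, ref_value in reference.items()
        if name not in candidate
        or not math.isclose(ref_value, candidate[name], rel_tol=rel_tol)
    }
    announced = set(announced)
    return {
        "unexpected": sorted(changed - announced),
        "not_observed": sorted(announced - changed),
    }


# Hypothetical reference results and results from last night's build.
reference = {"mean_track_pt": 41.2731, "n_jets": 3.0, "vertex_z": -0.1184}
candidate = {"mean_track_pt": 41.2731, "n_jets": 3.0, "vertex_z": -0.1201}

# The developer announced a change to n_jets only.
report = check_nightly(reference, candidate, announced={"n_jets"})
print(report)
```

Here the report flags `vertex_z` as an unexpected change and `n_jets` as an announced change that did not materialize, so both would need a human to look at them.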
The main ATLAS software can be split into 10 different areas for specific physics research, each of which is overseen by a few experts. But these different software areas can sometimes interact with each other in unexpected ways. “After user updates of individual software packages and before any major release, we get experts to manually compare results, line by line, for any potential side effects across all our code,” said Rousseau.
Plus, of course, if the ATLAS experiment claims a discovery – such as the Higgs boson – it must be confirmed by another experiment, CMS, which also takes data from proton collisions in the LHC.
So, thus far, the best scientists and researchers are simply aware of the problem, and are vigilant about checking results. But every domain of science has problems related specifically to its code, and to its unique ways of coding and debugging. Computing science has failed to reveal any single method to avoid or even quantify defects.
In the meantime, all scientists need to return to the method that has made science so successful in the first place: reproducibility. If scientists can't reproduce a result or if the source code and data are not available to test reproducibility, then that result should be treated with caution. And, if we are to be really brave, it should be discarded.
Comments
A position piece written with Adrian Giordani of CERN on the potential impact of unquantifiable software defect on scientific progress.
Doubtful
I have read your article, but I am doubting that computation is an actual threat to the scientific method.
The SM relies on (attempted) replication for validation and in some ways, it's more a funding thing (that is, if your results depend on 100 work years of software development, replicating requires investing 100 work years).
On the need for the code to
On the need for the code to be available for examination, I definitely agree, as you can see from these sources:
http://www.isgtw.org/feed-item/astrophysics-source-code-library
http://www.astrobetter.com/code-in-the-astrophysics-code-source-library-...
http://arxiv.org/abs/1202.1026
www.ascl.net
Alice Allen
Editor, ASCL
What if the hypothesis you
What if the hypothesis you are discarding is itself incorrect? If coding is error-prone, can thinking not be? Hence I believe the data, if it does not fit, must be re-examined by other theorists (apart from particle theorists).
The LOC question is particularly an issue with the use of autogenerated code, although I tend to think this is no different than a compiler.
Computing helps speed up the human decision process
It is for all those reasons that computing helps speed up the human decision process, rather than taking it over.
Compute-based results offer an "educated" guess of where the solution lies. But you can never rely blindly on the results. That is why you still keep using prototypes or experience. And if prototypes cannot be built, which will incur measurement errors and phenomena not thoroughly understood, then it is still a human decision with risks.
It would be interesting to model the solution and risk to help reduce further the risks associated with human decisions.