
Does computation threaten the scientific method?

Physicists attend a lecture at CERN, while still working on their laptops. Image courtesy Frederic Hemmer.

The scientific method has been the most successful contributor to systematic progress in the history of human endeavour. One of the key elements of the method is that if the result cannot be reproduced, it is discarded. Models are then developed consistent with non-discarded work to see if they can make further predictions, which can be tested.

This has not been the case for scientific computation – which has been taking on an increasingly important role in science over the last few decades.

The main author of this article, Les Hatton, co-authored an opinion piece in February in the British journal Nature, calling for consistent regulation of the release of source programs by researchers:

“Although it is now accepted that data should be made available on request, the current regulations regarding the availability of software are inconsistent. We argue that, with some exceptions, anything less than the release of source programs is intolerable for results that depend on computation. The vagaries of hardware, software and natural language will always ensure that exact reproducibility remains uncertain, but withholding code increases the chances that efforts to reproduce results will fail,” they wrote.

Defects in software

One part of the puzzle is measuring the approximate defect density of software. This can be estimated, for example, as the total number of defects ever found divided by the approximate number of lines of code involved. Typically, this measurement falls somewhere in the range of 0.1 to 10 defects per thousand lines of code.
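
The ratio described above is simple to write down. Here is a minimal sketch in Python; the function name and the example counts are hypothetical and serve only to illustrate the arithmetic.

```python
def defects_per_kloc(defects_found: int, lines_of_code: int) -> float:
    """Return the number of defects per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects_found / (lines_of_code / 1000)


# Hypothetical package: 120 defects found over its lifetime in 40,000 lines.
density = defects_per_kloc(120, 40_000)
print(f"{density:.1f} defects per KLOC")
```

Both inputs are approximate in practice: the defect count only includes defects that have been found so far, and line counts depend on how lines are counted.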

Even though researchers have made some progress in measuring the density of defects, there has been little progress in quantifying the effects of those defects on the computational results. A really terrific piece of code, with 0.1 defects per thousand lines of code, could have a really serious defect, while a fairly awful piece of code with 10 defects per thousand lines of code could turn out to be quite accurate.

I (Hatton) have worked for 40 years in meteorology, seismology, and computing, and most of the software I’ve used has been corrupted to some extent by such defects – no matter how earnestly the programmers performed their feats of testing. The defects, when they eventually surface, always seem to come as a big surprise.

The defects themselves arise from many causes: a requirement might not be understood correctly; the physics could be wrong; there could be a simple typographical error in the code, such as a + instead of a - in a formula; the programmer may rely on a subtle feature of a programming language that is not properly defined, such as the value of an uninitialized variable; there may be numerical instabilities such as overflow, underflow, or rounding errors; or there may be basic logic errors in the code. The list is very long. All are essentially human in one form or another, but they are exacerbated by the complexity of programming languages, the complexity of algorithms, and the sheer size of the computations.

The long road to reproducibility

As an example, here is a reconstruction of a vertical slice through a North Sea gas field from seismic data. It looks very convincing to a geologist, and producing it required large amounts of computation on high-performance computers using software that was reliable and well tested by a highly responsible company. It was considered state of the art.

However, we then performed a reproducibility experiment, which took three years, using the output of eight other companies' software packages. These implemented the same algorithms in the same programming language and were fed the same input data, but were coded independently. The result is the collage below.

Reproducing results gives 9 different answers. Image courtesy Les Hatton.

Individually, they all look very convincing, but to a geologist they are significantly different, even though they are supposed to be the same. It turned out that these differences are entirely due to latent software defects that had lain hidden, for years in some cases, before being flushed out by this reproducibility experiment. In this case, the latent defects included the well-known ‘off-by-one’ array indexing problem, uninitialized variables, sign errors in wave-propagation algorithms, simple logic problems whereby unexpected program paths were followed, and incorrectly calculated geometries.

The cumulative effect of these defects meant that we could only reproduce the results to one or two significant figures rather than the six inherent in 32-bit floating point computations. There was no prior warning to the programmers or the end-users that this could be the case. Defects of this size greatly undermine the accuracy of this process, which needs at least three significant figures for these data, and can easily compromise the placing of an extremely expensive drilling rig. Without this comparison, the defects responsible might never have been unearthed – they had already evaded comprehensive test suites and years of production use.
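
To make "agreement to N significant figures" concrete, here is a minimal sketch in Python. It uses the negative base-10 logarithm of the relative difference, a common rule of thumb and not necessarily the measure used in the experiment. The function name and the sample values are hypothetical.

```python
import math


def agreeing_significant_figures(a: float, b: float) -> float:
    """Estimate how many leading significant figures two values share.

    Rule of thumb: -log10 of the relative difference between a and b.
    """
    if a == b:
        return math.inf  # identical values agree to every digit
    scale = max(abs(a), abs(b))
    return -math.log10(abs(a - b) / scale)


# Hypothetical values of one output sample from two independent codes.
close_pair = agreeing_significant_figures(1.234567, 1.234561)
far_pair = agreeing_significant_figures(1.234567, 1.21)
print(f"close pair agrees to about {close_pair:.1f} figures")
print(f"far pair agrees to about {far_pair:.1f} figures")
```

A 32-bit float carries a 24-bit significand, which corresponds to roughly six to seven decimal digits, so two correct implementations would be expected to land near the first case rather than the second.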

The methods used to develop the software haven't really changed much since the experiment was done in the 1990s. What has changed, however, is the volume of software used in science and the volume of data processed. Where we once had megaflops and megabytes, we now have petaflops and petabytes.

Getting rid of the Space Shuttle bugs

In the past, one approach to this problem was to develop a great code, and then not allow any changes to be made. This approach was taken by NASA in the 1970s, when scientists there developed its most defect-free code for the space shuttle program.

“Software can never be considered error-free; the problem is to determine when it's reliable enough to fly with,” said Hugh Blair-Smith, who was part of the team that worked on the Space Shuttle software as part of MIT's Instrumentation Lab, and author of Journey to the Moon: The History of the Apollo Guidance Computer.

The shuttle code had an estimated defect rate of 0.11 per 1,000 lines of code. But this solution was an expensive venture: at the time, NASA paid IBM programmers a reputed $500 million to debug 500,000 lines of code.
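
The figures quoted above imply a striking cost per line. The short Python sketch below only does the arithmetic. It assumes that the reputed $500 million and the 0.11 defect rate both apply to the same 500,000 lines, which the article does not state explicitly.

```python
# Figures quoted in the article; the derived values are only arithmetic.
cost_usd = 500_000_000
lines_of_code = 500_000
defects_per_kloc = 0.11

# Assumption: cost and defect rate refer to the same body of code.
cost_per_line = cost_usd / lines_of_code
expected_residual_defects = defects_per_kloc * lines_of_code / 1000

print(f"${cost_per_line:,.0f} per line")
print(f"about {expected_residual_defects:.0f} defects remaining")
```

Even at that price, the estimate suggests that a few dozen defects would remain, which is consistent with the remark that software can never be considered error-free.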

“Most software at work today is developed by methods very different from what we did then. Instead of having every line of code in the machine known and controlled by a small group of people all working together, every small module of modern software rests on APIs to a dozen or so layers of infrastructure code modules created by hundreds of organizations employing myriads of people. While the integrators of these pyramids benefit from very detailed specifications for each of the bricks used at their level, and create similarly detailed specifications of how their pyramids behave for the benefit of the next layer up, the opportunities for obscure problems are many orders of magnitude greater than anything we saw then ... as anybody looking at a frozen screen knows!”

Five million lines of code at ATLAS

This approach is not possible at the ATLAS experiment, which is one of four particle detectors running on the Large Hadron Collider at CERN. It has about five million lines of code. And the software is constantly evolving, with improvements, clean-up, and bug fixing.

“Last year, around 300 people worked on our code. This could be a student changing one line or an expert updating thousands of lines,” said David Rousseau, a former coordinator of offline software for data reconstruction, analysis, and simulation.

They have tried organized code review in the past, said Rousseau, but with little success due to the small number of true software experts. There were also some false starts trying to run the software on different platforms (for example, Linux versus Mac operating systems), which can reveal different defects in the code.

Instead, ATLAS uses a kind of semi-open source software; the code can be accessed by everybody in the ATLAS collaboration, which is 3,000 people. They have user tutorials, to help researchers without intense coding experience. And, lastly, they have a series of consistency checks.

“Our system runs this automatic comparison every night and morning to catch any bugs before they’re introduced into the work chain,” said Rousseau. A developer is asked to announce when his or her update might result in a change to the scientific results. The next day, the team checks the results against the developer's claim to see if it is real.
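
The article does not describe how the ATLAS comparison is implemented. The following is only a minimal Python sketch of the general idea, comparing a new set of results against a reference and flagging changes that nobody announced. The function name, the quantity names, the numbers, and the tolerance are all hypothetical.

```python
import math


def unannounced_changes(reference, candidate, announced=(), rel_tol=1e-6):
    """Return names of quantities that changed without being announced."""
    unexpected = []
    for name, ref_value in reference.items():
        new_value = candidate.get(name)
        # A missing quantity counts as a change.
        changed = new_value is None or not math.isclose(
            ref_value, new_value, rel_tol=rel_tol
        )
        if changed and name not in announced:
            unexpected.append(name)
    return unexpected


# Hypothetical summary quantities from the previous build and the new build.
reference = {"mean_track_pt": 41.270, "n_jets": 3.0, "missing_et": 17.52}
candidate = {"mean_track_pt": 41.270, "n_jets": 3.0, "missing_et": 17.49}

print(unannounced_changes(reference, candidate))
print(unannounced_changes(reference, candidate, announced={"missing_et"}))
```

This sketch only catches changes that were not announced. Checking that an announced change actually appears, and that it has the expected size, would need a further step along the lines of the manual check described above.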

The main ATLAS software can be split into 10 different areas for specific physics research, each overseen by a few experts. But these different software areas can sometimes interact with each other in unexpected ways. “After user updates of individual software packages and before any major release, we get experts to manually compare results, line by line, for any potential side effects across all our code,” said Rousseau.

Plus, of course, if the ATLAS experiment claims a discovery – such as the Higgs boson – it must be confirmed by another experiment, CMS, which also takes data from proton collisions in the LHC.

So, thus far, the best scientists and researchers are simply aware of the problem, and are vigilant about checking results. But every domain of science has problems related specifically to its code, and to its unique ways of coding and debugging. Computing science has failed to reveal any single method to avoid or even quantify defects.

In the meantime, all scientists need to return to the method that has made science so successful in the first place: reproducibility. If scientists can't reproduce a result or if the source code and data are not available to test reproducibility, then that result should be treated with caution. And, if we are to be really brave, it should be discarded.


Comments

Doubtful

I have read your article, but I am doubting that computation is an actual threat to the scientific method.

Great

The SM relies on (attempted) replication for validation and in some ways, it's more a funding thing (that is, if your results depend on 100 work years of software development, replicating requires investing 100 work years).

On the need for the code to

On the need for the code to be available for examination, I definitely agree, as you can see from these sources:

http://www.isgtw.org/feed-item/astrophysics-source-code-library

http://www.astrobetter.com/code-in-the-astrophysics-code-source-library-...

http://arxiv.org/abs/1202.1026

www.ascl.net

Alice Allen
Editor, ASCL

What if the hypothesis you

What if the hypothesis you are discarding itself is incorrect? if coding is error prone, can thinking not be? hence i believe, the data if it does not fit must be re-examined by proponent of other theorists (apart from particle theorists).

nice

The LOC question is particularly an issue with the use of autogenerated code, although I tend to think this is no different than a compiler.

Computing helps speed up the human decision process

It is because of all those reasons computing helps speed up the human decision process, not to take over the human decision process.
Compute based results offer an "educated" guess of where the solution lies. But you cannot ever rely blindly on the results. That is why you still keep using prototypes or experience. And if prototypes cannot be done which will incur into measurement errors and phenomena not thoroughly understood, then it is still a human decision with risks.
It would be interesting to model the solution and risk to help reduce further the risks associated with human decisions.
