The stereotype of a blue-haired librarian in horn-rimmed glasses, shushing you as she returns dusty tomes to their shelves, no longer holds in 2015. In the digital age, librarians manage massive digitized collections of images and videos. Maria Esteva, data curator at the Texas Advanced Computing Center (TACC), is working on a supercomputing solution for efficiently assessing the quality of these outsized video archives.
In addition to movies and television programs, video is used to maintain anthropological records of societies and cultures; works of art, music and theatre performances; rituals, languages, oral histories; and all manner of human achievement. Increasingly, video is also the medium for criminal evidence, court depositions, surveillance, and the demonstration of scientific results, all of which makes it harder for librarians to keep track of their collections.
To make sound curatorial decisions, to effectively employ public funds, and to manage and grant public access, librarians must know what’s in their burgeoning digital archives. “Curating involves understanding the quality of assets, and manual review is impossible at the scale of growth we’re experiencing in video production today,” says Esteva. “If you think about long-term preservation, we need efficient methods to make decisions about how to preserve this heritage. Understanding the quality of the collections taxpayers charge us to steward is an important first step.”
To clear this hurdle, Esteva collaborates with Alan Bovik, director of the Laboratory for Image and Video Engineering (LIVE) at the University of Texas at Austin (UT). To gauge video quality without spending countless hours reviewing videos one by one, she turned to the perception-based video quality assessment algorithms Bovik has honed. Having tried one of his algorithms before, she hoped his work could help her curate her collection more efficiently.
“It's one thing to measure the quality of something two teenagers are sending to each other and will be gone an hour later,” Bovik notes. “It's another to maintain the quality of images of art that you’d like to maintain over centuries and broadcast to large numbers of people.”
Image and video quality assessment algorithms, used widely in the cable and satellite television industries, come in two flavors: full reference (FR) and no reference (NR). FR algorithms compare a compressed or otherwise processed video against the original it was derived from. NR algorithms have no such reference and must assess quality from the video alone.
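The practical difference between the two families is what each function needs as input. The sketch below is illustrative only and is not the method used in the study: `psnr` is the classic peak signal-to-noise ratio, a simple full-reference measure, and `laplacian_variance` is a crude no-reference sharpness proxy standing in for a real NR model such as BRISQUE. Frames are assumed to be 8-bit grayscale, stored as lists of rows.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Full reference: PSNR in dB between two same-size grayscale frames.

    Needs the original (reference) frame; identical frames give infinity.
    """
    diffs = [(r - d) ** 2
             for r_row, d_row in zip(reference, distorted)
             for r, d in zip(r_row, d_row)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


def laplacian_variance(frame):
    """No reference: variance of a 4-neighbour Laplacian over interior pixels.

    Only the frame under test is needed. Low values suggest a blurry,
    low-detail frame. This is a crude stand-in, not BRISQUE.
    """
    h, w = len(frame), len(frame[0])
    lap = [frame[i - 1][j] + frame[i + 1][j] + frame[i][j - 1] + frame[i][j + 1]
           - 4 * frame[i][j]
           for i in range(1, h - 1) for j in range(1, w - 1)]
    m = sum(lap) / len(lap)
    return sum((v - m) ** 2 for v in lap) / len(lap)
```

For example, a flat frame of value 100 compared with the same frame brightened to 110 gives an MSE of 100 and a PSNR of about 28.13 dB, whereas `laplacian_variance` can only report that both frames are equally featureless.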
“These algorithms measure the quality of an image or a video in a way that corresponds to human perception of that quality,” says Bovik. To meet this standard, Bovik has been training algorithms using neuroscience models of how humans process visual signals, beginning at the eye and proceeding into the visual brain.
Along with Todd Goodall from LIVE and Zach Abel from the College of Natural Sciences at UT, Esteva and Bovik used an NR algorithm called the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) in their recent study. “Even though these algorithms are efficient,” says Bovik, “they're still processing huge amounts of data in the terabyte range, so you need really fast processors, which means supercomputers.”
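As described in its published form, BRISQUE works on individual images. It first computes mean-subtracted contrast-normalized (MSCN) coefficients, fits generalized Gaussian distributions to them and to products of neighbouring coefficients, and maps those fitted parameters to a quality score with a trained regressor. The minimal sketch below covers only the first stage. The 7×7 Gaussian window with σ = 7/6 and the stabilizing constant C = 1 are values commonly used in BRISQUE implementations; the border handling (clamping to the nearest edge pixel) is an assumption of this sketch.

```python
import math


def gaussian_window(size=7, sigma=7 / 6):
    """Normalized, circularly symmetric 2-D Gaussian weights (size x size)."""
    half = size // 2
    w = [[math.exp(-(k * k + l * l) / (2 * sigma * sigma))
          for l in range(-half, half + 1)]
         for k in range(-half, half + 1)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]


def mscn(image, C=1.0):
    """MSCN coefficients: (I - local mean) / (local std + C) at every pixel."""
    h, w = len(image), len(image[0])
    win = gaussian_window()
    half = len(win) // 2
    offsets = [(k, l) for k in range(-half, half + 1) for l in range(-half, half + 1)]

    def px(i, j):  # clamp indices so the window never leaves the image
        return image[min(max(i, 0), h - 1)][min(max(j, 0), w - 1)]

    out = []
    for i in range(h):
        row = []
        for j in range(w):
            mu = sum(win[k + half][l + half] * px(i + k, j + l) for k, l in offsets)
            var = sum(win[k + half][l + half] * (px(i + k, j + l) - mu) ** 2
                      for k, l in offsets)
            row.append((image[i][j] - mu) / (math.sqrt(var) + C))
        out.append(row)
    return out


def paired_products(coeffs):
    """Products of neighbouring MSCN coefficients along four orientations:
    horizontal, vertical, main diagonal and secondary diagonal."""
    h, w = len(coeffs), len(coeffs[0])
    horiz = [coeffs[i][j] * coeffs[i][j + 1] for i in range(h) for j in range(w - 1)]
    vert = [coeffs[i][j] * coeffs[i + 1][j] for i in range(h - 1) for j in range(w)]
    diag = [coeffs[i][j] * coeffs[i + 1][j + 1]
            for i in range(h - 1) for j in range(w - 1)]
    anti = [coeffs[i][j] * coeffs[i + 1][j - 1]
            for i in range(h - 1) for j in range(1, w)]
    return horiz, vert, diag, anti
```

On undistorted natural images, MSCN coefficients tend to follow a characteristic near-Gaussian shape, and distortions such as blur or compression artifacts change that shape. This is why the later fitting stage can estimate quality without a reference.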
Esteva sought help from the Extreme Science and Engineering Discovery Environment (XSEDE), which provides computing resources and consultation to scientists nationwide. To assess how the algorithms perform at the collection level, the team used TACC's Stampede supercomputer, with the workflow optimized to run on eight cores per node. Each compute node worked on two videos at once and analyzed around 40 high-definition frames per second.
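The throughput figure makes the case for a supercomputer easy to quantify. The back-of-the-envelope sketch below assumes that the 40 frames per second is a node's aggregate rate across its two concurrent videos, that every frame is scored, and that footage runs at 30 frames per second. The collection sizes and node counts are hypothetical.

```python
def node_hours(total_video_hours, fps=30.0, node_frames_per_second=40.0):
    """Compute-node hours needed to score every frame of a collection.

    Assumes each node sustains node_frames_per_second in aggregate and that
    the source footage runs at fps frames per second.
    """
    total_frames = total_video_hours * 3600 * fps       # frames in the collection
    seconds = total_frames / node_frames_per_second     # node-seconds of work
    return seconds / 3600


def wall_clock_hours(total_video_hours, nodes, **kwargs):
    """Ideal elapsed time if the work divides evenly across the nodes."""
    return node_hours(total_video_hours, **kwargs) / nodes
```

Under these assumptions, one hour of 30 fps footage is 108,000 frames, or 0.75 node-hours. A hypothetical 1,000-hour archive needs 750 node-hours: about a month of continuous work on one machine, but roughly 7.5 hours when spread over 100 nodes.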
The team designed what they call a ‘Curation En Masse’ workflow. In addition to predicting video quality with the BRISQUE algorithm, the workflow extracts metadata and frames from the videos, produces diagnostic visualizations for closer assessment, and plots the resulting scores.
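The overall shape of such a workflow can be sketched as a small pipeline. This is a hypothetical structure, not the team's code: frame and metadata extraction are assumed to have happened upstream, `score_frame` is any per-frame quality function (for example, a BRISQUE implementation), and names such as `curate_collection` and `VideoReport` are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Frame = Sequence[Sequence[float]]   # one grayscale frame as rows of pixels
ScoreFn = Callable[[Frame], float]  # hypothetical per-frame quality scorer


@dataclass
class VideoReport:
    name: str
    metadata: Dict[str, object]
    frame_scores: List[float] = field(default_factory=list)

    @property
    def mean_score(self) -> float:
        return mean(self.frame_scores)


def curate_video(name: str, frames: Sequence[Frame],
                 metadata: Dict[str, object], score_frame: ScoreFn) -> VideoReport:
    """Score every extracted frame of one video and keep its metadata alongside."""
    return VideoReport(name, dict(metadata), [score_frame(f) for f in frames])


def curate_collection(videos: Sequence[Tuple[str, Sequence[Frame], Dict[str, object]]],
                      score_frame: ScoreFn, videos_per_node: int = 2) -> List[VideoReport]:
    """Process a collection a fixed number of videos at a time, keeping input order."""
    with ThreadPoolExecutor(max_workers=videos_per_node) as pool:
        futures = [pool.submit(curate_video, name, frames, meta, score_frame)
                   for name, frames, meta in videos]
        return [f.result() for f in futures]


def score_table(reports: Sequence[VideoReport]) -> List[Tuple[str, int, float]]:
    """One summary row per video (name, frames scored, mean score) for plotting."""
    return [(r.name, len(r.frame_scores), round(r.mean_score, 2)) for r in reports]
```

Threads keep the sketch short and mirror the "two videos at once" arrangement. Because CPU-bound Python code does not run in parallel across threads, a real deployment would more likely use separate processes or separate batch jobs per node.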
The BRISQUE algorithm has proved very good at identifying videos of very low or very high quality. However, the algorithm needs further refinement to handle video distortions that are intentional artistic choices. Esteva and Bovik expect it to accurately assess the relative quality of individual videos, or of videos across collections, that share certain characteristics.
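Because the scores are most reliable at the extremes, one plausible way to use them is as a triage step that routes only the uncertain middle band to human reviewers. The sketch below assumes that lower scores mean better quality, as in common BRISQUE implementations. The cut-off values are hypothetical and would need tuning for each collection.

```python
def triage(mean_scores, good_below=30.0, poor_above=70.0):
    """Sort videos into bands by mean quality score (lower = better).

    The thresholds are hypothetical. Videos in the middle band, and any whose
    'distortions' may be deliberate artistic choices, go to manual review.
    """
    bands = {"high": [], "low": [], "review": []}
    for name, score in mean_scores.items():
        if score < good_below:
            bands["high"].append(name)
        elif score > poor_above:
            bands["low"].append(name)
        else:
            bands["review"].append(name)
    return bands
```

A flagged "low" video is not necessarily defective. Given the caveat above about artistic distortion, a curator would still want to confirm such cases before acting on them.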
Ultimately, the receiver of these video cultural assets is the public, Bovik stresses. Preserving these digital images for posterity means increased access for citizens who want to see what’s new in video art at the Tate Gallery in London but cannot travel to the museum.
“If we have to preserve these digitized images and videos for evidential purpose, for legal purpose, for cultural purpose, we need to know what we have and understand its quality so we can care for it and deliver it in the same authenticity as it was recorded,” Esteva concludes. “At this point, we don't have efficient methods to do that. That's why we use supercomputers, that's why we use algorithms.”