Across the spectrum, data has gotten big.
"If you look at the trend, databases are getting bigger and bigger," said Dora Cai, a database architect based at the US National Center for Supercomputing Applications. While 50 gigabytes would have been considered a large database not that long ago, "now we're talking about terabytes and hundreds of terabytes and even petabytes."
The Virtual Worlds Exploratorium and an ongoing census analysis project are two examples of data-intensive research in the humanities that show how NCSA's infrastructure and staff can help researchers address the challenges of big data.
Millions of people around the world play massively multiplayer online role-playing games. And as they play, their every action (each time they fight a dragon, buy or sell armor, or talk to another player) is logged by the game, creating a wealth of information about how people interact in these "virtual worlds."
Several years ago, Sony approached researcher Dmitri Williams, then at the University of Illinois and now at the University of Southern California, to see if he could use data gathered from EverQuest II to determine which players were likely to leave the game (and therefore stop paying to play). Williams was also interested in questions about whether in-game behavior correlated with behavior in the real world. Will someone with a violent, aggressive game character be more violent or aggressive in the real world, for example?
Williams teamed with co-principal investigators Marshall Scott Poole (University of Illinois), Noshir Contractor (Northwestern University), and computer scientist Jaideep Srivastava (University of Minnesota) to investigate a massive collection of game log data from EverQuest II and other games—Dragon's Nest and Chevalier's Romance, which are popular in China, and Denmark-based EVE Online. They call their collaboration the Virtual Worlds Exploratorium.
The researchers faced several challenges in working with these data.
The data are housed at NCSA because "they have a lot of experience with large data and with making sure the data is securely handled," Contractor said. The VWE team tapped Cai to create an organized database from the "messy" collection of log files.
If you aren't a data-focused researcher or computer scientist, you might miss the significance of that crucial step, but a collection of data isn't a useful database until it has been organized and structured so that it can be queried. That's where Cai's database expertise came into play.
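To make that step concrete, here is a minimal sketch of turning raw log lines into a table that can be queried. The pipe-delimited log format, the field names, and the sample events are all invented for illustration; the article does not describe the actual EverQuest II log layout or the database Cai built.

```python
import sqlite3

# Hypothetical raw log lines; the real game log format is not described in the article.
RAW_LOG = [
    "2006-03-01T12:00:05|player_17|trade|player_42",
    "2006-03-01T12:00:09|player_42|chat|player_17",
    "malformed line",
    "2006-03-01T12:01:30|player_17|fight|dragon_3",
    "2006-03-01T12:02:00|player_17|trade|player_8",
]


def parse_line(line):
    """Return a (timestamp, actor, action, target) tuple, or None if the line is malformed."""
    parts = line.strip().split("|")
    return tuple(parts) if len(parts) == 4 else None


def build_database(lines):
    """Load the parseable log lines into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (ts TEXT, actor TEXT, action TEXT, target TEXT)")
    rows = [row for row in map(parse_line, lines) if row is not None]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def actions_per_player(conn, action):
    """Count how often each player performed the given action."""
    cur = conn.execute(
        "SELECT actor, COUNT(*) FROM events WHERE action = ? GROUP BY actor ORDER BY actor",
        (action,),
    )
    return cur.fetchall()


conn = build_database(RAW_LOG)
trade_counts = actions_per_player(conn, "trade")
```

Once the events sit in a structured table, a question such as "who trades most often?" becomes a one-line query instead of a scan through messy text files.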
"We have all these great data, and we can ask loads of questions about interaction in the space," Contractor said.
Some of those questions have addressed group leadership, while others have addressed group formation: Why do people team up with one another in the game? Do groups form based on similarities, complementary differences, proximity, etc.? Leadership is one of the areas of interest to the US Army and Air Force, which have both provided funding for VWE projects.
"This might be the best training ground for the kinds of leaders we will see tomorrow," Contractor said.
The researchers have also studied "illegal" transactions in which players sell currency, items, and even high-level characters to wealthier players who want the perks without putting in hours of game play to earn them. As games try to crack down on this behavior, the illicit sellers and buyers adopt new tricks to conceal their actions. One of Contractor's students, Brian Keegan, along with fellow student Muhammad Ahmad, found that the "illegal" networks in the game employ virtually identical strategies to those used by drug traffickers. The researchers also found that people who engage in illegal conduct in the game are more likely to have real-world criminal records.
A treasure trove of US Census data is released to the public after remaining confidential for 72 years. The standard practice has been for the Census Bureau to create microfilm images of the millions of paper forms. Companies that cater to genealogy buffs, like Ancestry.com, then hire thousands of people to spend months transcribing the microfilm so the data can be searched and sorted online.
But this April the detailed information on the more than 132 million people who lived in the United States in 1940 will be released in digital format. No more microfilm.
The Census Bureau would like to provide something more usable than 3.8 million JPEG images of census forms, but manual transcription is too expensive, and optical character recognition of the handwritten entries is not accurate enough. So NCSA's Image, Spatial, and Data Analysis group, led by Kenton McHenry, has been working for the past year on a prototype framework using content-based image retrieval to allow people to search the census form images directly. The project is supported by the National Archives and Records Administration.
The framework enables a user to input a handwritten query (either using a stylus or by typing a word that is then rendered in a handwriting font) to search a database of images of handwritten text for potential matches. The system uses a computer vision technique known as word spotting to rank the candidates and returns the top-ranked results.
While not all results will be perfect matches, the system's users will help improve them over time through a passive form of crowdsourcing. For instance, after searching for "Smith," a user isn't likely to click on results that are not "Smith." The query text entered by the user can be connected to the image results the user selected, allowing the image database to be slowly annotated. Over time, the validated matches can be returned to users rather than relying solely on the word spotting technique.
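The passive annotation idea can be sketched in a few lines. This is an illustration only: the class name, the cell identifiers, and the `min_votes` threshold are assumptions, since the article does not say how the NCSA prototype decides that a match has been validated.

```python
from collections import defaultdict


class ClickAnnotator:
    """Accumulate (query text, clicked image) pairs as passive annotations."""

    def __init__(self, min_votes=2):
        # min_votes is an assumed threshold; the article gives no validation rule.
        self.min_votes = min_votes
        # image_id -> label -> number of clicks that linked them
        self.votes = defaultdict(lambda: defaultdict(int))

    def record_click(self, query, image_id):
        """Link the user's query text to a result image the user selected."""
        self.votes[image_id][query.strip().lower()] += 1

    def label_for(self, image_id):
        """Return the most-clicked label once it has enough votes, else None."""
        labels = self.votes.get(image_id)
        if not labels:
            return None
        label, count = max(labels.items(), key=lambda item: item[1])
        return label if count >= self.min_votes else None

    def validated_matches(self, query):
        """Return the image ids whose validated label equals the query."""
        wanted = query.strip().lower()
        return sorted(i for i in self.votes if self.label_for(i) == wanted)


annotator = ClickAnnotator(min_votes=2)
annotator.record_click("Smith", "cell_001")
annotator.record_click("smith", "cell_001")
annotator.record_click("Smith", "cell_002")
smith_cells = annotator.validated_matches("Smith")
```

In this sketch, a cell counts as annotated only after two separate clicks agree on its label, so a single stray click does not change what later users see.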
A significant amount of computation is required in order to pre-process the data to allow for the planned word spotting and passive crowd sourcing. The first step is to split the spreadsheet-like Census forms into individual data cells by finding the form lines and fitting a template over the images. Next, each extracted cell must be converted into a numerical feature vector that roughly represents the handwritten contents of that image. A word spotting technique compares the feature vector of the search query (such as a name, like Smith) to the feature vectors of the many, many cells, looking for similarities. To search all 70 billion cell images would be excessively time-consuming and computationally expensive, so a third step groups similar feature vectors and constructs a hierarchy on the data to narrow the search space and return results with reasonable speed.
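The comparison and grouping steps described above can be sketched as follows. This is a toy illustration and not the NCSA code: the two-dimensional feature vectors are invented, a few rounds of k-means with given seed centroids stand in for the grouping step, and a single level of clusters stands in for the full hierarchy.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_index(cells, seeds, rounds=5):
    """Group cell vectors around centroids using a few k-means rounds.

    cells: dict mapping cell id -> feature vector
    seeds: initial centroids (assumed to be supplied by the caller)
    Returns a list of (centroid, [cell ids]) pairs.
    """
    centroids = [list(s) for s in seeds]
    groups = [[] for _ in centroids]
    for _ in range(rounds):
        groups = [[] for _ in centroids]
        for cell_id, vec in cells.items():
            nearest = min(range(len(centroids)), key=lambda i: distance(vec, centroids[i]))
            groups[nearest].append(cell_id)
        # Move each centroid to the mean of its group; keep it in place if the group is empty.
        centroids = [
            mean([cells[c] for c in g]) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return list(zip(centroids, groups))


def search(query_vec, index, cells, top_k=3):
    """Rank only the cells in the cluster whose centroid is closest to the query."""
    _, members = min(index, key=lambda pair: distance(query_vec, pair[0]))
    ranked = sorted(members, key=lambda c: distance(query_vec, cells[c]))
    return ranked[:top_k]


# Invented 2-D vectors; real handwriting features would have many more dimensions.
cells = {
    "a": [0.0, 0.1],
    "b": [0.2, 0.0],
    "c": [5.0, 5.1],
    "d": [5.2, 4.9],
    "e": [0.1, 0.3],
}
index = build_index(cells, seeds=[[0.0, 0.0], [5.0, 5.0]])
top_two = search([0.1, 0.2], index, cells, top_k=2)
```

The trade-off is the one the article points to: the query is compared against one group of cells rather than all of them, which saves time but can miss a true match that happens to sit just across a cluster boundary.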
The team is using an XSEDE start-up allocation to develop their system. An XSEDE Extended Collaborative Support Services team, led by NCSA's Jay Alameda, has helped the group get optimal performance out of their code, assisting with mapping processes to hardware and with I/O issues. The team has applied through XSEDE for 2 million CPU hours to be used to process the 1940 census records.
A version of this story first appeared on the NCSA website.