There are over 34,000 scholarly, peer-reviewed journals in existence today, collectively publishing some 2.5 million articles every year. It’s estimated that a single researcher, depending on their discipline, will read about 270 of them in the same time frame.
Scientists will never keep up. They’re going to miss key insights. They’re drowning in a sea of their own expertise.
Fortunately, the Allen Institute for Artificial Intelligence (AI2) tossed them a life preserver. On Friday, AI2 expanded its artificial intelligence-based search engine, Semantic Scholar, to the field of neuroscience. The launch is just another step toward AI2’s long-term vision: bringing man and machine together to advance science and save lives.
AI2 is the nation’s largest non-profit AI research institute, which means the goal isn’t to make a buck; it’s to use cutting-edge AI techniques to serve the common good. The Semantic Scholar search engine is the keystone project of the Seattle-based organization, founded in 2014 by Microsoft co-founder Paul Allen.
“We’re bringing scientific search to the 21st century here,” says AI2 CEO Oren Etzioni. “We cut through the clutter and home in on key publications and citations.”
Semantic Scholar uses data mining, natural language processing and computer vision in parallel to extract valuable information from the text and images contained in millions of studies. From these signals, the system builds a semantic understanding not only of the information in a given study, but of its relevance to the larger corpus of research.
Algorithms track how often a study is cited, whether those citations come from influential scientists, and whether there’s been a recent uptick in its citations. Semantic Scholar also pulls in buzz circulating on social media to put studies into further context. For neuroscience searches, for example, Semantic Scholar sorts results based on the brain region targeted, the method used, the model organism and the cell type studied.
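To make those ranking signals concrete, here is a toy scoring function combining them linearly. This is purely illustrative: the weights, field names, and the linear combination itself are assumptions for the sketch, not AI2’s actual algorithm.

```python
def influence_score(paper, w_cites=1.0, w_influential=3.0,
                    w_velocity=2.0, w_buzz=0.5):
    """Toy linear score over the signals described above.

    All weights and field names are hypothetical, chosen only
    to illustrate how such signals might be combined.
    """
    recent = paper["citations_last_year"]
    older = max(paper["total_citations"] - recent, 1)
    velocity = recent / older  # "uptick": recent citations vs. older ones
    return (w_cites * paper["total_citations"]
            + w_influential * paper["influential_citations"]
            + w_velocity * velocity
            + w_buzz * paper["social_mentions"])

papers = [
    {"title": "A", "total_citations": 120, "influential_citations": 4,
     "citations_last_year": 10, "social_mentions": 2},
    {"title": "B", "total_citations": 40, "influential_citations": 9,
     "citations_last_year": 25, "social_mentions": 30},
]
# Rank highest-scoring papers first
ranked = sorted(papers, key=influence_score, reverse=True)
```

In this toy example, a heavily cited older paper can still outrank a newer paper with more buzz; a real system would learn such trade-offs from data rather than fix them by hand.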
It’s a search tool that goes far deeper, in a more scientifically intuitive way, than what’s out there today.
“Our ability to home in on this different semantic information about neuroscience and computer science is what sets us apart,” says Etzioni.
Adding to the Library
Today, AI2’s scholarly search engine includes 10 million articles pertaining to computer science and neuroscience. But Etzioni plans to bring the entire PubMed biomedical corpus under the Semantic Scholar umbrella in 2017. Further, Etzioni says the AI2 team is working on building algorithms that can detect weaknesses in studies, such as p-hacking, so that higher-quality studies rise to the top.
“Medical breakthroughs should not be hindered by the cumbersome process of searching the scientific literature,” says Etzioni.
Several projects are ongoing at AI2, and breakthroughs in each will filter into Semantic Scholar’s long-term future. AI2 researcher Peter Clark is leading a team that’s using deep learning to build a computer system that can pass middle school-level science exams, a task that requires understanding far deeper than search-and-retrieve techniques.
Down the hall, Ali Farhadi is working on building computer vision systems with contextual knowledge of what they see. Beyond object detection and pattern recognition, Farhadi has designed systems that, for example, predict what happens next if force is applied to an object in an image. The team’s imSitu project can produce a quick summary of what’s happening in an image.
“We want to channel the results of those projects into (Semantic Scholar),” says Etzioni. “We already have unique semantic capabilities, but on the 20-year horizon, it’s going to be completely unprecedented.”
In 20 years, far more than a search engine, Semantic Scholar might serve as a scientific apprentice assisting researchers in their work. A machine apprentice might comb research papers and suggest avenues for future studies based on all the articles and images it has analyzed. It might expose missing links, or studies relevant to a new paper, and perhaps nudge researchers down more productive avenues.
“We’ve got these teams working on the core underlying technologies for visual, sort of the training wheels, if you will,” says Etzioni.