Wednesday, August 8, 2007

New search engine for tables is developed

U.S. computer scientists have created a search engine that can identify and extract tables from PDF documents, as well as index and rank the results.


The search engine, called TableSeer and developed by Pennsylvania State University researchers, uses a ranking algorithm that also identifies tables found in frequently cited documents and weighs that factor in the search results, Assistant Professor Prasenjit Mitra said.
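
The article does not describe how the ranking is computed, so the following is only an illustrative sketch of the general idea of blending a query-match score with how often the containing document is cited. The names TableHit, combined_score, rank and citation_weight, the log damping and the 0.3 default are all assumptions made for this example, not TableSeer's actual algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class TableHit:
    """One candidate table returned for a query (hypothetical structure)."""
    table_id: str
    text_score: float  # assumed query-match score, higher is better
    citations: int     # how often the containing document is cited


def combined_score(hit: TableHit, citation_weight: float = 0.3) -> float:
    """Blend the text-match score with a damped citation count.

    log1p keeps a heavily cited document from swamping the text score.
    The weighting scheme is an assumption, not the published method.
    """
    citation_part = math.log1p(hit.citations)
    return (1 - citation_weight) * hit.text_score + citation_weight * citation_part


def rank(hits: list, citation_weight: float = 0.3) -> list:
    """Return hits sorted from highest to lowest combined score."""
    return sorted(
        hits,
        key=lambda h: combined_score(h, citation_weight),
        reverse=True,
    )


if __name__ == "__main__":
    hits = [TableHit("a", 0.8, 0), TableHit("b", 0.7, 50)]
    for hit in rank(hits):
        print(hit.table_id, round(combined_score(hit), 3))
```

Setting citation_weight to 0 in this sketch reduces the ranking to the text score alone, which makes it easy to see how much the citation factor changes the order.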


Mitra said TableSeer is believed to be the first search engine designed for tables.


Although some software can identify and extract tables from text, existing software cannot search for tables across documents, Mitra said. TableSeer automates that process, capturing data not only within each table but also in its title and footnotes. In addition, it supports column-name-based searches, so a user can search for a particular column in a table.
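
As a rough illustration of what a column-name-based search across documents could look like, the sketch below indexes extracted tables by normalized column name and also allows a plain substring search over titles and footnotes. The Table record and the functions build_column_index, search_by_column and search_text are hypothetical names invented for this example; the article does not describe TableSeer's internal data structures.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """A table extracted from a document (hypothetical record layout)."""
    doc_id: str
    title: str
    columns: list
    footnotes: str = ""


def normalize(name: str) -> str:
    """Case-insensitive, whitespace-trimmed form used as the index key."""
    return name.strip().lower()


def build_column_index(tables: list) -> dict:
    """Map each normalized column name to the tables that contain it."""
    index = {}
    for table in tables:
        # A set avoids listing a table twice if a column name repeats.
        for key in {normalize(c) for c in table.columns}:
            index.setdefault(key, []).append(table)
    return index


def search_by_column(index: dict, column_name: str) -> list:
    """Return every indexed table that has the given column."""
    return index.get(normalize(column_name), [])


def search_text(tables: list, term: str) -> list:
    """Substring match over table titles and footnotes."""
    needle = term.lower()
    return [
        t for t in tables
        if needle in t.title.lower() or needle in t.footnotes.lower()
    ]


if __name__ == "__main__":
    tables = [
        Table("doc1", "Melting points of salts", ["Compound", "Melting Point"]),
        Table("doc2", "Solubility data", ["Compound", "Solubility"], "Measured at 25 C"),
    ]
    index = build_column_index(tables)
    print([t.doc_id for t in search_by_column(index, "melting point")])
    print([t.doc_id for t in search_text(tables, "25 c")])
```

Normalizing column names before indexing is one simple design choice here; a real system would likely need fuzzier matching for abbreviations and units.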


The development of TableSeer is part of an open-source project funded by the National Science Foundation.


TableSeer can be tested online at http://chemxseer.ist.psu.edu. The source code will be made available near the completion of the project, the researchers said.


Copyright 2007 by United Press International. All Rights Reserved.


 

