Information extraction (IE) is the task of
automatically extracting structured information
from unstructured and semi-structured machine-readable
documents. In this paper, we propose a new paradigm
for information extraction. In this extraction framework,
the intermediate output of each text processing component is
stored so that, when a component is improved, only that component has to be
deployed to the entire corpus. Extraction is then performed on
both the previously processed data from the unchanged
components and the updated data generated by the
improved component. This kind of incremental
extraction can greatly reduce processing
time. To realize this new information extraction framework,
we propose using database management systems rather than file-based
storage systems to address dynamic extraction
needs. To demonstrate the feasibility of the incremental extraction
approach, experiments are performed to highlight two
important aspects of an information extraction system:
efficiency and quality of extraction results.
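
The incremental idea described above can be illustrated with a minimal sketch. This is not the authors' system: it assumes an in-memory SQLite database standing in for the database management system, and the table name `intermediate`, the helpers `run_component` and `load`, and the toy `tokenize`, `tag_v1`, and `tag_v2` components are all hypothetical names chosen for illustration. Each component's output is stored per document with a version number, so that upgrading one component reprocesses only that component.

```python
import json
import sqlite3


def open_store():
    # Assumption: one row per (document, component), holding the latest output.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE intermediate ("
        " doc_id INTEGER, component TEXT, version INTEGER, output TEXT,"
        " PRIMARY KEY (doc_id, component))"
    )
    return conn


def run_component(conn, name, version, fn, inputs):
    """Apply fn to each input unless output of this version is already stored.

    Returns the number of documents that were actually (re)processed.
    """
    processed = 0
    for doc_id, data in inputs.items():
        row = conn.execute(
            "SELECT version FROM intermediate WHERE doc_id = ? AND component = ?",
            (doc_id, name),
        ).fetchone()
        if row is not None and row[0] == version:
            continue  # stored output is current; skip this document
        conn.execute(
            "INSERT OR REPLACE INTO intermediate VALUES (?, ?, ?, ?)",
            (doc_id, name, version, json.dumps(fn(data))),
        )
        processed += 1
    return processed


def load(conn, name):
    # Read back the stored output of one component for every document.
    rows = conn.execute(
        "SELECT doc_id, output FROM intermediate WHERE component = ?", (name,)
    )
    return {doc_id: json.loads(out) for doc_id, out in rows}


# Hypothetical toy components: a tokenizer and two versions of an entity tagger.
def tokenize(text):
    return text.split()


def tag_v1(tokens):
    return [t for t in tokens if t[:1].isupper()]


def tag_v2(tokens):
    # "Improved" tagger: also strips trailing punctuation from entities.
    return [t.strip(".,;") for t in tokens if t[:1].isupper()]


corpus = {1: "Alice met Bob.", 2: "the cat sat"}
conn = open_store()

# Initial run: every component processes every document.
n_tok = run_component(conn, "tokenizer", 1, tokenize, corpus)
n_tag = run_component(conn, "tagger", 1, tag_v1, load(conn, "tokenizer"))

# Only the tagger is improved: the tokenizer output is reused as stored.
n_tok_again = run_component(conn, "tokenizer", 1, tokenize, corpus)
n_tag_again = run_component(conn, "tagger", 2, tag_v2, load(conn, "tokenizer"))

entities = load(conn, "tagger")
```

In this sketch the second tokenizer pass touches no documents, while the upgraded tagger reruns over the stored tokens. A real deployment would also need to invalidate the outputs of any components downstream of the one that changed.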
Author Name: K Venkatesh, B Vijaya Bhaskar Reddy
Keywords: Text mining, query languages, information storage and retrieval
ISSN: 2347-5552
EISSN: 2347-5552