Eric O. Scott, Haleh Vafaie, Zelal Gungordu, Charles E. Horowitz, and Bradford C. Brown are scheduled to present a paper entitled "Text Mining for Quality Control of Court Records" at SemADoc 2014: Semantic Analysis of Documents Workshop, to be held 16 September 2014 in Fort Collins, Colorado, USA. The workshop is being held in conjunction with the DocEng 2014 conference.
Here is the abstract, from the event program:
Attorneys across the United States use government-provided electronic databases to submit docket entries and associated case files for processing and archival in public judicial records. Data entry errors in these repositories, while rare, can disrupt the court process, confuse the public record, or breach privacy and confidentiality. Docket quality assurance is thus a high priority for the courts, but manual review remains resource-intensive.
We have developed a prototype application of text mining and human language technologies to partially automate quality assurance review of electronic court documents. This solution uses document classification and named entity recognition to extract metadata directly from documents. Discrepancies between the extracted metadata and the user-provided metadata indicate a possible data entry error. On two independent samples of publicly available court documents, we find that for a small number of classes with a sufficient number of training documents, the document class can be automatically classified with greater than 94% accuracy in one case, but only 81% in the other. Our attempts to extract case numbers and the names of parties from documents via a conditional random field model met with less success. Future work with more extensive training data is necessary to more accurately evaluate both applications.
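To make the discrepancy check described in the abstract concrete, here is a minimal Python sketch. It is not the authors' system: a regular expression and a keyword lookup stand in for the conditional random field model and the trained document classifier, and the case number pattern, the keyword cues, the field names, and the function names `extract_metadata` and `find_discrepancies` are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for case numbers shaped like "1:14-cv-01234".
# The paper uses a conditional random field model; this regex is only a stand-in.
CASE_NUMBER_RE = re.compile(r"\b\d:\d{2}-[a-z]{2}-\d{3,5}\b")

# Hypothetical keyword cues standing in for a trained document classifier.
CLASS_KEYWORDS = {
    "motion": ("motion to", "moves the court"),
    "order": ("it is ordered", "so ordered"),
    "complaint": ("complaint", "plaintiff alleges"),
}


def extract_metadata(text):
    """Extract a case number and a document class from the document text."""
    lowered = text.lower()
    match = CASE_NUMBER_RE.search(lowered)
    doc_class = None
    for label, cues in CLASS_KEYWORDS.items():
        if any(cue in lowered for cue in cues):
            doc_class = label
            break
    return {
        "case_number": match.group(0) if match else None,
        "doc_class": doc_class,
    }


def find_discrepancies(extracted, submitted):
    """Return (field, entered, found) for each field where the two sources disagree."""
    flags = []
    for field, found in extracted.items():
        entered = submitted.get(field)
        # Only compare when both sides have a value; a missing value is not a mismatch.
        if found is not None and entered is not None and found != entered.lower():
            flags.append((field, entered, found))
    return flags


# Example: the filer transposed two digits of the case number at data entry.
document_text = "Case No. 1:14-cv-01234\nDefendant's MOTION TO dismiss the complaint."
submitted = {"case_number": "1:14-cv-01243", "doc_class": "Motion"}
flags = find_discrepancies(extract_metadata(document_text), submitted)
```

In this example `flags` holds a single entry for the case number, which a reviewer could then inspect by hand. A flag marks only a possible error, since the extraction step can itself be wrong.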
via Legal Informatics Blog http://ift.tt/1ueriA8