miercuri, 4 iunie 2014

Boehmig, Hwang, and Sawaya: New legal text parsing system: The Restatement Project

Jason Boehmig , Tim Hwang , and Paul Sawaya have posted Rough Consensus, Running Standards: The Restatement Project , at VoxPopuLII .


Here are excerpts from the post:



[...] Supported by a grant from the Knight Foundation Prototype Fund, Restatement is a simple, rough-and-ready system which automatically parses legal text into a basic machine-readable JSON format. It has also been released under the permissive terms of the MIT License, to encourage active experimentation and implementation.


The concept is to develop an easily-extensible system which parses through legal text and looks for some common features to render into a standard format. Our general design principle in developing the parser was to begin with only the most simple features common to nearly all legal documents. This includes the parsing of headers, section information, and “blanks” for inputs in legal documents like contracts. As a demonstration of the potential application of Restatement, we’re also designing a viewer that takes documents rendered in the Restatement format and displays them in a simple, beautiful, web-readable version.


Underneath the hood, Restatement is all built upon web technology. [...] We want to make it easy for developers to write software that displays and modifies legal documents in the browser.


[...] we wrote the Restatement parser and viewer in JavaScript, and made the Restatement format itself a type of JSON (JavaScript Object Notation) document.


For those who are more technically inclined, we also knew that Restatement needed a parser formalism, that is, a precise way to define how plain text can get transformed into Restatement format. We became interested in recent advance in parsing technology, called PEG (Parsing Expression Grammar).


PEG parsers are different from other types of parsers; they’re unambiguous. That means that plain text passing through a PEG parser has only one possible valid parsed output. We became excited about using the deterministic property of PEG to mix parsing rules and code, and that’s when we found peg.js.


With peg.js, we can generate a grammar that executes JavaScript code as it parses your document. This hybrid approach is super powerful. It allows us to have all of the advantages of using a parser formalism (like speed and unambiguity) while also allowing us to run custom JavaScript code on each bit of your document as it parses. That way we can use an external library, like the Sunlight Foundation’s fantastic citation, from inside the parser.


Our next step is to prototype an “interactive parser,” a tool for attorneys to define the structure of their documents and see how they parse. Behind the scenes, this interactive parser will generate peg.js programs and run them against plaintext [...]


Restatement is going fully operational in June 2014. After launch, the two remaining challenges are to (a) continuing expanding the range of legal document features the parser will be able to successfully process, and (b) begin widely processing legal documents into the Restatement format.


For the first, we’re encouraging a community of legal technologists to play around with Restatement, break it as much as possible, and give us feedback. Running Restatement against a host of different legal documents and seeing where it fails will expose the areas that are necessary to bolster the parser to expand its potential applicability as far as possible.


For the second, Restatement will be rendering popular legal documents in the format, and partnering with platforms to integrate Restatement into the legal content they produce. We’re excited to say on launch Restatement will be releasing the standard form documents used by the startup accelerator Y Combinator, and Series Seed, an open source project around seed financing created by Fenwick & West.


[...] the Restatement team is always looking for collaborators. If what’s been described here interests you, please drop us a line! I’m available at tim@robotandhwang.org, and on Twitter @RobotandHwang.



For more details, please see the complete post.


HT @RobotandHwang




Filed under: Applications, Others' scholarly or sophisticated blogposts, Projects, Software, Technology developments, Technology tools Tagged: Jason Boehmig, Legal text parsers, Parsers, Parsers for legal text processing, Paul Sawaya, peg.js, Restatement, Restatement Project, Tim Hwang



via Legal Informatics Blog http://ift.tt/1hZhjYI

Niciun comentariu:

Trimiteți un comentariu