Wednesday, 17 September 2014

Kuhlman: Legal Synonyms Project, and word cloud of U.S. Code

Casey Kuhlman of the U.S. Open Data Institute has posted a word cloud of the U.S. Code.


Casey says the data are “actual word counts piped into a JQuery lib,” and that he is also working on “N grams and POS tags” for the U.S. Code.
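To make "actual word counts" and "N grams" concrete, here is a minimal Python sketch of how such counts could be computed before being handed to a visualization library. It is not Kuhlman's code; the tokenizer, the function names, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens, keyed by tuple."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Hypothetical sample text, not taken from the U.S. Code.
sample = (
    "The Secretary shall submit a report. "
    "The Secretary shall notify the committee."
)

tokens = tokenize(sample)
word_counts = Counter(tokens)            # the input a word cloud needs
bigram_counts = ngram_counts(tokens, 2)  # two-word sequences

print(word_counts.most_common(3))
print(bigram_counts.most_common(2))
```

A word cloud only needs the word-to-count mapping; the n-gram counts use the same token stream with a sliding window, so both can come out of a single pass over the corpus.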


This visualization is an outcome of his Legal Synonyms Project (HT @benbalter). Here is a description of that project, from the readme:



A synonym.txt for Solr Instances. Solr is a great search engine but it is even better with a bit of training. One of the most used ways to train Solr is to add a synonyms.txt file. Building a synonyms.txt file for a particular corpus of language is not an easy exercise. This repository is an attempt to build a synonyms.txt file for a legal corpus so that Solr can be used to search a corpus of documents of a legal nature.


The results of this effort rather than being strictly and traditionally versioned are contained in different synonyms.txt files. [...]
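For readers unfamiliar with the file the readme describes: a Solr synonyms file is plain text, with one rule per line. A comma-separated line lists terms treated as equivalent, a line containing `=>` maps the terms on the left to the terms on the right, and lines starting with `#` are comments. The Python sketch below builds a few such lines; the helper names and the legal terms are hypothetical and are not taken from the Legal Synonyms Project repository.

```python
def format_equivalents(terms):
    """Render a group of interchangeable terms as one synonyms line."""
    return ", ".join(terms)


def format_mapping(sources, targets):
    """Render an explicit mapping: sources are replaced by targets."""
    return f"{', '.join(sources)} => {', '.join(targets)}"


# Hypothetical entries for illustration only.
lines = [
    "# Illustrative legal synonyms (not from the project repository)",
    format_equivalents(["attorney", "lawyer", "counsel"]),
    format_mapping(["atty"], ["attorney"]),
]

synonyms_text = "\n".join(lines) + "\n"
print(synonyms_text)
```

The resulting text could be saved as synonyms.txt and referenced from a synonym filter in the Solr schema; which groups are appropriate for a legal corpus is exactly the hard part the project sets out to address.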



It will be interesting to compare this n-gram application to Daniel Martin Katz, Michael Bommarito, and colleagues’ Legal Language Explorer, which displays n-gram data for U.S. federal court decisions.


For more details, please see the Legal Synonyms Project repository.


HT @compleatang




Filed under: Applications Tagged: Apache Solr, Casey Kuhlman, Computational linguistics and law, Corpus linguistics and law, Daniel Martin Katz, Legal computational linguistics, Legal corpus linguistics, Legal Language Explorer, Legal N-Grams, Legal Synonym Project, Legal synonyms, Legal text analysis, Legal thesauri, Legal word clouds, Legislative word clouds, Legislative N-Grams, Michael Bommarito, Solr and law, Solr and legal information systems, Visualization of legal information, Visualization of legislative information, Word clouds of legislation



via Legal Informatics Blog http://ift.tt/1tcBHhw
