Casey Kuhlman of the U.S. Open Data Institute has posted a word cloud of the U.S. Code.
Casey says the data are “actual word counts piped into a JQuery lib,” and that he is also working on “N grams and POS tags” for the U.S. Code.
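To make "word counts" and "N grams" concrete, here is a minimal Python sketch of both. It is not Casey's code: the tokenizer, the function names, and the sample sentence are assumptions made for illustration, and a real run would read the text of the U.S. Code rather than a two-sentence string.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts(tokens):
    """Count how often each token occurs (the input to a word cloud)."""
    return Counter(tokens)


def ngram_counts(tokens, n=2):
    """Count contiguous sequences of n tokens."""
    # zip over n shifted copies of the token list yields each n-gram once.
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Hypothetical sample text, not taken from the U.S. Code.
sample = "The Secretary shall report to Congress. The Secretary shall act."
tokens = tokenize(sample)
counts = word_counts(tokens)
bigrams = ngram_counts(tokens, 2)

print(counts.most_common(3))
print(bigrams.most_common(2))
```

The per-word counts are what a word cloud library would size its words by; the bigram counts are the simplest case of the n-gram data mentioned above.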
This visualization is an outcome of his Legal Synonyms Project. (HT @benbalter). Here is a description of that project, from the readme:
A synonym.txt for Solr Instances. Solr is a great search engine but it is even better with a bit of training. One of the most used ways to train Solr is to add a synonyms.txt file. Building a synonyms.txt file for a particular corpus of language is not an easy exercise. This repository is an attempt to build a synonyms.txt file for a legal corpus so that Solr can be used to search a corpus of documents of a legal nature.
The results of this effort rather than being strictly and traditionally versioned are contained in different synonyms.txt files. [...]
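For readers unfamiliar with the file the readme describes: a Solr synonyms.txt is a plain-text list in which a comma-separated line declares equivalent terms, a line containing "=>" maps the terms on the left to the terms on the right, and lines beginning with "#" are comments. The Python sketch below writes such lines. The helper names and the legal terms are hypothetical examples, not entries from the Legal Synonyms Project.

```python
def equivalent_line(terms):
    """Format a group of terms that Solr should treat as equivalent."""
    return ", ".join(terms)


def mapping_line(sources, targets):
    """Format an explicit mapping: each source term is replaced by the targets."""
    return f"{', '.join(sources)} => {', '.join(targets)}"


def build_synonyms(groups, mappings):
    """Assemble the text of a synonyms.txt file from groups and mappings."""
    lines = ["# hypothetical legal synonyms, for illustration only"]
    lines += [equivalent_line(group) for group in groups]
    lines += [mapping_line(src, dst) for src, dst in mappings]
    return "\n".join(lines) + "\n"


# Illustrative entries only; not drawn from the project's files.
groups = [["attorney", "lawyer", "counsel"]]
mappings = [(["atty"], ["attorney"])]

text = build_synonyms(groups, mappings)
print(text)
```

How Solr applies the file (at index time, query time, or both) depends on where the synonym filter is placed in the field's analyzer, which this sketch does not cover.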
It will be interesting to compare this n-gram application to Daniel Martin Katz, Michael Bommarito, and colleagues’ Legal Language Explorer, which displays n-gram data for U.S. federal court decisions.
For more details, please see the Legal Synonyms Project repository.
HT @compleatang
Filed under: Applications Tagged: Apache Solr, Casey Kuhlman, Computational linguistics and law, Corpus linguistics and law, Daniel Martin Katz, Legal computational linguistics, Legal corpus linguistics, Legal Language Explorer, Legal N-Grams, Legal Synonym Project, Legal synonyms, Legal text analysis, Legal thesauri, Legal word clouds, Legislative word clouds, Legislative N-Grams, Michael Bommarito, Solr and law, Solr and legal information systems, Visualization of legal information, Visualization of legislative information, Word clouds of legislation
via Legal Informatics Blog http://ift.tt/1tcBHhw