Monday, 28 October 2013

Whitby: UK Hansard Archive Bulk Download URL File

Dr. Andrew Whitby of Nesta has posted "UK Hansard Archive Bulk Download URL File (or When is Open Data Not)".


Here are excerpts:



I am currently working on a project that involves large scale analysis of various countries’ Hansards (that is, transcripts of parliamentary debate). [...]


The UK Parliament has such a digitised archive, here.


Frustratingly though, although these zipped XML files are available, there is no bulk download option or simple FTP archive of them. [...]


So, to save anyone else the pain, here is a link to a file I built that contains links to every file in this archive. I used the handy FormRequest feature of Scrapy, my favourite, heavily used, scraping tool.


https://github.com/econandrew/uk-hansard-archive-urls/blob/master/urls.txt [...]



For more details, please see the complete post.
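As a rough illustration of how a URL list like the one described above might be used, here is a minimal Python sketch that parses such a file and fetches each entry. It is not taken from Whitby's post. It assumes the list is plain text with one URL per line, and the helper names (`parse_url_list`, `local_name`, `download_all`) and the one-second delay are hypothetical choices made for this example.

```python
import os
import time
import urllib.request
from urllib.parse import urlsplit


def parse_url_list(text):
    """Return the http(s) URLs found in text, assuming one URL per line.

    Blank lines, non-URL lines and duplicates are skipped; order is kept.
    """
    seen = set()
    urls = []
    for line in text.splitlines():
        url = line.strip()
        if not url.startswith(("http://", "https://")):
            continue
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def local_name(url):
    """Derive a local file name from the last path segment of a URL."""
    name = os.path.basename(urlsplit(url).path)
    return name or "index"


def download_all(urls, dest_dir, delay=1.0):
    """Download each URL into dest_dir, skipping files that already exist.

    The delay between requests is an assumption of this sketch, meant to
    avoid hammering the server; adjust it as appropriate.
    """
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        target = os.path.join(dest_dir, local_name(url))
        if os.path.exists(target):
            continue  # already fetched on a previous run
        partial = target + ".part"
        urllib.request.urlretrieve(url, partial)
        os.replace(partial, target)  # only keep completed downloads
        time.sleep(delay)
```

With the contents of a saved copy of `urls.txt` read into a string `text`, calling `download_all(parse_url_list(text), "hansard")` would fetch the files into a local `hansard` directory and could be re-run to resume after an interruption.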


HT @owenboswarva




Filed under: Applications, Data sets, Technology developments, Technology tools

Tagged: Andrew Whitby, Hansard Archive, Legal descriptive metadata, Legal identifiers, Legal metadata, Open legislative data, Parliamentary data, URLs for Hansard Archive, URLs for UK Hansard Archive, URLs for UK Hansards



via Legal Informatics Blog http://legalinformatics.wordpress.com/2013/10/28/whitby-uk-hansard-archive-bulk-download-url-file/
