author Petter Reinholdtsen <pere@hungry.com> 2015-01-18 07:50:52 +0100
committer Petter Reinholdtsen <pere@hungry.com> 2015-01-18 07:52:12 +0100
commit 93a955e58364d7e4ff4046bb52820ebcd2f290f5
tree 5fda9e2fde1e1a4e62109b5e7fcb2aa50db4a8bd
parent 918675a4577953faae3a38cb31c8ab1549bd4fbf
More info on common fields.
-rw-r--r-- README 32
1 files changed, 32 insertions, 0 deletions
diff --git a/README b/README
index d26cc1c..c75dda2 100644
--- a/README
+++ b/README
@@ -7,6 +7,7 @@ https://github.com/rossjones/ScraperWikiX/blob/master/services/scriptmgr/scripts
Standalone lib https://github.com/scraperwiki/scraperwiki-python
== Running / testing scrapers ==
+
In addition to checking out the repo, the following is required to test or
run most scrapers:
@@ -15,3 +16,34 @@ scp -r 'scraper.nuug.no:/srv/scraper/postjournaler/testlib/*' .
apt-get install python-alembic python-beautifulsoup python-dateutil
cp scrapersources/postliste-python-lib scrapersources/postliste-python-lib.py
+== Common field names ==
+
+List of field names used in most scrapers. All dates use ISO format:
+"YYYY-MM-DD", "YYYY-MM-DD HH:MM" or "YYYY-MM-DD HH:MM+TZ".
+
+ * agency, name of public administration
+ * recorddate
+ * docdate, date on document
+ * docdesc, title/description of document entry
+ * doctype, type of document entry
+ * caseyear
+ * caseseqnr
+ * casedocseq
+ * caseid
+ * casedesc
+ * recipient
+ * sender
+ * exemption
+ * journalyear
+ * journalseqnr
+ * journalid
+ * scraper, name of script used to collect data
+ * scrapedurl, URL used to collect data
+ * scrapestamputc, when the information was fetched from the URL
+
+These are doctype values, based on NOARK types.
+
+ * U - Outgoing document
+ * I - Incoming document
+ * X - Internal memo without follow-up
+ * N - Internal memo with follow-up
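
The field list and date formats added to the README above could be checked with a small helper along these lines. This is only a sketch under stated assumptions, not code from the repository: the names `is_valid_date` and `check_record` are hypothetical, the choice of `recorddate`, `docdate` and `scrapestamputc` as the date-valued fields is a guess from their names, and the README does not say what the `+TZ` suffix looks like, so an offset such as `+0100` or `+01:00` is assumed.

```python
import re
from datetime import datetime

# Field names copied from the "Common field names" list in the README.
COMMON_FIELDS = [
    "agency", "recorddate", "docdate", "docdesc", "doctype",
    "caseyear", "caseseqnr", "casedocseq", "caseid", "casedesc",
    "recipient", "sender", "exemption",
    "journalyear", "journalseqnr", "journalid",
    "scraper", "scrapedurl", "scrapestamputc",
]

# doctype codes listed in the README.
DOCTYPE_CODES = set(["U", "I", "X", "N"])

# Assumption: these are the fields that hold dates or timestamps.
DATE_FIELDS = ("recorddate", "docdate", "scrapestamputc")

# "YYYY-MM-DD", "YYYY-MM-DD HH:MM" or "YYYY-MM-DD HH:MM+TZ".
# Assumption: TZ is a numeric offset like +0100 or +01:00.
DATE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2})( \d{2}:\d{2}([+-]\d{2}:?\d{2})?)?\Z")


def is_valid_date(value):
    """Return True if value matches one of the three documented formats.

    Only the calendar date is range-checked; the HH:MM and offset parts
    are checked for shape only.
    """
    match = DATE_RE.match(value)
    if match is None:
        return False
    try:
        datetime.strptime(match.group(1), "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_record(record):
    """Return a list of problems found in one scraped record (a dict)."""
    problems = []
    for key in sorted(record):
        if key not in COMMON_FIELDS:
            problems.append("unknown field: %s" % key)
    for key in DATE_FIELDS:
        if key in record and not is_valid_date(record[key]):
            problems.append("bad date in %s: %r" % (key, record[key]))
    if "doctype" in record and record["doctype"] not in DOCTYPE_CODES:
        problems.append("unknown doctype: %r" % record["doctype"])
    return problems


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    example = {
        "agency": "Example kommune",
        "docdate": "2015-01-18",
        "doctype": "I",
        "scrapestamputc": "2015-01-18 07:50",
    }
    print(check_record(example))
```

A scraper could run such a check on each record before saving it. The README says only that these names are used in most scrapers, so treating an unlisted field as a problem is stricter than the text requires and may need relaxing.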