Archive and make discoverable data and links with schema.org metadata.
dcat --help
Run
dcat adduser
and follow the prompting wizard.
dcat allows the publication of JSON-LD documents using the dcat.io context. This context extends schema.org with terms needed to perform I/O and preserve data integrity (such as filePath and Checksum).
At a minimum, a document has to contain:
- a context (@context) set to https://dcat.io,
- an id (@id) to uniquely identify things published on dcat.io with URLs. All relative URLs are resolved against a base (defined with @base in the context) of https://dcat.io.
e.g.
{
"@context": "https://dcat.io",
"@id": "mydoc"
}
To publish this document (mydoc), create a file named JSONLD containing it and, in the directory containing JSONLD, run:
dcat publish
After publication the document will be available at https://dcat.io/mydoc.
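The mapping from a relative @id to its published URL can be illustrated with a minimal sketch. It assumes the base is exactly https://dcat.io/ and uses the standard WHATWG URL class available in Node.js; the helper name resolveId is hypothetical and is not part of dcat.

```javascript
// Assumption: relative @id values are resolved against https://dcat.io/
const BASE = 'https://dcat.io/';

// Resolve a document or node @id to the absolute URL it would be served from.
// Absolute URLs are returned unchanged by the URL constructor.
function resolveId(id, base = BASE) {
  return new URL(id, base).href;
}

console.log(resolveId('mydoc'));      // https://dcat.io/mydoc
console.log(resolveId('mydoc/data')); // https://dcat.io/mydoc/data
```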
Documents can contain any properties from schema.org or from any other ontology, as long as the associated @context is provided.
If a version property is specified in the document, the document will be versioned; that is, each update will require a new version value in order to be published (this prevents existing versions from being overwritten).
When appropriate, version numbers SHOULD follow Semantic Versioning, e.g.
{
"@context": "https://dcat.io",
"@id": "mydoc",
"version": "0.0.1"
}
After publication this document will be available at https://dcat.io/mydoc?version=0.0.1 whereas the latest version will always be available at https://dcat.io/mydoc.
If the document is versioned following Semantic Versioning, a range (e.g. <0.0.1) can be specified as version (e.g. https://dcat.io/mydoc?version=<0.0.1).
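To make the idea of a "less than" range concrete, here is a minimal sketch of how the highest version below a bound could be selected. It is not the registry's actual resolution logic: it assumes plain major.minor.patch versions (no pre-release tags), and the function names and the list of published versions are hypothetical.

```javascript
// Parse "major.minor.patch" into an array of three numbers.
function parseVersion(v) {
  return v.split('.').map(Number);
}

// Compare two versions: negative if a < b, 0 if equal, positive if a > b.
function compareVersions(a, b) {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Return the highest version strictly below `bound`, or null if none exists.
function latestBelow(versions, bound) {
  const candidates = versions
    .filter((v) => compareVersions(v, bound) < 0)
    .sort(compareVersions);
  return candidates.length ? candidates[candidates.length - 1] : null;
}

const published = ['0.0.1', '0.1.0', '0.10.0', '1.0.0']; // hypothetical versions
console.log(latestBelow(published, '1.0.0')); // 0.10.0
```

Comparing the numeric components rather than the raw strings matters here: a string comparison would rank 0.10.0 below 0.1.0.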
Documents can be arbitrarily complex (having multiple nodes) and sometimes it makes sense to assign a URL to a node so that it can be referenced. This is achieved by setting an @id property on the desired nodes, e.g.
{
"@context": "https://dcat.io",
"@id": "mydoc",
"version": "0.0.1",
"hasPart": {
"@id": "mydoc/data",
"@type": "Dataset",
"description": "a dataset part of the document"
}
}
The whole document can be retrieved at https://dcat.io/mydoc whereas the part (node) can be retrieved at https://dcat.io/mydoc/data.
Note: node @id values can be any valid URLs but they have to be namespaced within the top-level @id (for a document with "@id": "mydoc", "@id": "mydoc/arbitrarily/long/pathname" is valid whereas "@id": "part" is not).
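The namespacing rule can be expressed as a simple prefix check. The sketch below is only an illustration of the rule as stated above, assuming relative @id values; the helper name isNamespaced is hypothetical and the registry may validate differently.

```javascript
// A node @id is considered namespaced when it extends the document @id
// with at least one additional path segment.
function isNamespaced(docId, nodeId) {
  const prefix = docId + '/';
  return nodeId.startsWith(prefix) && nodeId.length > prefix.length;
}

console.log(isNamespaced('mydoc', 'mydoc/arbitrarily/long/pathname')); // true
console.log(isNamespaced('mydoc', 'part'));                            // false
```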
dcat can be used to add machine readable metadata to any resource already published on the web.
For instance, running:
dcat init https://github.com/standard-analytics/dcat.git
we get a basic machine readable document:
{
"@context": "https://dcat.io",
"@id": "mydoc",
"@type": "Code",
"codeRepository": "https://github.com/standard-analytics/dcat",
"encoding": {
"@type": "MediaObject",
"contentUrl": "https://api.github.com/repos/standard-analytics/dcat/tarball/master",
"encodingFormat": "application/x-gzip",
"contentSize": 690980
}
}
This document should be extended with more properties (from schema.org such as author, contributor, about, programmingLanguage, runtime..., or from any other web ontologies, taking care to add contexts in this case) to improve the discoverability and reusability of the resource.
Note: in addition to absolute URLs, dcat supports CURIEs for the prefixes defined in the dcat.io @context. Using a CURIE, the previous command simplifies to:
dcat init github:standard-analytics/dcat.git
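As an illustration of CURIE expansion, the sketch below maps a prefix to its URL base. The prefix table is an assumption: it contains only the github prefix implied by the command above and the ldr prefix that this document associates with https://dcat.io; the real dcat.io @context may define others. The helper name expandCurie is hypothetical.

```javascript
// Assumption: illustrative prefix table, not the actual dcat.io @context.
const PREFIXES = {
  ldr: 'https://dcat.io/',
  github: 'https://github.com/'
};

// Expand "prefix:reference" into an absolute URL.
// Absolute URLs and unknown prefixes are returned unchanged.
function expandCurie(curie, prefixes = PREFIXES) {
  const i = curie.indexOf(':');
  if (i === -1) return curie;
  const prefix = curie.slice(0, i);
  const reference = curie.slice(i + 1);
  const known = Object.prototype.hasOwnProperty.call(prefixes, prefix);
  if (reference.startsWith('//') || !known) return curie;
  return prefixes[prefix] + reference;
}

console.log(expandCurie('github:standard-analytics/dcat.git'));
// https://github.com/standard-analytics/dcat.git
```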
For all the subclasses of schema.org/CreativeWork (e.g. Dataset, Code, SoftwareApplication, Article, Book, ImageObject, VideoObject, AudioObject, ...) dcat allows the publication of raw data from files (including datasets, binaries, images, media, and more) along with documents.
For instance, if you have a PDF of a MedicalScholarlyArticle and an associated Dataset in CSV, you can run:
dcat init --main article.pdf::MedicalScholarlyArticle --part data.csv
Note: ::MedicalScholarlyArticle associates a type (@type) with the resource (article.pdf).
This will generate a machine readable document (JSONLD) that you can edit to provide additional metadata.
{
"@context": "https://dcat.io",
"@id": "mydoc",
"@type": "MedicalScholarlyArticle",
"encoding": {
"@type": "MediaObject",
"filePath": "article.pdf"
},
"hasPart": {
"@type": "Dataset",
"distribution": {
"@type": "DataDownload",
"filePath": "data.csv"
}
}
}
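The path::Type argument form used with dcat init can be split into its two components as sketched below. This is an illustrative parser only, not dcat's own argument handling; the function name parseResourceSpec and the choice of null for a missing type are assumptions.

```javascript
// Split "path::Type" into its file path and optional schema.org type.
// When no "::" separator is present, the type is reported as null.
function parseResourceSpec(spec) {
  const i = spec.lastIndexOf('::');
  if (i === -1) return { filePath: spec, type: null };
  return { filePath: spec.slice(0, i), type: spec.slice(i + 2) };
}

console.log(parseResourceSpec('article.pdf::MedicalScholarlyArticle'));
console.log(parseResourceSpec('data.csv'));
```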
After publication (dcat publish) the document will acquire additional URL properties that can be dereferenced to retrieve the original raw data:
{
"@context": "https://dcat.io",
"@id": "mydoc",
"@type": "MedicalScholarlyArticle",
"encoding": {
"@type": "MediaObject",
"filePath": "article.pdf",
"contentUrl": "http://example.com/article.pdf" //generated URL
},
"hasPart": {
"@type": "Dataset",
"distribution": {
"@type": "DataDownload",
"filePath": "data.csv",
"contentUrl": "http://example.com/data.csv" //generated URL
}
}
}
Note: dcat init supports globbing so you can run commands like:
dcat init --main article.pdf --part *.csv
or repeat --part (or the shorter -p) if you need more complex matching, e.g.
dcat init -m article.pdf -p *.csv -p *.jpg
Directories are published as tarballs. For instance, running
dcat init -m src::Code --id cproject
where src is a directory of source files
src
├── lib.h
└── main.c
will generate:
{
"@context": "https://dcat.io",
"@id": "cproject",
"@type": "Code",
"programmingLanguage": { "name": "c" },
"encoding": {
"@type": "MediaObject",
"encodingFormat": "application/x-gtar",
"hasPart": [
{ "@type": "MediaObject", "filePath": "src/lib.h" },
{ "@type": "MediaObject", "filePath": "src/main.c" }
]
}
}
After publication, the MediaObject will have a contentUrl property indicating where the tarball can be retrieved.
To delete a specific version of a document with "@id": "mydoc", run:
dcat unpublish ldr:mydoc?version=0.1.1
ldr is the prefix used for https://dcat.io (defined in the dcat.io @context).
To delete all versions of a document with "@id": "mydoc", run:
dcat unpublish ldr:mydoc
Documents containing keywords, name or description properties can be searched by keyword with dcat search followed by a list of keywords.
For more powerful search, all data published on dcat.io are valid linked data fragments and can be queried using SPARQL.
dcat show followed by a CURIE displays the latest JSON-LD document corresponding to the CURIE on stdout.
Different options (-e, --expand; -f, --flatten; -c, --compact; -n, --normalize) provide alternative representations of the document. For instance,
dcat show 'ldr:mydoc?version=<2.1.0' --normalize
will serialize the latest version smaller than 2.1.0 of the document with "@id": "mydoc" to N-Quads (RDF). The CURIE is quoted so that the shell does not interpret < as an input redirection.
dcat clone followed by a CURIE downloads the raw data associated with a document and stores it along with the document on disk at the paths specified by the filePath properties.
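To show which paths a clone would need to write to, the following sketch walks a document and collects every filePath value. It is a hypothetical helper for illustration and does not reproduce what dcat clone does internally; the sample document reuses the article and dataset example from earlier in this README.

```javascript
// Recursively collect all string-valued "filePath" properties of a JSON-LD document.
function collectFilePaths(node, found = []) {
  if (Array.isArray(node)) {
    node.forEach((item) => collectFilePaths(item, found));
  } else if (node !== null && typeof node === 'object') {
    for (const [key, value] of Object.entries(node)) {
      if (key === 'filePath' && typeof value === 'string') {
        found.push(value);
      } else {
        collectFilePaths(value, found);
      }
    }
  }
  return found;
}

const doc = {
  '@context': 'https://dcat.io',
  '@id': 'mydoc',
  '@type': 'MedicalScholarlyArticle',
  encoding: { '@type': 'MediaObject', filePath: 'article.pdf' },
  hasPart: {
    '@type': 'Dataset',
    distribution: { '@type': 'DataDownload', filePath: 'data.csv' }
  }
};

console.log(collectFilePaths(doc)); // [ 'article.pdf', 'data.csv' ]
```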
Only maintainers of a document can publish or remove versions of a document. Maintainers of a document can be listed with:
dcat maintainer ls <CURIE>
Maintainers can give users maintainer rights by running:
dcat maintainer add <user CURIE> <doc CURIE>
Note: all users of dcat.io have a CURIE of the form ldr:users/{username}
Maintainers can remove maintainer rights by running:
dcat maintainer rm <user CURIE> <doc CURIE>
dcat can also be used programmatically.
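var Dcat = require('dcat');
var dcat = new Dcat();

// Minimal document: a context, an id and one schema.org property
var doc = {
  '@context': 'https://dcat.io',
  '@id': 'test',
  name: 'hello world'
};

// Publish the document; the callback receives an error (if any) and the published document
dcat.publish(doc, function(err, cdoc){
  console.log(err, cdoc); //cdoc is compacted
});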
See test/test.js for more examples.
package.json -> datapackage.json -> package.jsonld -> JSON-LD + schema.org + hydra + linked data fragment.
By default, dcat uses dcat.io, a linked data registry hosted on Cloudant.
You need a local instance of the linked data registry running on your machine on port 3000. Then, run:
npm test
Apache-2.0.