Beginner’s Guide to JSON-LD

Dillon Redding
10 min read · Nov 19, 2021

I’ve become something of the de facto JSON-LD expert where I work, and with all the questions I’ve been getting, I thought I’d write a short guide to make getting started with JSON-LD a little easier for a wider audience.

While there’s already a lot of learning material for JSON-LD out there, I found JSON-LD quite daunting and confusing for a long time. So, hopefully, my perspective in this guide can help other, like-minded individuals.

What is JSON-LD?

JSON-LD is a serialization format of the Resource Description Framework (RDF), a component of linked data (hence the “LD”), which is itself a part of the tech stack of the semantic web. Simply put, JSON-LD allows us to add important semantic information that’s missing from normal JSON representations.

Consider a JSON object with firstName and lastName fields representing a person’s name. These names may be fine for, say, display purposes, but what if we want to know which is the given name and which is the family name? There’s not enough information there, since name order varies around the world. Even with a locale, it could require a lot of work to determine which name is which.

JSON-LD lets us add this semantic information by essentially defining what fields like firstName and lastName mean in the context of our JSON document.

Before we dig any deeper into the details of JSON-LD though, let’s first cover some basic principles from RDF, the framework behind JSON-LD.

RDF Basics

RDF gives us a way of describing an ontology of resources, which is essentially a labeled, directed graph. These graphs are built up from triples, the most basic structure in RDF. Triples consist of a subject, a predicate, and an object.

Visual depiction of a triple

The subject and object of a triple are nodes in a graph, and the predicate is an arrow relating the subject to the object. The values for each component form the labels of the graph. There are three types of labels: IRIs, blank node identifiers, and literals.

An IRI is the unique address of a resource. IRIs form a superset of URIs that allows for internationalization. However, we’ll be using URLs throughout this guide, since they are dereferenceable URIs. So, for the mathematically inclined:

URLs are a subset of URIs, which are a subset of IRIs
URLs ⊂ URIs ⊂ IRIs

An IRI is a valid label for any component of a triple. It’s worth noting that a triple’s predicate must be an IRI.

A blank label, or blank node identifier, is used to describe a blank node — a resource without an IRI — and so this type of label is only valid for subjects and objects. The format of a blank node identifier depends on which RDF serialization format we’re using. In JSON-LD, a blank node identifier is a string that starts with _:. However, explicit identifiers for blank nodes are rarely needed.

A literal is used to describe data about another resource, things like numbers (“42”), dates (“1970-01-01”), or just plain strings (“foo”). These labels are only valid in the object position of a triple.
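
To make those position rules concrete, here’s a minimal Python sketch. The class names, the helper function, and the example.org IRI are placeholders I made up for illustration; they aren’t part of RDF or of any library.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str  # e.g. "hogwarts" for the identifier _:hogwarts


@dataclass(frozen=True)
class Literal:
    value: str


Label = Union[IRI, BlankNode, Literal]


def is_valid_triple(subject: Label, predicate: Label, obj: Label) -> bool:
    """Check which kinds of labels are allowed in each position of a triple."""
    return (
        isinstance(subject, (IRI, BlankNode))  # subject: IRI or blank node
        and isinstance(predicate, IRI)  # predicate: IRI only
        and isinstance(obj, (IRI, BlankNode, Literal))  # object: any label
    )


neville = IRI("https://example.org/neville")  # placeholder IRI
given_name = IRI("http://schema.org/givenName")

ok = is_valid_triple(neville, given_name, Literal("Neville"))
bad = is_valid_triple(Literal("Neville"), given_name, neville)  # literal as subject
```

Here ok passes the check, while bad fails because a literal can’t be a subject.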

And that’s it! Those are the basics of RDF we need for now. I’ve only scratched the surface here, so I recommend reading W3C’s RDF Primer if you’re interested in learning more about RDF.

Now let’s shift our focus to building our own graph, but before we jump into expressing it with JSON-LD, for pedagogical purposes, let’s build off our knowledge of triples and use a much simpler RDF format called N-Triples.

N-Triples Example

In this example, we’ll cover the different types of triples we need to describe the following graph:

Images from the Harry Potter Wiki

Starting Simple

Let’s start by constructing a triple to describe the given name of a person. To do so, we need a subject, a predicate, and an object.

First, for our subject, let’s start from the top of the graph with Neville Longbottom. We need an IRI (or blank node identifier, but we’ll get to that later) to identify Neville. Let’s arbitrarily choose the Harry Potter Wiki page, although we could use the Wikipedia page, the Heroes Wiki page, or some other IRI.

Next, let’s (again, arbitrarily) go with the givenName property from Schema.org for the predicate, which tells us that the object is the given name of the subject.

We could instead use the givenName property from FOAF or a similar term from another vocabulary. Regardless of what we choose, we should use terms from one or more well-defined vocabularies like Schema.org, FOAF, Dublin Core, or a custom vocabulary.

Remember that a predicate must be an IRI, but an IRI doesn’t have to be dereferenceable (e.g., urn:uuid:f124874d-13e6-487b-adcd-63e432479315). However, it’s good practice for a predicate to point to some sort of documentation that defines what it means, so we can easily look it up.

Finally, our object can just be a string literal of the given name. With that our first triple is as follows:
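
As a rough sketch, we can assemble such a line in Python. N-Triples wraps IRIs in angle brackets, wraps string literals in double quotes, and ends every triple with a period. The exact wiki URL is my assumption, and the helper names are made up for illustration.

```python
# Assumed IRI: the Harry Potter Wiki page for Neville.
NEVILLE = "https://harrypotter.fandom.com/wiki/Neville_Longbottom"
GIVEN_NAME = "http://schema.org/givenName"


def iri(value: str) -> str:
    """Format an IRI the way N-Triples expects: <...>."""
    return f"<{value}>"


def string_literal(value: str) -> str:
    """Format a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Join the three terms and terminate the statement with ' .'."""
    return f"{subject} {predicate} {obj} ."


first_triple = triple(iri(NEVILLE), iri(GIVEN_NAME), string_literal("Neville"))
print(first_triple)
```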

Specifying Datatypes

Now let’s look at a triple describing Neville’s birthday:

In this one, the subject is the same as before, but the predicate uses FOAF’s birthday property (showing we’re not limited to one vocabulary) and the object is a typed literal. Since the value is a date, a simple string doesn’t quite convey enough, so we specify additional type information with a datatype IRI.

In this case, we use the date datatype from the XML Schema. Similar to the vocabularies for predicates, the use of these specific datatypes isn’t required by RDF, but they are highly recommended.

We can also type boolean, integer, and double literals, but I won’t get into that with N-Triples since JSON (and consequently JSON-LD) natively supports those types. The syntax for types is also unimportant since it’s very different (and much easier) in JSON-LD.

This info comes in handy when parsing the data in some programming language. In JavaScript, for instance, the datatype IRI http://www.w3.org/2001/XMLSchema#date could signal a parser to use Date instead of String when parsing the literal. Then we could use native Date methods to, say, convert the value to a Unix timestamp.
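
The same idea can be sketched in Python. In N-Triples, a typed literal is written as the quoted value followed by ^^ and the datatype IRI. The function names below are hypothetical, and the date is only an illustrative value.

```python
import re
from datetime import date, datetime, timezone

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"

# Matches "lexical form"^^<datatype IRI> as written in N-Triples.
TYPED_LITERAL = re.compile(
    r'^"(?P<lexical>(?:[^"\\]|\\.)*)"\^\^<(?P<datatype>[^>]+)>$'
)


def parse_typed_literal(term: str):
    """Return a date for xsd:date literals, otherwise the raw lexical string."""
    match = TYPED_LITERAL.match(term)
    if match is None:
        raise ValueError(f"not a typed literal: {term!r}")
    lexical, datatype = match["lexical"], match["datatype"]
    if datatype == XSD_DATE:
        return date.fromisoformat(lexical)
    return lexical


def to_unix_timestamp(day: date) -> int:
    """Seconds since the Unix epoch at midnight UTC on the given day."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return int(midnight.timestamp())


# Illustrative value only.
birthday = parse_typed_literal(f'"1980-07-30"^^<{XSD_DATE}>')
timestamp = to_unix_timestamp(birthday)
```

Any datatype the sketch doesn’t recognize falls back to the plain string, which is a reasonable default when a parser meets a type it has no native equivalent for.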

Typed Nodes

In addition to typing a literal, we can specify the type of a node:

This triple uses RDF’s type predicate to specify the node’s type, which in this case is the Person type from Schema.org.

Don’t let the “schema” in “Schema.org” confuse you. From the perspective of RDF, the purpose of Schema.org is to define vocabulary terms. It is not meant to define the structure or shape of data. For example, just because we have a Person resource doesn’t mean it will have every property listed (email, familyName, etc.). The shape of a data instance is an orthogonal concern better suited to something like JSON Schema.

Again, Schema.org isn’t required here. We could’ve just as easily used FOAF’s Person instead.
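
Here’s a small Python sketch of what such a type triple looks like with either vocabulary. The RDF type predicate and both Person IRIs are the published ones; the subject IRI is my assumption, and type_triple is a made-up helper.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA_PERSON = "http://schema.org/Person"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Assumed IRI: the Harry Potter Wiki page for Neville.
NEVILLE = "https://harrypotter.fandom.com/wiki/Neville_Longbottom"


def type_triple(node_iri: str, type_iri: str) -> str:
    """Build an N-Triples line stating that a node has the given type."""
    return f"<{node_iri}> <{RDF_TYPE}> <{type_iri}> ."


schema_version = type_triple(NEVILLE, SCHEMA_PERSON)
foaf_version = type_triple(NEVILLE, FOAF_PERSON)
print(schema_version)
print(foaf_version)
```

Only the object changes between the two lines; the predicate stays the same because it comes from RDF itself rather than from either vocabulary.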

Relating Resources

Information about a resource is very useful, but the real power of RDF can be seen when we start associating resources with each other.

This triple tells us that there is a “generic bi-directional social/work relation” between Neville and Luna. In other words, Neville knows Luna.

Hopefully, you can see how this becomes very powerful since we can also have triples where Luna (the object) is the subject.

It’s worth noting that a subject can use the same predicate multiple times.
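
A quick way to picture this is to hold triples as plain Python tuples and query them. This is only a sketch: the wiki URLs are assumed, Harry is included purely for illustration, and objects_of is a made-up helper.

```python
WIKI = "https://harrypotter.fandom.com/wiki/"  # assumed base URL
NEVILLE = WIKI + "Neville_Longbottom"
LUNA = WIKI + "Luna_Lovegood"
HARRY = WIKI + "Harry_Potter"  # illustrative extra resource
KNOWS = "http://schema.org/knows"

# Each tuple is (subject, predicate, object).
triples = [
    (NEVILLE, KNOWS, LUNA),
    (NEVILLE, KNOWS, HARRY),  # same subject and predicate, different object
    (LUNA, KNOWS, NEVILLE),  # Luna was an object above; here she is the subject
]


def objects_of(subject: str, predicate: str) -> list:
    """Follow every arrow labeled `predicate` that leaves `subject`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


neville_knows = objects_of(NEVILLE, KNOWS)
luna_knows = objects_of(LUNA, KNOWS)
```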

A Blank Object

We can use a blank node if we want to associate or describe a resource without an IRI.

This triple says Neville is an alumnus of a blank node identified by _:hogwarts, which can now be used as the subject of other triples.

As I mentioned, blank node identifiers usually aren’t explicit in JSON-LD. We’ll see how they translate over shortly.
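
For reference, here’s a rough Python sketch of how a blank node reads in N-Triples, first as an object and then as a subject. The wiki URL is assumed, the name triple is my own addition for illustration, and is_blank_node is a made-up helper.

```python
# Assumed IRI: the Harry Potter Wiki page for Neville.
NEVILLE = "https://harrypotter.fandom.com/wiki/Neville_Longbottom"
ALUMNI_OF = "http://schema.org/alumniOf"
NAME = "http://schema.org/name"

lines = [
    # The blank node appears as the object...
    f"<{NEVILLE}> <{ALUMNI_OF}> _:hogwarts .",
    # ...and the same identifier can be the subject of another triple.
    f'_:hogwarts <{NAME}> "Hogwarts School of Witchcraft and Wizardry" .',
]


def is_blank_node(term: str) -> bool:
    """Blank node identifiers start with '_:' instead of being wrapped in <>."""
    return term.startswith("_:")


subjects = [line.split(" ", 1)[0] for line in lines]
blank_subjects = [s for s in subjects if is_blank_node(s)]
```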

Internationalized Data

Lastly, let’s look at how we can specify the language of a literal.

This triple says our subject (a blank node) has the slogan “Draco Dormiens Nunquam Titillandus,” which is Latin, indicated by the language tag la.
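
In N-Triples, the language tag follows the closing quote after an @ sign. Here’s a small Python sketch that builds such a line and pulls the tag back out. I’m assuming Schema.org’s slogan property and the _:hogwarts identifier from earlier; the regular expression is a simplification, not a full N-Triples parser.

```python
import re

SLOGAN = "http://schema.org/slogan"  # assumed predicate

line = f'_:hogwarts <{SLOGAN}> "Draco Dormiens Nunquam Titillandus"@la .'

# A quoted string followed by @ and a language tag such as "la" or "en-GB".
LANG_LITERAL = re.compile(
    r'"(?P<text>(?:[^"\\]|\\.)*)"@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*)'
)

match = LANG_LITERAL.search(line)
text, lang = match["text"], match["lang"]
print(lang, text)
```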

Collecting Our Triples

That’s it for various types of triples. Our final N-Triples document describing the above graph is as follows:

Complete N-Triples example

This is rather straightforward. It’s simply all our triples (plus some) collected in one document. Now let’s see how we can use these same ideas in a more familiar format.

The Power of JSON-LD

Converting our N-Triples graph directly into JSON-LD, we get the following:

Here are a few noteworthy differences from N-Triples:

  1. Our subjects are represented by objects (lines 1, 10, 19, and 25) and the subjects’ IRI is specified by the @id keyword, avoiding the need to repeat it in every triple. For our blank node (line 25), we just leave off @id entirely.
  2. We get shorthand for specifying the type of a node via the @type keyword. A caveat for literal nodes is that the value becomes an object with the @type and @value keywords, the latter holding the literal value (lines 5 and 14).
  3. Since the value of the logo property (line 28) is an IRI, we need to wrap it in an object and use the @id keyword. The logo is effectively another resource, but there are no triples where it’s the subject.
  4. Similar to the typed literal node, we use the @language keyword (line 33) for specifying the language of a string.
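
To see those four points in one place, here’s a trimmed-down sketch of such a document, written as a Python structure and serialized with the standard json module. It isn’t the full example: the wiki URLs, the birthday value, and the example.org logo URL are assumptions for illustration.

```python
import json

WIKI = "https://harrypotter.fandom.com/wiki/"  # assumed base URL

document = [
    {
        "@id": WIKI + "Neville_Longbottom",  # the subject's IRI, stated once
        "@type": "http://schema.org/Person",  # shorthand for the type triple
        "http://schema.org/givenName": "Neville",
        "http://xmlns.com/foaf/0.1/birthday": {
            # Typed literal: datatype plus value.
            "@type": "http://www.w3.org/2001/XMLSchema#date",
            "@value": "1980-07-30",
        },
        "http://schema.org/knows": {"@id": WIKI + "Luna_Lovegood"},
        "http://schema.org/alumniOf": {
            # No @id here, so this node is a blank node.
            "http://schema.org/logo": {
                "@id": "https://example.org/hogwarts-crest.png"  # placeholder
            },
            "http://schema.org/slogan": {
                "@language": "la",
                "@value": "Draco Dormiens Nunquam Titillandus",
            },
        },
    }
]

serialized = json.dumps(document, indent=2)
print(serialized)
```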

There are, however, still some oddities about the JSON that make it difficult to write code against, primarily the use of IRIs for property names. Luckily, a big part of JSON-LD is making RDF easier to consume in code by giving us a lot of syntactic constructs to make this feel more like idiomatic JSON.

Let’s go through a few features that will effectively compact our JSON-LD into something more usable in our programming language of choice.

Aliasing Predicates

To solve the issue of IRI property names, we can create a dictionary of property names to predicate IRIs. We do so via the @context keyword:

If this is the only change we made, this would already be significantly easier to use, but let’s explore some of the other things we can do.
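
Conceptually, the context is just a lookup table. The Python sketch below shows an aliased document and a toy function that swaps the aliases back for their IRIs. A real JSON-LD processor does far more than this; expand_keys is a made-up helper and the wiki URL is assumed.

```python
context = {
    "givenName": "http://schema.org/givenName",
    "familyName": "http://schema.org/familyName",
}

node = {
    "@id": "https://harrypotter.fandom.com/wiki/Neville_Longbottom",  # assumed
    "givenName": "Neville",
    "familyName": "Longbottom",
}

# The document we would actually publish: the context plus the aliased node.
document = {"@context": context, **node}


def expand_keys(node: dict, context: dict) -> dict:
    """Swap aliased keys for their IRIs; keywords (starting with @) pass through."""
    return {
        (key if key.startswith("@") else context.get(key, key)): value
        for key, value in node.items()
    }


expanded = expand_keys(node, context)
```

Code that consumes the document can now read node["givenName"], while the semantics stay recoverable through the context.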

Compact IRIs

Since we have several terms from the same vocabulary, we can avoid having to repeat the base IRI for every term by utilizing compact IRIs:

Here, we simply define prefixes schema and foaf (lines 3 and 4) and then replace the base IRI in our term definitions with {prefix}:.

Note the trailing slash in the prefix definitions. This is very important since JSON-LD processors expand a compact IRI by simply concatenating the prefix’s IRI with whatever follows the colon.
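
Here’s a simplified Python sketch of that expansion, which also shows what goes wrong without the slash. The function is my own approximation of the behavior, not code from any JSON-LD library.

```python
def expand_compact_iri(value: str, prefixes: dict) -> str:
    """Expand prefix:suffix by plain concatenation; leave other values alone."""
    prefix, sep, suffix = value.partition(":")
    # A suffix starting with // means the value is already an absolute IRI.
    if sep and prefix in prefixes and not suffix.startswith("//"):
        return prefixes[prefix] + suffix
    return value


prefixes = {
    "schema": "http://schema.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

good = expand_compact_iri("schema:givenName", prefixes)

# Same term, but the prefix definition is missing its trailing slash.
broken = expand_compact_iri("schema:givenName", {"schema": "http://schema.org"})

print(good)
print(broken)
```

The second result runs the domain and the term name together, which is a different (and meaningless) IRI.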

Aliasing Node Types

Similarly to aliasing predicates, we can also alias node types:

Now we can use the aliases for our @type keywords (lines 16, 25, 34, and 39).

Type Coercion

We can also move the datatype information for certain literals into the terms in the @context, utilizing JSON-LD’s type coercion.

For our birthday alias (line 6), we use an object instead, throwing the IRI into the @id keyword and the datatype into the @type keyword as before.

The logo alias (line 12) uses a special JSON-LD construct for specifying the corresponding object type as (potentially) an IRI where the @type keyword is set to "@id".

Now we can use simple strings for our birthday and logo fields (lines 24, 30, and 41), instead of needing to wrap them in an object to specify the datatype.
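
The effect of type coercion can be sketched in a few lines of Python: given a term definition, a plain string is turned back into the wrapped form it stands for. This is a toy model that only handles string values; expand_value is a made-up helper and the logo URL is a placeholder.

```python
context = {
    "givenName": "http://schema.org/givenName",
    "birthday": {
        "@id": "http://xmlns.com/foaf/0.1/birthday",
        "@type": "http://www.w3.org/2001/XMLSchema#date",
    },
    "logo": {"@id": "http://schema.org/logo", "@type": "@id"},
}


def expand_value(term: str, value: str, context: dict) -> dict:
    """Wrap a plain string according to the term's coercion rule, if any."""
    definition = context.get(term)
    coerced_type = definition.get("@type") if isinstance(definition, dict) else None
    if coerced_type == "@id":
        return {"@id": value}  # the string is an IRI, not a literal
    if coerced_type is not None:
        return {"@type": coerced_type, "@value": value}  # typed literal
    return {"@value": value}  # plain string literal


birthday = expand_value("birthday", "1980-07-30", context)
logo = expand_value("logo", "https://example.org/hogwarts-crest.png", context)
name = expand_value("givenName", "Neville", context)
```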

Default Vocabulary

Many of our terms are coming from Schema.org, so we can shrink our context even further by defining a default vocabulary.

We replace the schema prefix with the @vocab keyword, defining our default vocabulary for predicates and node types. Now we can drop all the terms from Schema.org without losing any semantic info. We keep logo to specify that the value is an IRI, but notice that the @id no longer needs the prefix.
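
Here’s a simplified Python sketch of how a term resolves once a default vocabulary is in place. The lookup order is my approximation of what a processor does, and resolve_term is a made-up helper; it assumes the context always defines @vocab.

```python
context = {
    "@vocab": "http://schema.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "birthday": {
        "@id": "foaf:birthday",
        "@type": "http://www.w3.org/2001/XMLSchema#date",
    },
    "logo": {"@id": "logo", "@type": "@id"},  # no prefix needed on @id
}


def resolve_term(term: str, context: dict) -> str:
    """Resolve a term to an IRI: explicit definition, then prefix, then @vocab."""
    definition = context.get(term)
    if isinstance(definition, dict):
        target = definition.get("@id", term)
    elif isinstance(definition, str):
        target = definition
    else:
        target = term  # undefined term: falls through to the default vocabulary
    prefix, sep, suffix = target.partition(":")
    if sep and isinstance(context.get(prefix), str):
        return context[prefix] + suffix  # compact IRI such as foaf:birthday
    if sep:
        return target  # already an absolute IRI
    return context["@vocab"] + target


given_name = resolve_term("givenName", context)
birthday = resolve_term("birthday", context)
logo = resolve_term("logo", context)
```

Notice that givenName has no entry in the context at all and still resolves to a Schema.org IRI.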

Type-Scoped Contexts

Another useful feature I’d like to demonstrate is type-scoped contexts. While these aren’t absolutely necessary for every JSON-LD document, they can help us namespace our terms for particular types. Here’s how they would look in our example:

To create a type-scoped context, we nest our terms in a @context (lines 6 and 16) that is part of the definition of our type terms (lines 4 and 14). The IRIs for our types are placed in a nested @id keyword (lines 5 and 15; no prefix required thanks to our default vocab).

The primary advantage we gain here is that we keep terms separated, allowing us to have different definitions for the same term depending on the type.
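
As a rough illustration, here’s a small context with two type-scoped contexts, written as a Python dictionary, plus a toy lookup that shows which terms are in scope for each type. The School type and terms_in_scope are my own choices for this sketch.

```python
context = {
    "@vocab": "http://schema.org/",
    "Person": {
        "@id": "Person",
        # These terms only apply inside nodes typed as Person.
        "@context": {
            "birthday": {
                "@id": "http://xmlns.com/foaf/0.1/birthday",
                "@type": "http://www.w3.org/2001/XMLSchema#date",
            }
        },
    },
    "School": {  # assumed type for the school node
        "@id": "School",
        # These terms only apply inside nodes typed as School.
        "@context": {"logo": {"@id": "logo", "@type": "@id"}},
    },
}


def terms_in_scope(context: dict, node_type: str) -> dict:
    """Return the term definitions that a given @type brings into scope."""
    definition = context.get(node_type)
    if not isinstance(definition, dict):
        return {}
    return dict(definition.get("@context", {}))


person_terms = terms_in_scope(context, "Person")
school_terms = terms_in_scope(context, "School")
```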

Remote Contexts

Finally, we don’t have to embed our context in our document. We can simply reference the URL of a remote context. This allows other JSON-LD documents to reuse the same @context, and it lets clients cache the context, which reduces the payload size.
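
Here’s a minimal Python sketch of the idea, including the caching benefit. The context URL is hypothetical, load_context is a made-up helper, and a fake fetch function stands in for a real HTTP request so the sketch stays self-contained.

```python
CONTEXT_URL = "https://example.org/contexts/hogwarts.jsonld"  # hypothetical URL

# The document only carries a reference to its context.
document = {"@context": CONTEXT_URL, "@type": "Person", "givenName": "Neville"}


def load_context(document: dict, fetch, cache: dict):
    """Return the document's context, fetching a remote one at most once."""
    reference = document["@context"]
    if not isinstance(reference, str):
        return reference  # embedded context: nothing to fetch
    if reference not in cache:
        # A remote context document keeps its terms under a top-level @context.
        cache[reference] = fetch(reference)["@context"]
    return cache[reference]


calls = []


def fake_fetch(url: str) -> dict:
    """Stand-in for an HTTP GET; records each URL it is asked for."""
    calls.append(url)
    return {"@context": {"@vocab": "http://schema.org/"}}


cache = {}
first = load_context(document, fake_fetch, cache)
second = load_context(document, fake_fetch, cache)  # served from the cache
```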

Summary

In this guide, we learned about RDF triples, described a graph using N-Triples, converted that to JSON-LD, and explored several syntactic features of JSON-LD. We covered a lot of ground and still only scratched the surface of what’s possible.

If you want to learn more, click on any of the links in this article and they should take you to worthwhile documentation. I also recommend experimenting in the JSON-LD playground and checking out the final JSON-LD document on GitHub.

If you have any questions or feedback, feel free to hit me up on Twitter @dillon_redding.
