JSONPath reference
Regardless of whether you want to use rq, the rsonpath-lib library, or contribute to the project, you should be familiar with JSONPath, the core query language we use to process JSONs.
The JSONPath language is defined by an IETF specification, currently in draft. The rsonpath project implements a subset of the language according to the spec with two major differences outlined in rsonpath-specific behavior.
The reference below uses terminology from the spec, but tries to use less dry language. If you already know the spec, you can probably skip this chapter.
JSONs as trees
A JSON document is a tree structure, defined in the intuitive way.
A node is either an atomic value, i.e. a number, string, true, false, or null, or a complex value, i.e. an object or a list.
An object is a collection of members identified by member names or keys. Each member name is associated with a single child node. A list is an ordered collection of child nodes identified by a zero-based index.
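The tree view described above can be sketched in a few lines of Python. This is only an illustrative model built on the standard json module; the helper names classify and children are made up for this example and are not part of rq or rsonpath-lib.

```python
import json

def classify(node):
    """Return 'complex' for objects and lists, 'atomic' for everything else."""
    # After json.loads, objects are dicts and lists are lists; numbers,
    # strings, true, false, and null map to int/float, str, bool, and None.
    return "complex" if isinstance(node, (dict, list)) else "atomic"

def children(node):
    """Yield (label, child) pairs for a node.

    Objects are labelled by member name, lists by zero-based index.
    Atomic values have no children.
    """
    if isinstance(node, dict):
        yield from node.items()
    elif isinstance(node, list):
        yield from enumerate(node)

doc = json.loads('{ "a": 42, "b": [ 1, 2 ] }')
print(classify(doc))              # complex
print(list(children(doc)))        # [('a', 42), ('b', [1, 2])]
print(list(children(doc["b"])))   # [(0, 1), (1, 2)]
```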
Anatomy of a query
A JSONPath query, in essence, defines a pattern that a path in a JSON must match for the node at that path to be selected. The simplest query is a sequence of keys.
$.a.b.c.d
It will access the value of the "a" key in the root, then the value under the "b" key in that object, then the value under "c", and finally the value under "d". For example, in the JSON:
{
  "a": { "b": { "c": { "d": 42 } } }
}
it will access the value 42 by digging into the structure key by key.
$ rq '$.a.b.c.d' --json '{ "a": { "b": { "c": { "d": 42 } } } }'
42
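To make "digging key by key" concrete, here is a rough Python model of what a query made only of keys does. The function follow_keys is a hypothetical helper written for this illustration; it says nothing about how rq evaluates queries internally.

```python
import json

def follow_keys(root, keys):
    """Follow member names one at a time, starting at the root.

    Returns a one-element list with the selected node, or an empty list
    if some key along the way is missing (the query selects nothing).
    """
    node = root
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return []
        node = node[key]
    return [node]

doc = json.loads('{ "a": { "b": { "c": { "d": 42 } } } }')
print(follow_keys(doc, ["a", "b", "c", "d"]))  # [42]
print(follow_keys(doc, ["a", "x"]))            # []
```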
In general, a JSONPath query is a sequence of segments. Each segment contains one or more selectors. Canonically, selectors are delimited within square brackets, but some selectors have a shorthand dot-notation. For example, the query above is equivalent to:
$['a']['b']['c']['d']
$ rq "$['a']['b']['c']['d']" --json '{ "a": { "b": { "c": { "d": 42 } } } }'
42
A valid query starts with the $ character, which represents the root of the JSON. In particular, the query $ simply selects the entire document.
Segments
There are two types of segments:
- A child segment selects immediate children, or, in other words, digs into the structure of the document one level deeper. A child segment is either a bracketed sequence of selectors [<sel1>, ..., <selN>], or a shorthand dot notation like .a or .*.
- A descendant segment selects any subdocument, or, in other words, digs into the structure of the document at any level deeper. A descendant segment is either a bracketed sequence of selectors preceded by two dots ..[<sel1>, ..., <selN>], or a shorthand double-dot notation like ..a or ..*.
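The difference between the two segment types can be sketched in Python. Both functions below apply a name selector: one looks a single level down, the other at every depth. The names child_name and descendant_name are assumptions made for this example, and the depth-first result order is a choice of the sketch rather than a statement about rq's output order.

```python
import json

def child_values(node):
    """Immediate children of a node, in document order."""
    if isinstance(node, dict):
        return list(node.values())
    if isinstance(node, list):
        return list(node)
    return []

def child_name(node, name):
    """Child segment with a name selector, like ['a']: one level down only."""
    return [node[name]] if isinstance(node, dict) and name in node else []

def descendant_name(node, name):
    """Descendant segment with a name selector, like ..['a']: any depth."""
    found = child_name(node, name)
    for child in child_values(node):
        found.extend(descendant_name(child, name))
    return found

doc = json.loads('{ "a": 1, "b": { "a": 2, "c": [ { "a": 3 } ] } }')
print(child_name(doc, "a"))       # [1]
print(descendant_name(doc, "a"))  # [1, 2, 3]
```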
Selectors
Note that we only cover selectors that are currently supported by rsonpath. Issues to support more selectors can be found under the area: selector label.
Name selector
The name selector selects the child node under a given member name.
It’s most commonly found under its shorthand form, .key or ..key, which works with simple alphanumeric member names.
In the canonical form, the name has to be enclosed in single or double quotes, which also enables escape sequences. For example:
- .a, ['a'], ["a"] all select a child under the key a.
- ['"'] selects a child under the key ".
- ["'"] selects a child under the key '.
- ['complex name'] selects a child under the key containing a space:
$ rq "$['complex name']" --json '{ "complex name": 42 }'
42
Wildcard selector
The wildcard selector selects any child node, be it under a member name in an object, or a value in a list. It also has a common shorthand form, .* or ..*, whereas the canonical form is [*]. For example, running on:
{
  "a": 42,
  "b": [ 1, 2 ]
}
the query $[*] selects 42 and [ 1, 2 ].
$ rq '$[*]' --json '{ "a": 42, "b": [ 1, 2 ] }'
42
[ 1, 2 ]
Using the descendant segment, we can also recursively extract the elements of the list:
$ rq '$..[*]' --json '{ "a": 42, "b": [ 1, 2 ] }'
42
[ 1, 2 ]
1
2
In general, the query $..* selects all subdocuments of the JSON. It is rarely a wise query to run, as it can produce output much longer than the source document itself, consuming a lot of resources.
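The size blow-up is easy to see on a small, deliberately nested input. The sketch below enumerates every subdocument in Python and compares the total length of the serialized results with the length of the source. The helper subdocuments and the sample input are made up for this illustration.

```python
import json

def subdocuments(node):
    """Yield every node strictly below `node`, i.e. what $..* would select."""
    if isinstance(node, dict):
        kids = list(node.values())
    elif isinstance(node, list):
        kids = list(node)
    else:
        kids = []
    for kid in kids:
        yield kid
        yield from subdocuments(kid)

source = "[[[[1]]]]"
doc = json.loads(source)

# Serialize each selected node compactly, the way a result line would look.
outputs = [json.dumps(n, separators=(",", ":")) for n in subdocuments(doc)]
print(outputs)                                    # ['[[[1]]]', '[[1]]', '[1]', '1']
print(len(source), sum(len(o) for o in outputs))  # 9 16
```

Every level of nesting is repeated in the output of each enclosing level, so the total output grows much faster than the document as nesting deepens.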
Index selector
The index selector selects a value from a list at a given zero-based index.
It only has a bracketed form, [index]. For example, running on:
[ 1, 2, 3 ]
- the query $[0] selects 1;
- the query $[1] selects 2;
- the query $[2] selects 3; and
- the query $[3] selects nothing, since the list has only 3 elements.
$ rq '$[0]' --json "[ 1, 2, 3 ]"
1
$ rq '$[1]' --json "[ 1, 2, 3 ]"
2
$ rq '$[2]' --json "[ 1, 2, 3 ]"
3
$ rq '$[3]' --json "[ 1, 2, 3 ]"
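The same behavior, including the empty result for an out-of-range index, can be modelled in Python. The function select_index is a hypothetical helper for this example and only handles non-negative indices, which are the ones covered in this section.

```python
def select_index(node, index):
    """Index selector [index] for a non-negative, zero-based index.

    Returns a one-element list with the selected value, or an empty list
    when the node is not a list or the index is out of range.
    """
    if isinstance(node, list) and 0 <= index < len(node):
        return [node[index]]
    return []

for i in range(4):
    print(i, select_index([1, 2, 3], i))
# 0 [1]
# 1 [2]
# 2 [3]
# 3 []
```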
Combining segments
Segments can be chained arbitrarily to create complex queries.
For example, if we have a file ex.json
{
  "firstName": "John",
  "lastName": "Doe",
  "number": "078-05-1120",
  "phoneNumbers": [
    {
      "type": "work",
      "number": "0123-4567-8888"
    },
    {
      "type": "home",
      "number": "0123-4567-8910"
    }
  ],
  "spouse": {
    "firstName": "Jane",
    "lastName": "Doe",
    "number": "078-05-1121",
    "phoneNumbers": [
      {
        "type": "work",
        "number": "0123-4567-9999"
      },
      {
        "type": "home",
        "number": "0123-4567-8910"
      }
    ]
  }
}
we can extract all phone numbers with:
$ rq '$..phoneNumbers[*].number' ./ex.json
"0123-4567-8888"
"0123-4567-8910"
"0123-4567-9999"
"0123-4567-8910"
Note that each part of the query is needed here:
- the first segment is descendant, so that we pick up both the root’s phone number array and the one under “spouse”;
- without specifying the “phoneNumbers” key (for example running $..number) we wouldn’t be able to filter out the two irrelevant “number” keys;
- the wildcard selector [*] makes sure we select all the numbers, regardless of how long the list may be.
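To see how the three segments cooperate, the query can be replayed step by step in Python on the same data. This is a simplified model for illustration only: the helpers name, wildcard, and descendants_or_self are invented for this sketch and do not reflect how rq processes a document.

```python
import json

def children(node):
    """Immediate children of a node, in document order."""
    if isinstance(node, dict):
        return list(node.values())
    if isinstance(node, list):
        return list(node)
    return []

def name(nodes, key):
    """Name selector applied as a child segment to every input node."""
    return [n[key] for n in nodes if isinstance(n, dict) and key in n]

def wildcard(nodes):
    """Wildcard selector applied as a child segment to every input node."""
    return [c for n in nodes for c in children(n)]

def descendants_or_self(nodes):
    """Each input node followed by everything below it, depth-first."""
    out = []
    for n in nodes:
        out.append(n)
        out.extend(descendants_or_self(children(n)))
    return out

# Same content as ex.json above, written compactly.
doc = json.loads("""
{
  "firstName": "John", "lastName": "Doe", "number": "078-05-1120",
  "phoneNumbers": [
    { "type": "work", "number": "0123-4567-8888" },
    { "type": "home", "number": "0123-4567-8910" }
  ],
  "spouse": {
    "firstName": "Jane", "lastName": "Doe", "number": "078-05-1121",
    "phoneNumbers": [
      { "type": "work", "number": "0123-4567-9999" },
      { "type": "home", "number": "0123-4567-8910" }
    ]
  }
}
""")

# $..phoneNumbers[*].number, one segment at a time
arrays = name(descendants_or_self([doc]), "phoneNumbers")  # ..phoneNumbers
entries = wildcard(arrays)                                 # [*]
numbers = name(entries, "number")                          # .number
print(numbers)

# $..number also picks up the two unrelated "number" members
all_numbers = name(descendants_or_self([doc]), "number")
print(len(numbers), len(all_numbers))  # 4 6
```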
Selector availability
Not all of JSONPath’s functionality is supported by rsonpath as of right now.
Supported segments
Segment | Syntax | Supported | Since | Tracking Issue |
---|---|---|---|---|
Child segment (single) | [<selector>] | ✔️ | v0.1.0 | |
Child segment (multiple) | [<selector1>,...,<selectorN>] | ❌ | | |
Descendant segment (single) | ..[<selector>] | ✔️ | v0.1.0 | |
Descendant segment (multiple) | ..[<selector1>,...,<selectorN>] | ❌ | | |
Supported selectors
Selector | Syntax | Supported | Since | Tracking Issue |
---|---|---|---|---|
Root | $ | ✔️ | v0.1.0 | |
Name | .<member>, [<member>] | ✔️ | v0.1.0 | |
Wildcard | .*, ..*, [*] | ✔️ | v0.4.0 | |
Index (array index) | [<index>] | ✔️ | v0.5.0 | |
Index (array index from end) | [-<index>] | ❌ | | |
Array slice (forward, positive bounds) | [<start>:<end>:<step>] | ❌ | | #152 |
Array slice (forward, arbitrary bounds) | [<start>:<end>:<step>] | ❌ | | |
Array slice (backward, arbitrary bounds) | [<start>:<end>:-<step>] | ❌ | | |
Filters – existential tests | [?<path>] | ❌ | | #154 |
Filters – const atom comparisons | [?<path> <binop> <atom>] | ❌ | | #156 |
Filters – logical expressions | &&, \|\|, ! | ❌ | | |
Filters – nesting | [?<expr>[?<expr>]...] | ❌ | | |
Filters – arbitrary comparisons | [?<path> <binop> <path>] | ❌ | | |
Filters – function extensions | [?func(<path>)] | ❌ | | |