When cypher files have been created for a given code list they should be stored in either code-list-scripts/cypher or geo-specific-codelist. Files should be named according to their code list ID i.e. a script for ashe-earnings should be called ashe-earnings.cypher
All cypher code list files should define:
1) The code list node, including its edition 2) Constraints on the code list node, so only one node exists per edition 3) Constraints on the codes, so only one node exists for each code across all editions of that code list 4) Each code node, and its relationship to the code list node, which should contain a web-friendly label
Geography code lists differ in that:
_code_geography_geography labelgeography_list_IDThere are also some scripts to help create new Cypher queries for CMD.
Used to create a ‘real’ geography codelist using data from the geoportal.
https://onsdigital.github.io/dp-code-list-scripts/geo-specific-codelist/new-codelist.html
Used to create a codelist using the two column format from the v4.
https://onsdigital.github.io/dp-code-list-scripts/new-codelist/new-codelist.html
Note: Create codelist CSV to upload from v4 file by extracting the two columns and removing duplicates.
To keep options open, its preferable to keep a full list of code lists up to date in both Cypher and Gremlin. This can most easily be achieved by continuing to create the scripts in Cypher, and then using conversion scripts to parse those files and output the equivalent Gremlin.
There are separate conversion scripts for general code list scripts and geography code lists.
To convert general code lists:
./convert.sh code-list-scripts/cypher/<new-file-name>
To convert geography code lists:
./convert-geography.sh geo-specific-codelist/*.cypher
Each of these scripts can also take multiple cypher files as input, and in all cases will output the .grm file into the relevant gremlin directory.
To load code lists locally, code lists have been grouped by dataset into a selection of YAML files. When loading code lists, you’re likely doing it to satisfy the requirements for importing a specific dataset. The load program has been written to allow multiple code lists to be loaded to the relevant database by specifying a query language and which dataset YAML file defines the required lists.
e.g. ./load -q=cypher -f=ashe.yaml will import all code lists required for ASHE datasets using the cypher query console.
-q: query language - options are cypher or gremlin
-f: YAML file
Optional arguments: -u and -p for providing user and passphrase specifically to cypher-shell
Load relies on both cypher-shell and gremlin query consoles, so make sure you have both installed, and connections to whichever database you’re targeting.
Note: Some code list scripts do not properly set their constraints, and code lists may be shared across multiple datasets. This may lead to a situation where running 2 different dataset import scripts results in 2 copies of the same code list in the graph. The import process will not work in this case. Be sure to only load each code list once.
To load code lists to any environment you will need:
SSH access to the environment (open a tunnel) e.g.
dp ssh sandbox publishing 1 -p 8182:sandbox-neptune-cluster...amazonaws.com:8182
the console or shell command for the query language you plan on using, either:
gremlin.sh (for Neptune), see Install Gremlin Consolecypher-shell (for legacy/Neo4J)Loading each code list is a 3-step process from that point:
a. Pass the relevant code list file into the query command - this is where cypher and gremlin diverge:
for cypher, use the cypher shell e.g.
cypher-shell < code-list-scripts/cypher/ashe-earnings.cypher
for gremlin, use the gremlin-import.sh script, which wraps the gremlin.sh command (see above) e.g.
./gremlin-import.sh geo-specific-codelist/gremlin/ctry17.grm
b. Use the load tool to select which query language and YAML file’s contents you would like to load
When you create a YAML file for a dataset (or group some code lists), please run ./sanity.sh to ensure there are no missing/misplaced files.