{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# U/Data/JSON tools\n", "\n", "JavaScript Object Notation (JSON) is a text-based data interchange format designed to be \"minimal, portable, textual, and a subset of JavaScript\"[^1].\n", "JSON is derived from the object literals of JavaScript, but the format itself is language-independent.\n", "It is now widely used for data exchange in many applications.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## JSON Format\n", "\n", "The JSON grammar, data structures, and conformance rules are defined in reference [^3].\n", "\n", "Below, a Python dictionary adapted from the JSON example in reference [^4] is serialized to JSON text with the standard-library `json` module [^2]:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "plaintext" } }, "outputs": [], "source": [ "import json\n", "\n", "ex1 = {\n", " \"Image\": {\n", " \"Width\": 800,\n", " \"Height\": 600,\n", " \"Title\": \"View from 15th Floor\",\n", " \"Thumbnail\": {\n", " \"Url\": \"http://na/image/001\",\n", " \"Height\": 125,\n", " \"Width\": 100\n", " },\n", " \"Animated\": False,\n", " \"IDs\": [116, 943, 234, 38793]\n", " }\n", "}\n", "print(json.dumps(ex1, indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "JSON has three literal names, all lowercase: `false`, `true`, and `null`.\n", "The `json` module converts Python's `False`, `True`, and `None` to these literals [^2], which is why the output above shows `false` where the Python input has `False`."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## JSON Schema" ] }, { "cell_type": "markdown", "metadata": { "vscode": { "languageId": "plaintext" } }, "source": [ "For larger datasets, a schema (a template describing the expected structure of the data) is useful for inferring, creating, modifying, and validating incoming JSON instances, and thus for ensuring correct data exchange.\n", "A brief history of JSON Schema, with links, is given in reference [^5].\n", "A schema object describes the structure of the elements within a JSON dataset [^6].\n", "\n", "Several Python tools can generate JSON schemas.\n", "The discussion below uses the package `genson` to illustrate the use of schemas; other schema tools should work as well.\n", "\n", "To generate a schema from a single object with `genson`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "plaintext" } }, "outputs": [], "source": [ "from genson import SchemaBuilder\n", "\n", "builder = SchemaBuilder()\n", "builder.add_object(ex1['Image'])\n", "builder.to_schema()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To generate a schema from a list of objects (records) with `genson`, pass the list itself to `add_object()`.\n", "The result is an array schema whose single `items` schema combines the fields and types of all records.\n", "In the example below, the second record deliberately differs from the first: `Width` is a string, `Height` is a float, `Thumbnail`, `Animated`, and `IDs` are missing, and `new_field` is added:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "plaintext" } }, "outputs": [], "source": [ "ex2 = [ex1['Image']] + [\n", " {\n", " \"Width\": \"701\",\n", " \"Height\": -1.0,\n", " \"Title\": \"View from 16th Floor\",\n", " \"new_field\": \"This is a new field\"\n", " }\n", "]\n", "\n", "builder = SchemaBuilder()\n", "builder.add_object(ex2)\n", "print(json.dumps(builder.to_schema(), indent=2))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`to_schema()` returns the schema as a Python `dict` (`to_json()` returns it as a JSON string), and the result can be edited as needed.\n", "`genson` also provides methods to update an existing schema, such as `add_schema()`, and its schema builder can be extended with custom strategies."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The module `util.jsonsch` accepts a schema object and provides a summary of it.\n", "Note that JSON and JSON Schema are used in a wide variety of ways.\n", "`util.jsonsch` is still under development and supports only a subset of the schema keywords." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "plaintext" } }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reference\n", "\n", "[^1]: IETF. (2014). RFC 7159: The JavaScript Object Notation (JSON) Data Interchange Format. ([web page](https://datatracker.ietf.org/doc/html/rfc7159.html#section-1))\n", "[^2]: Python Software Foundation. (2024). json — JSON encoder and decoder. ([web page](https://docs.python.org/3/library/json.html#json-to-py-table))\n", "[^3]: IETF. (2014). RFC 7159: The JavaScript Object Notation (JSON) Data Interchange Format. ([web page](https://datatracker.ietf.org/doc/html/rfc7159.html#section-2))\n", "[^4]: IETF. (2014). RFC 7159: The JavaScript Object Notation (JSON) Data Interchange Format. ([web page](https://datatracker.ietf.org/doc/html/rfc7159.html#section-13))\n", "[^5]: The JSON Schema Organization. (year). History of JSON Schema. ([web page](https://json-schema.org/overview/what-is-jsonschema#history-of-json-schema))\n", "[^6]: The JSON Schema Organization. (year). Creating your first schema. ([web page](https://json-schema.org/learn/getting-started-step-by-step))\n", "[^7]: wolverdude. (2024). genson 1.3.0. ([web page](https://pypi.org/project/genson/)) | ([github](https://github.com/wolverdude/genson/))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.1" } }, "nbformat": 4, "nbformat_minor": 4 }