{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<!-- \n",
                "    #  Copyright (C) 2023-2024 Y Hsu <yh202109@gmail.com>\n",
                "    #\n",
                "    #  This program is free software: you can redistribute it and/or modify\n",
                "    #  it under the terms of the GNU General Public license as published by\n",
                "    #  the Free software Foundation, either version 3 of the License, or\n",
                "    #  any later version.\n",
                "    #\n",
                "    #  This program is distributed in the hope that it will be useful,\n",
                "    #  but WITHOUT ANY WARRANTY; without even the implied warranty of\n",
                "    #  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\n",
                "    #  GNU General Public License for more details\n",
                "    #\n",
                "    #  You should have received a copy of the GNU General Public license\n",
                "    #  along with this program. If not, see <https://www.gnu.org/license/> \n",
                "-->"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "vscode": {
                    "languageId": "plaintext"
                }
            },
            "source": [
                "# S/ISO/PDF \n",
                "\n",
                "PDF stands for Portable Document Format, which was created by Adobe[^1], \n",
                "and currently maintained by the International Organization for Standardization (ISO) as an open source international standard [^2]. \n",
                "\n",
                "Some commonly used specialized PDF types include: \n",
                "\n",
                "- ISO 14289/PDF/UA  for accessible PDF documents and processors (extends PDF/A conformance level A) \n",
                "- ISO 15930/PDF/X  for printing \n",
                "- ISO 19005/**PDF/A  for long-term archiving** [^3] \n",
                "    - Sub-parts:\n",
                "        - ISO 19005-1:2005/PDF/A-1 (based on PDF v1.4)\n",
                "        - ISO 19005-2:2011/PDF/A-2 (based on PDF v1.7)\n",
                "        - ISO 19005-3:2012/PDF/A-3 (add file)\n",
                "        - ISO 19005-4:2020/PDF/A-4 (based on PDF v2.0)\n",
                "    - Not allow: audio, video, 3d objects, JS, certain actions, encryption, non-standard metadata\n",
                "    - Require: embedding font with proper license \n",
                "- ISO 24517/PDF/E  for representing engineering documents (CAD, etc.).\n",
                "\n",
                "For regulatory submission, FDA currently support \"PDF versions 1.4 through 1.7, PDF/A-1 and PDF/A-2\"[^4].\n",
                "Steps for **creating and validating** PDF/A files can be found in reference [^5]<sup>,</sup>[^6].\n",
                "\n",
                "The module `stdiso.pdfsummary` depends on package `pypdf` [^7].\n",
                "The module `stdiso.pdfsummary` include functions for creating summaries about a specified PDF file.\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "vscode": {
                    "languageId": "plaintext"
                }
            },
            "source": [
                "## PDF File Summary\n",
                "\n",
                "To use `mtbp3.stdiso`:"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "vscode": {
                    "languageId": "plaintext"
                }
            },
            "outputs": [],
            "source": [
                "from mtbp3.stdiso.pdfsummary import pdfSummary\n",
                "\n",
                "pfr = pdfSummary(path=\"\")\n",
                "print(pfr.get_summary_string())"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "If the path left as empty, an example pdf file will be loaded for illustration.\n",
                "More details about the example pdf can be found here: https://arxiv.org/abs/1706.03762."
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "To view the outline tree:"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "vscode": {
                    "languageId": "plaintext"
                }
            },
            "outputs": [],
            "source": [
                "print(pfr.show_outline_tree())"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## Work with Images\n",
                "\n",
                "We can see that there is one image in the 3rd page from the summary above. \n",
                "To extract the first image on the 3rd page:"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "vscode": {
                    "languageId": "plaintext"
                }
            },
            "outputs": [],
            "source": [
                "img = pfr.get_image(page_index=2, image_index=0, outfolder='')\n",
                "print(type(img))\n",
                "print(img.size)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "vscode": {
                    "languageId": "plaintext"
                }
            },
            "outputs": [],
            "source": [
                "display(img.resize((300, int(300*img.size[1]/img.size[0]))))"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "vscode": {
                    "languageId": "plaintext"
                }
            },
            "source": [
                "The `resize()` function above resized the figure before displaying. Use `display(img)` in Jupyter if resizing is not required.\n",
                "\n",
                "To save the 2nd image on the 4th page to a file, add an existing folder path using `outfolder='add_path_here'`:"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "vscode": {
                    "languageId": "plaintext"
                }
            },
            "outputs": [],
            "source": [
                "img_path = pfr.get_image(page_index=3, image_index=1, outfolder='.')"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "The function `get_image()` returns a file path instead of the image when the `outfolder` option is not an empty string. \n",
                "To read and display the saved image file in Jupyter:"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "vscode": {
                    "languageId": "plaintext"
                }
            },
            "outputs": [],
            "source": [
                "from IPython.display import Image \n",
                "\n",
                "img = Image(filename=img_path, width=300)\n",
                "display(img)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## Reference\n",
                "\n",
                "[^1]: Adobe. (2024). Everything you need to know about the PDF. ([web page](https://www.adobe.com/acrobat/about-adobe-pdf.html))\n",
                "[^2]: ISO. (2021). The standard for PDF is revised. ([web page](https://www.iso.org/news/ref2608.html))\n",
                "[^3]: pdfa.org. (2013). PDF/A in a Nutshell 2.0. ([web page](https://pdfa.org/resource/pdfa-in-a-nutshell-2-0/))\n",
                "[^4]: FDA. (2016). Portable Document Format (PDF) Specifications. ([pdf](https://www.fda.gov/files/drugs/published/Portable-Document-Format-Specifications.pdf))\n",
                "[^5]: Adobe. (2023). PDF/X-, PDF/A-, and PDF/E-compliant files (Acrobat Pro). ([web page](https://helpx.adobe.com/acrobat/using/pdf-x-pdf-a-pdf.html))\n",
                "[^6]: pypdf Contributors. (2024). PDF/A Compliance. ([web page](https://pypdf.readthedocs.io/en/stable/user/pdfa-compliance.html))\n",
                "[^7]: pypdf Contributors. (2024). pypdf. ([web page](https://pypdf.readthedocs.io/en/stable/index.html))\n",
                "\n",
                "\n",
                "\n",
                "\n"
            ]
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.10.13"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 4
}