{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Seq/ICTV/MSL and VMR\n",
"\n",
"International Committee on Taxonomy of Viruses (ICTV) [^1] maintains a database [^2] of virus taxonomic classification [^3].\n",
"The Master Species Lists (MSL) [^4] provides \"the official, current virus taxonomy approved by the ICTV\".\n",
"The Virus Metadata Resource (VMR) [^5] provides \"exemplars and additional isolates for each virus species\".\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## An Example\n",
"\n",
"ICTV provides MSL downloads in .xlsx [^4], web version of interactive search [^6], and visual taxonomy browser [^7].\n",
"For example, if we are interested in studying the virus causing feline acquired immunodeficiency syndrome [^8], we can find the following taxonomy information in the ICTV MSL:\n",
"\n",
"| Column name | value | taxon-specific suffix * |\n",
"| --- | --- | --- |\n",
"| Realm | Riboviria | ...viria |\n",
"| Subrealm | | ...vira |\n",
"| Kingdom | Pararnavirae | ...virae |\n",
"| Subkingdom | | ...virites |\n",
"| Phylum | Artverviricota | ...viricota |\n",
"| Subphylum | | ...viricotina |\n",
"| Class | Revtraviricetes | ...viricetes |\n",
"| Subclass | | ...viricetidae |\n",
"| Order | Ortervirales | ...virales |\n",
"| Suborder | | ...virineae |\n",
"| Family | Retroviridae | ...viridae |\n",
"| Subfamily | Orthoretrovirinae | ...virinae |\n",
"| Genus | Lentivirus | ...virus |\n",
"| Subgenus | | ...virus |\n",
"| Species | Lentivirus felimdef | \n",
"| Genome | ssRNA-RT |\n",
"| Last Change | Renamed, |\n",
"| MSL of Last Change | 39 |\n",
"| Proposal for Last Change | 2023.009D.Retroviridae_68rensp.zip |\n",
"| Taxon History URL | ictv.global=202305029 |\n",
"\n",
"* [^15]\n",
"\n",
"From the taxon history, we can see that the species name was *Feline immunodeficiency virus* (FIV) in the previous releases, and changed to *Lentivirus felimdef* in the 2023 release [^9].\n",
"The reasons of changes proposed are included in the Proposal for Last Change link.\n",
"Additional information about FIV, for example, about the genus *Lentivirus*, can be found in ICTV report [^10].\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How to\n",
"\n",
"The module `mtbp3.seq.ictvmsiview` accept a csv file exported from the MSL tab of MSL download file.\n",
"The csv file path can be assigned using the option `msl_file_path`.\n",
"For demonstration, a small subset of MSL is shipped with this module.\n",
"When using the option `msl_file_path = \"\"`, the example file will be loaded.\n",
"\n",
"To load a MSL csv file:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"File supp_seq/ICTV_MSL39v4_example.csv has been loaded\n",
"Column names: ['Sort', 'Realm', 'Subrealm', 'Kingdom', 'Subkingdom', 'Phylum', 'Subphylum', 'Class', 'Subclass', 'Order', 'Suborder', 'Family', 'Subfamily', 'Genus', 'Subgenus', 'Species', 'Genome', 'Last Change', 'MSL of Last Change', 'Proposal for Last Change ', 'Taxon History URL']\n",
"Total number of rows: 65\n"
]
}
],
"source": [
"from mtbp3.seq import ictvmslview\n",
"msl = ictvmslview.ictvmsl(msl_file_path = \"\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Please note that the column names will be used as `search_rank` in this module and those are case sensitive.\n",
"The `search_strings` is not case sensitive, as shown below.\n",
"\n",
"To view the first row of loaded file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(msl.msl.iloc[0].transpose())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To view the unique values of 'Realm' in the file and number of species under each realm:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"msl.msl['Realm'].value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The counts above are from the example file.\n",
"\n",
"The full MSL 39v4 includes the following realms and kingdoms:\n",
"\n",
"```\n",
" Virus:\n",
" ├── Adnaviria:\n",
" │ └── Zilligvirae (32 Species)\n",
" ├── Duplodnaviria:\n",
" │ └── Heunggongvirae (4973 Species)\n",
" ├── Monodnaviria:\n",
" │ ├── Loebvirae (60 Species)\n",
" │ ├── Sangervirae (22 Species)\n",
" │ ├── Shotokuvirae (1930 Species)\n",
" │ └── Trapavirae (16 Species)\n",
" ├── NA:\n",
" │ └── NA (636 Species)\n",
" ├── Riboviria:\n",
" │ ├── NA (17 Species)\n",
" │ ├── Orthornavirae (6423 Species)\n",
" │ └── Pararnavirae (272 Species)\n",
" ├── Ribozyviria:\n",
" │ └── NA (21 Species)\n",
" └── Varidnaviria:\n",
" ├── Bamfordvirae (279 Species)\n",
" └── Helvetiavirae (9 Species)\n",
"```\n",
"To count species within a subset:\n",
"\n",
"```python\n",
"\n",
"print('\\n'.join(msl.count_species(count_rank='Phylum', outfmt=\"tree\", search_within_subset={'Kingdom': 'Bamfordvirae'}))\n",
"\n",
"```\n",
"\n",
"Output:\n",
"\n",
"```\n",
" Virus:\n",
" └── [Realm] Varidnaviria:\n",
" └── [Kingdom] Bamfordvirae:\n",
" ├── [Phylum] NA (1 Species)\n",
" ├── [Phylum] Nucleocytoviricota (132 Species)\n",
" └── [Phylum] Preplasmiviricota (146 Species)\n",
"```\n",
"\n",
"### Search within MSL \n",
"\n",
"To search for rows with \"lentivirus fel\" included in a species name:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(msl.find_rows_given_str(search_strings=\"lentivirus fel\", color=\"red\").iloc[0].transpose())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To search for rows with \"lentivirus\" included in species: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(msl.find_rows_given_str(search_strings=\"lentivirus\", search_rank=\"Species\", color=\"red\", narrow=True)[['Genus', 'Species','Genome']])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To search for genus *Lentivirus* using `exact=True` option (partly exact, the searching is still not case sensitive): "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('\\n'.join(msl.find_rows_given_str(search_strings=\"lentivirus\", search_rank=\"Genus\", color=\"red\", outfmt='tree', exact=True)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are more than one virus species in family *Retroviridae* that can infect feline.\n",
"To search for virus within family *Retroviridae* that can infect feline:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('\\n'.join(msl.find_rows_given_str(search_strings=\" fel\", search_rank=\"Species\", color=\"red\", outfmt='tree', search_within_subset={\"Family\":\"Retroviridae\"})))"
]
},
{
"cell_type": "markdown",
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"source": [
"To search for two known virus species:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('\\n'.join(msl.find_rows_given_str(search_strings=[\"Gammaretrovirus felleu\", \"Lentivirus felimdef\"], search_rank=\"Species\", color=\"red\", outfmt='tree')))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The tree above always shows 8 ranks.\n",
"\n",
"There are two more types of tree available, including a tree with nonempty rank:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('\\n'.join(msl.find_rows_given_str(search_strings=[\"Gammaretrovirus felleu\", \"Lentivirus felimdef\"], search_rank=\"Species\", color=\"red\", outfmt='tree', tree_style=\"drop\")))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"and a tree with full 15 ranks:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('\\n'.join(msl.find_rows_given_str(search_strings=[\"Gammaretrovirus felleu\", \"Lentivirus felimdef\"], search_rank=\"Species\", color=\"red\", outfmt='tree', tree_style=\"full\")))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download Updated MSL\n",
"\n",
"To download the current release, use `msl.update_msl(version=\"current\")`.\n",
"That will return output:\n",
"\n",
"```\n",
"File of version current has been loaded\n",
"Column names: ['Sort', 'Realm', 'Subrealm', 'Kingdom', 'Subkingdom', 'Phylum', 'Subphylum', 'Class', 'Subclass', 'Order', 'Suborder', 'Family', 'Subfamily', 'Genus', 'Subgenus', 'Species', 'Genome', 'Last Change', 'MSL of Last Change', 'Proposal for Last Change ', 'Taxon History URL']\n",
"Total number of rows: 14690\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see versions available to download:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"msl.update_msl(version=\"\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Subtypes within Species\n",
"\n",
"ICTV MSL focuses on \"taxa at and above the species rank\" [^11].\n",
"Classification below the species are often related to species specific characteristics [^12]. \n",
"\n",
"\n",
"## Species Exemplar\n",
"\n",
"ICTV VMR provides one exemplar (and may be more additional isolates) for each species in VMR, with Genebak accession number and direct link to NCBI database.\n",
"\n",
"To load a VRM table extracted from VRM download file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from mtbp3.seq import ictvvmrview\n",
"vmr = ictvvmrview.ictvvmr(vmr_file_path = \"\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When the path is empty, an example file will be used for illustration. \n",
"Similarily, use `vmr.update_vmr(version=\"current\")`. to download the current version from ICTV.\n",
"\n",
"To search for exemplars including \"feline\" using `search_rank_or_exemplar=\"Virus name(s)\"`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('\\n'.join(vmr.find_rows_given_str(search_strings=\"feline\", search_rank_or_exemplar=\"Virus name(s)\", color=\"red\", outfmt='tree', tree_style=\"drop\")))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"To search for genus *Lentivirus*:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('\\n'.join(vmr.find_rows_given_str(search_strings=\"lentivirus\", search_rank_or_exemplar=\"Genus\", color=\"red\", outfmt='tree', exact=True, tree_style=\"drop\")))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Other Resource\n",
"\n",
"NCBI:\n",
"\n",
"- Explore Virus Data [^13]\n",
"- NCBI Visual Data Dashboard [^14]\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reference\n",
"\n",
"[^1]: ICTV. (2024). About Virus Taxonomic Classification. ([web page](https://ictv.global/taxonomy/about))\n",
"[^2]: Elliot J Lefkowitz, Donald M Dempsey, Robert Curtis Hendrickson, Richard J Orton, Stuart G Siddell, Donald B Smith, Virus taxonomy: the database of the International Committee on Taxonomy of Viruses (ICTV), Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D708–D717, ([web page](https://doi.org/10.1093/nar/gkx932))\n",
"[^3]: wiki. (year). Virus classification. ([web page](https://en.wikipedia.org/wiki/Virus_classification))\n",
"[^4]: ICTV. (2024). Master Species Lists (MSL). ([web page](https://ictv.global/msl))\n",
"[^5]: ICTV. (2024). Virus Metadata Resource (VMR). ([web page](https://ictv.global/vmr))\n",
"[^6]: ICTV. (2024). Current ICTV Taxonomy Release. ([web page](https://ictv.global/taxonomy))\n",
"[^7]: ICTV. (2024). Visual Taxonomy Browser. ([web page](https://ictv.global/taxonomy/visual-browser))\n",
"[^8]: Sykes J. E. (2014). Feline Immunodeficiency Virus Infection. Canine and Feline Infectious Diseases, 209–223. ([web page](https://doi.org/10.1016/B978-1-4377-0795-3.00021-1))\n",
"[^9]: ICTV. (2023). History of the taxon. ([web page](https://ictv.global/taxonomy/taxondetails?taxnode_id=202305029))\n",
"[^10]: ICTV. (year). The ICTV Report on Virus Classification and Taxon Nomenclature. ([web page](https://ictv.global/report/chapter/retroviridae/retroviridae/lentivirus))\n",
"[^11]: Siddell, S. G., Smith, D. B., Adriaenssens, E., Alfenas-Zerbini, P., Dutilh, B. E., Garcia, M. L., Junglen, S., Krupovic, M., Kuhn, J. H., Lambert, A. J., Lefkowitz, E. J., Łobocka, M., Mushegian, A. R., Oksanen, H. M., Robertson, D. L., Rubino, L., Sabanadzovic, S., Simmonds, P., Suzuki, N., Van Doorslaer, K., … Zerbini, F. M. (2023). Virus taxonomy and the role of the International Committee on Taxonomy of Viruses (ICTV). The Journal of general virology, 104(5), 001840. ([web page](https://doi.org/10.1099/jgv.0.001840))\n",
"[^12]: Kuhn, J. H., Bao, Y., Bavari, S., Becker, S., Bradfute, S., Brister, J. R., Bukreyev, A. A., Chandran, K., Davey, R. A., Dolnik, O., Dye, J. M., Enterlein, S., Hensley, L. E., Honko, A. N., Jahrling, P. B., Johnson, K. M., Kobinger, G., Leroy, E. M., Lever, M. S., Mühlberger, E., … Nichol, S. T. (2013). Virus nomenclature below the species level: a standardized nomenclature for natural variants of viruses assigned to the family Filoviridae. Archives of virology, 158(1), 301–311. ([web page](https://doi.org/10.1007/s00705-012-1454-0))\n",
"[^13]: NCBI. (2024). Explore Virus Data: Feline immunodeficiency virus. ([web pate](https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=Feline%20immunodeficiency%20virus,%20taxid:11673))\n",
"[^14]: NCBI. (2024). NCBI Visual Data Dashboard. ([web page](https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/))\n",
"[^15]: ICTV. (year). Taxon names are written differently from virus names. ([web page](https://ictv.global/faq/names))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}