*2.1. Database Construction*

GAG-DB is available at https://www.gagdb.glycopedia.eu. The database is populated with information extracted from the PDB [51]. It includes three-dimensional structural information on GAGs and GAG oligosaccharides in interaction with proteins. We propose a classification based on the nature of the GAGs, e.g., hyaluronan, heparin/heparan sulfate, chondroitin sulfate/dermatan sulfate, and keratan sulfate. GAG mimetics are included, as long as they appear in the PDB. The content of GAG-DB is focused on three-dimensional data, with an appropriate curation of the nomenclature, and extended related information. The entries are structures of GAGs and GAG-protein complexes obtained by a wide range of methods.

To avoid any confusion, we note that an unrelated resource, also named the GAG database, was developed to gather genomic annotation cross-references and was published in 2013 (The GAG database: a new resource to gather genomic annotation cross-references, T. Obadia, O. Sallou, M. Ouedraogo, G. Guernec, F. Lecerf, Gene. 2013;527(2):503-9, DOI:10.1016/j.gene.2013.06.063, Epub 16 July 2013). The annotation data available in that resource include all transcripts and their identifiers, functional descriptions of genes, chromosomal localisation, gene symbols, gene homologs for model species (human, chicken, mouse), and several identifiers that link those genes to external databases (UniProt, HGNC).

The GAG-DB database contains 15 entries of long-chain GAGs established from X-ray fiber diffraction. A resolution value of 3.0 Å is assigned to the structural models proposed from X-ray fiber diffraction, and a value of 0 to those established by solution NMR or X-ray scattering (the structures are not filtered). The database also contains 125 manually curated entries extracted from PDBe [52,53] (September 2020 release). These three-dimensional structures were determined experimentally by X-ray single-crystal diffraction, or by X-ray fiber diffraction and solution NMR in conjunction with molecular modeling. The number of GAG-protein complexes amounts to 105. The resolution value reflects the quality of the experimental data: high values (e.g., 4 Å) indicate poor resolution and low values (e.g., 1.5 Å) good resolution. For comparison, the median resolution for X-ray crystallographic data in the Protein Data Bank is 2.05 Å. The proteins of the database can be broadly separated into enzymes and skeletal proteins. The size of the oligosaccharides complexed with proteins ranges from disaccharides (34 entries) to a single chain with a degree of polymerization (DP) of 10, with the following distribution in between: DP 3 (1), DP 4 (18), DP 5 (13), DP 6 (15), DP 7 (7), DP 8 (8), and DP 9 (1). More than 80% of the GAGs involved in the complexes are heparin and hyaluronic acid oligosaccharides. These figures, however, tend to reflect the community's interest in the GAGs most obviously involved in biological and biomedical applications.
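
The rule used to assign a resolution value can be written as a short function. The following is only a minimal sketch in Python: the function name, the method labels, and the handling of a missing value are assumptions made for illustration, and do not describe the code behind GAG-DB.

```python
from typing import Optional

# Values stated in the text; the constant names are hypothetical.
FIBER_DIFFRACTION_VALUE = 3.0   # Å, assigned to X-ray fiber diffraction models
NO_RESOLUTION_VALUE = 0.0       # assigned to solution NMR and X-ray scattering


def resolution_value(method: str, reported: Optional[float] = None) -> float:
    """Return the resolution value to store for an entry.

    `method` is a free-text label (hypothetical spelling); `reported` is the
    resolution in Å given with a single-crystal structure, if any.
    """
    label = method.strip().lower()
    if label == "x-ray fiber diffraction":
        return FIBER_DIFFRACTION_VALUE
    if label in {"solution nmr", "x-ray scattering"}:
        return NO_RESOLUTION_VALUE
    if reported is None:
        # Assumption: any other method must come with a reported resolution.
        raise ValueError(f"no resolution reported for method {method!r}")
    return reported
```

With this convention a value of 0 means "not applicable" rather than "perfect resolution", so any filter on resolution should treat such entries separately.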

Our collection is far from covering the molecular diversity of GAGs. This lack of data echoes the limitations of carbohydrate synthesis, which does not yet provide the sufficiently long sequences needed to properly investigate the molecular features driving interactions with proteins. Nonetheless, progress is in sight, as recently described in [54,55].

The representation of GAG sequences complies with recommended nomenclatures and formats, with the IUPAC condensed format as the reference (http://www.sbcs.qmul.ac.uk/iupac/2carb/38.html) [56]. Each sequence is also encoded in the machine-readable GlycoCT format [57,58], and depicted in SNFG (Symbol Nomenclature for Glycans) [22], following the description provided in [9].
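
As an illustration of a condensed linear notation, the sketch below assembles a sequence string from a list of residues and the linkages between them, and reports the degree of polymerization. It is written in Python under stated assumptions: the function names are hypothetical, `b` stands for β as an ASCII shorthand, sulfation and other substituents are ignored, and the output is a simplified condensed form rather than the exact strings stored in GAG-DB.

```python
from typing import List


def condensed_sequence(residues: List[str], linkages: List[str]) -> str:
    """Join residues and linkages, written from the non-reducing end.

    `linkages[i]` connects `residues[i]` to `residues[i + 1]`.
    """
    if len(linkages) != len(residues) - 1:
        raise ValueError("expected exactly one linkage between each residue pair")
    parts = [f"{res}({link})" for res, link in zip(residues, linkages)]
    parts.append(residues[-1])  # last residue carries no outgoing linkage
    return "".join(parts)


def degree_of_polymerization(residues: List[str]) -> int:
    """DP is simply the number of monosaccharide units."""
    return len(residues)


# Hyaluronan tetrasaccharide: alternating GlcA(b1-3) and GlcNAc(b1-4) units.
ha_residues = ["GlcA", "GlcNAc", "GlcA", "GlcNAc"]
ha_linkages = ["b1-3", "b1-4", "b1-3"]
ha_tetra = condensed_sequence(ha_residues, ha_linkages)
```

Here `ha_tetra` is `GlcA(b1-3)GlcNAc(b1-4)GlcA(b1-3)GlcNAc` and its DP is 4.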

At present, information associated with each entity of the database is added manually. This allows for proper curation and annotation, at the expense of a time lag between the date of deposition and the date of release in the database. Technically, the database was developed with PHP version 7, Bootstrap version 3 and MySQL database version 7. The interface is compatible with all devices and browsers. The pages are dynamically generated to match the search criteria selected by the user in the query window. Interactive graphics are developed in JavaScript with the D3.js library, version 3. A tutorial is available on the first page.
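
The way a page can be generated from user-selected criteria is sketched below. This is not the GAG-DB implementation, which relies on PHP and MySQL: the example uses Python with an in-memory SQLite database so that it is self-contained, and the table name, column names, and `demo-*` rows are all hypothetical placeholders. It only shows the general pattern of adding one parameterized condition per criterion that the user has actually set.

```python
import sqlite3
from typing import List, Optional


def search_entries(
    conn: sqlite3.Connection,
    gag_class: Optional[str] = None,
    method: Optional[str] = None,
    max_resolution: Optional[float] = None,
) -> List[str]:
    """Return entry identifiers matching every criterion that is not None."""
    clauses, params = [], []
    if gag_class is not None:
        clauses.append("gag_class = ?")
        params.append(gag_class)
    if method is not None:
        clauses.append("method = ?")
        params.append(method)
    if max_resolution is not None:
        clauses.append("resolution <= ?")
        params.append(max_resolution)

    sql = "SELECT entry_id FROM entries"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY entry_id"
    # Values are passed as parameters, never interpolated into the SQL text.
    return [row[0] for row in conn.execute(sql, params)]


# Hypothetical schema and placeholder rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries ("
    "entry_id TEXT PRIMARY KEY, gag_class TEXT, method TEXT, resolution REAL)"
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [
        ("demo-1", "heparin", "X-ray single-crystal diffraction", 1.5),
        ("demo-2", "hyaluronan", "X-ray fiber diffraction", 3.0),
        ("demo-3", "heparin", "solution NMR", 0.0),
    ],
)
conn.commit()
```

Calling `search_entries(conn, gag_class="heparin")` returns the two heparin placeholders, while calling it with no criteria returns every entry.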
