{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading IIRC and incremental datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Usage with incremental-CIFAR100, IIRC-CIFAR100, incremental-Imagenet, and IIRC-Imagenet"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "sys.path.append(\"../..\")\n",
    "\n",
    "from iirc.datasets_loader import get_lifelong_datasets\n",
    "from iirc.definitions import PYTORCH, IIRC_SETUP\n",
    "from iirc.utils.download_cifar import download_extract_cifar100"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For using these datasets with the preset tasks schedules, the original *CIFAR100* and/or *ImageNet2012* need to be downloaded first.\n",
    "\n",
    "In the case of *CIFAR100*, the dataset can be downloaded using the following method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "downloading CIFAR 100\n",
      "dataset downloaded\n",
      "extracting CIFAR 100\n",
      "dataset extracted\n"
     ]
    }
   ],
   "source": [
    "download_extract_cifar100(\"../../data\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the case of *ImageNet*, it has to be downloaded manually, and be arranged in the following manner:\n",
    "* Imagenet\n",
    "  * train\n",
    "    * n01440764\n",
    "    * n01443537\n",
    "    * ...\n",
    "  * val\n",
    "    * n01440764\n",
    "    * n01443537\n",
    "    * ...\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then the *get_lifelong_datasets* function should be used. The tasks schedules/configurations preset per dataset are:\n",
    "\n",
    "* *Incremental-CIFAR100*: 10 configurations, each starting with 50 classes in the first task, followed by 10 tasks each having 5 classes\n",
    "* *IIRC-CIFAR100*: 10 configurations, each starting with 10 superclasses in the first task, followed by 21 tasks each having 5 classes\n",
    "* *Incremental-Imagenet-full*: 5 configurations, each starting with 160 classes in the first task, followed by 28 tasks each having 30 classes\n",
    "* *Incremental-Imagenet-lite*: 5 configurations, each starting with 160 classes in the first task, followed by 9 tasks each having 30 classes\n",
    "* *IIRC-Imagenet-full*: 5 configurations, each starting with 63 superclasses in the first task, followed by 34 tasks each having 30 classes\n",
    "* *IIRC-Imagenet-lite*: 5 configurations, each starting with 63 superclasses in the first task, followed by 9 tasks each having 30 classes\n",
    "\n",
    "Although these configurations might seem they are limiting the choices, but the point here is to have a standard set of tasks and class orders so that the results are comparable across different works, otherwise if needed, new task configurations can be added manually as well in the *metadata* folder"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We also need a transformations function that takes the image and converts it to a tensor, as well as normalize the image, apply augmentations, etc.\n",
    "\n",
    "There are two such functions that can be provided: *essential_transforms_fn* and *augmentation_transforms_fn*\n",
    "\n",
    "*essential_transforms_fn* should include any essential transformations that should be applied to the PIL image (such as convert to tensor), while *augmentation_transforms_fn* should also include the essential transformations, in addition to any augmentations that need to be applied (such as random horizontal flipping, etc)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torchvision.transforms as transforms\n",
    "\n",
    "essential_transforms_fn = transforms.ToTensor()\n",
    "augmentation_transforms_fn = transforms.Compose([\n",
    "    transforms.RandomCrop(32, padding=4),\n",
    "    transforms.RandomHorizontalFlip(),\n",
    "    transforms.ToTensor()\n",
    "])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Creating iirc_cifar100\n",
      "Setup used: IIRC\n",
      "Using PyTorch\n",
      "Dataset created\n"
     ]
    }
   ],
   "source": [
    "# The datasets supported are (\"incremental_cifar100\", \"iirc_cifar100\", \"incremental_imagenet_full\", \"incremental_imagenet_lite\", \n",
    "# \"iirc_imagenet_full\", \"iirc_imagenet_lite\")\n",
    "lifelong_datasets, tasks, class_names_to_idx = \\\n",
    "    get_lifelong_datasets(dataset_name = \"iirc_cifar100\",\n",
    "                          dataset_root = \"../../data\", # the imagenet folder (where the train and val folders reside, or the parent directory of cifar-100-python folder \n",
    "                          setup = IIRC_SETUP,\n",
    "                          framework = PYTORCH,\n",
    "                          tasks_configuration_id = 0,\n",
    "                          essential_transforms_fn = essential_transforms_fn,\n",
    "                          augmentation_transforms_fn = augmentation_transforms_fn,\n",
    "                          joint = False \n",
    "                         )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*joint* can also be set to *True* in case of joint training (all classes will come in one task)\n",
    "\n",
    "The result of the previous function has the following form:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'train': <iirc.lifelong_dataset.torch_dataset.Dataset at 0x20f7746a670>,\n",
       " 'intask_valid': <iirc.lifelong_dataset.torch_dataset.Dataset at 0x20f7567bf70>,\n",
       " 'posttask_valid': <iirc.lifelong_dataset.torch_dataset.Dataset at 0x20f7567bfa0>,\n",
       " 'test': <iirc.lifelong_dataset.torch_dataset.Dataset at 0x20f7567bfd0>}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lifelong_datasets # four splits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[['flowers', 'small_mammals', 'trees', 'aquatic_mammals', 'fruit_and_vegetables', 'people', 'food_containers', 'vehicles', 'large_carnivores', 'insects'], ['television', 'spider', 'shrew', 'mountain', 'hamster'], ['road', 'poppy', 'household_furniture', 'woman', 'bee']]\n"
     ]
    }
   ],
   "source": [
    "print(tasks[:3])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'flowers': 0, 'small_mammals': 1, 'trees': 2, 'aquatic_mammals': 3, 'fruit_and_vegetables': 4, 'people': 5, 'food_containers': 6, 'vehicles': 7, 'large_carnivores': 8, 'insects': 9, 'television': 10, 'spider': 11, 'shrew': 12, 'mountain': 13, 'hamster': 14, 'road': 15, 'poppy': 16, 'household_furniture': 17, 'woman': 18, 'bee': 19, 'tulip': 20, 'clock': 21, 'orange': 22, 'beaver': 23, 'rocket': 24, 'bicycle': 25, 'can': 26, 'squirrel': 27, 'wardrobe': 28, 'bus': 29, 'whale': 30, 'sweet_pepper': 31, 'telephone': 32, 'leopard': 33, 'bowl': 34, 'skyscraper': 35, 'baby': 36, 'cockroach': 37, 'boy': 38, 'lobster': 39, 'motorcycle': 40, 'forest': 41, 'tank': 42, 'orchid': 43, 'chair': 44, 'crab': 45, 'girl': 46, 'keyboard': 47, 'otter': 48, 'bed': 49, 'butterfly': 50, 'lawn_mower': 51, 'snail': 52, 'caterpillar': 53, 'wolf': 54, 'pear': 55, 'tiger': 56, 'pickup_truck': 57, 'cup': 58, 'reptiles': 59, 'train': 60, 'sunflower': 61, 'beetle': 62, 'apple': 63, 'palm_tree': 64, 'plain': 65, 'large_omnivores_and_herbivores': 66, 'rose': 67, 'tractor': 68, 'crocodile': 69, 'mushroom': 70, 'couch': 71, 'lamp': 72, 'mouse': 73, 'bridge': 74, 'turtle': 75, 'willow_tree': 76, 'man': 77, 'lizard': 78, 'maple_tree': 79, 'lion': 80, 'elephant': 81, 'seal': 82, 'sea': 83, 'dinosaur': 84, 'worm': 85, 'bear': 86, 'castle': 87, 'plate': 88, 'dolphin': 89, 'medium_sized_mammals': 90, 'streetcar': 91, 'bottle': 92, 'kangaroo': 93, 'snake': 94, 'house': 95, 'chimpanzee': 96, 'raccoon': 97, 'porcupine': 98, 'oak_tree': 99, 'pine_tree': 100, 'possum': 101, 'skunk': 102, 'fish': 103, 'fox': 104, 'cattle': 105, 'ray': 106, 'aquarium_fish': 107, 'cloud': 108, 'flatfish': 109, 'rabbit': 110, 'trout': 111, 'camel': 112, 'table': 113, 'shark': 114}\n"
     ]
    }
   ],
   "source": [
    "print(class_names_to_idx)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*lifelong_datasets* has four splits, where *train* is for training, *intask_valid* is for validation during task training (in case of IIRC setup, this split is using *incomplete information* like the *train* split), *posttask_valid* is for validation after each task training (in case of IIRC setup, this split is using *complete information* like the *test* split), and finally the *test* split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
