/userhome/miniconda3/envs/mae/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
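
Note on the deprecation warning above: it asks scripts to read the local rank from `os.environ['LOCAL_RANK']` instead of a `--local_rank` argument. A minimal sketch of that change follows, assuming the training script parses its arguments with `argparse`. The helper name `resolve_local_rank` is hypothetical and does not come from this run's code.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None):
    """Return the local rank, preferring the LOCAL_RANK environment variable.

    Hypothetical helper: `argv` and `environ` are injectable only so the
    function can be exercised without launching a distributed job.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # Legacy flag that torch.distributed.launch passes when --use_env is not set.
    parser.add_argument("--local_rank", type=int, default=0)
    # parse_known_args ignores the script's other training flags.
    args, _ = parser.parse_known_args(argv)
    # torchrun exports LOCAL_RANK; fall back to the legacy flag otherwise.
    return int(environ.get("LOCAL_RANK", args.local_rank))
```

With this in place, the same script should work under either launcher: the environment variable wins when present, and the legacy flag is used only as a fallback.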
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
| distributed init (rank 2): env://, gpu 2
| distributed init (rank 4): env://, gpu 4
| distributed init (rank 0): env://, gpu 0
| distributed init (rank 7): env://, gpu 7
| distributed init (rank 3): env://, gpu 3
| distributed init (rank 1): env://, gpu 1
| distributed init (rank 5): env://, gpu 5
| distributed init (rank 6): env://, gpu 6
[14:50:47.934444] job dir: /userhome/mae-tmp
[14:50:47.934595] Namespace(aa='rand-m9-mstd0.5-inc1',
accum_iter=2,
batch_size=32,
blr=0.0006,
clip_grad=None,
color_jitter=None,
cutmix=0,
cutmix_minmax=None,
data_path='/dataset/ImageNet2012',
device='cuda',
dist_backend='nccl',
dist_eval=False,
dist_on_itp=False,
dist_url='env://',
distributed=True,
drop_path=0.1,
epochs=200,
eval=False,
finetune='',
global_pool=True,
gpu=0,
input_size=224,
layer_decay=1.0,
local_rank=0,
log_dir='./output_dir_qkformer',
lr=None,
min_lr=1e-06,
mixup=0,
mixup_mode='batch',
mixup_prob=1.0,
mixup_switch_prob=0.5,
model='QKFormer_10_512',
nb_classes=1000,
num_workers=10,
output_dir='./output_dir_qkformer',
pin_mem=True,
rank=0,
recount=1,
remode='pixel',
reprob=0.25,
resplit=False,
resume='',
seed=0,
smoothing=0.1,
start_epoch=0,
time_step=4,
warmup_epochs=5,
weight_decay=0.05,
world_size=8)
[14:50:55.608152] Dataset ImageFolder
    Number of datapoints: 1281167
    Root location: /dataset/ImageNet2012/train
    StandardTransform
Transform: Compose(
               RandomResizedCropAndInterpolation(size=(224, 224), scale=(0.08, 1.0), ratio=(0.75, 1.3333), interpolation=PIL.Image.BICUBIC)
               RandomHorizontalFlip(p=0.5)
               <timm.data.auto_augment.RandAugment object at 0x7f5a77997c70>
               ToTensor()
               Normalize(mean=tensor([0.4850, 0.4560, 0.4060]), std=tensor([0.2290, 0.2240, 0.2250]))
               <timm.data.random_erasing.RandomErasing object at 0x7f5a77997f10>
           )
[14:50:57.323101] Dataset ImageFolder
    Number of datapoints: 50000
    Root location: /dataset/ImageNet2012/val
    StandardTransform
Transform: Compose(
               Resize(size=256, interpolation=bicubic, max_size=None, antialias=None)
               CenterCrop(size=(224, 224))
               ToTensor()
               Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
           )
[14:50:57.323675] Sampler_train = <torch.utils.data.distributed.DistributedSampler object at 0x7f5a779979d0>
[14:50:57.768609] Model = spiking_transformer(
  (patch_embed1): PatchEmbedInit(
    (proj_conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (proj_bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj_maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (proj_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
    (proj1_conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (proj1_bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj1_maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (proj1_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
    (proj2_conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (proj2_bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj2_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
    (proj_res_conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
    (proj_res_bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj_res_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
  )
  (patch_embed2): PatchEmbeddingStage(
    (proj3_conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (proj3_bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj3_maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (proj3_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
    (proj4_conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (proj4_bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj4_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
    (proj_res_conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
    (proj_res_bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj_res_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
  )
  (patch_embed3): PatchEmbeddingStage(
    (proj3_conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (proj3_bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj3_maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (proj3_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
    (proj4_conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (proj4_bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj4_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
    (proj_res_conv): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
    (proj_res_bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj_res_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
  )
  (stage1): ModuleList(
    (0): TokenSpikingTransformer(
      (tssa): Token_QK_Attention(
        (q_conv): Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(128, 128, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(128, 128, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
  )
  (stage2): ModuleList(
    (0): TokenSpikingTransformer(
      (tssa): Token_QK_Attention(
        (q_conv): Conv1d(256, 256, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(256, 256, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(256, 256, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
    (1): TokenSpikingTransformer(
      (tssa): Token_QK_Attention(
        (q_conv): Conv1d(256, 256, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(256, 256, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(256, 256, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
  )
  (stage3): ModuleList(
    (0): SpikingTransformer(
      (attn): Spiking_Self_Attention(
        (q_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (v_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (v_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (v_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (qkv_mp): MaxPool1d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
    (1): SpikingTransformer(
      (attn): Spiking_Self_Attention(
        (q_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (v_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (v_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (v_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (qkv_mp): MaxPool1d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
    (2): SpikingTransformer(
      (attn): Spiking_Self_Attention(
        (q_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (v_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (v_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (v_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (qkv_mp): MaxPool1d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
    (3): SpikingTransformer(
      (attn): Spiking_Self_Attention(
        (q_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (v_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (v_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (v_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (qkv_mp): MaxPool1d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
    (4): SpikingTransformer(
      (attn): Spiking_Self_Attention(
        (q_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (v_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (v_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (v_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (qkv_mp): MaxPool1d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
    (5): SpikingTransformer(
      (attn): Spiking_Self_Attention(
        (q_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (v_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (v_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (v_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (qkv_mp): MaxPool1d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
    (6): SpikingTransformer(
      (attn): Spiking_Self_Attention(
        (q_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (v_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,), bias=False)
        (v_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (v_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(512, 512, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (qkv_mp): MaxPool1d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
  )
  (head): Linear(in_features=512, out_features=1000, bias=True)
)
[14:50:57.768686] number of params (M): 29.08
[14:50:57.768712] base lr: 6.00e-04
[14:50:57.768727] actual lr: 1.20e-03
[14:50:57.768740] accumulate grad iterations: 2
[14:50:57.768751] effective batch size: 512
[14:50:57.815674] criterion = LabelSmoothingCrossEntropy()
[14:50:57.815739] Start training for 200 epochs
[14:50:57.818347] log_dir: ./output_dir_qkformer
[W reducer.cpp:1251] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[14:51:21.385243] Epoch: [0]  [   0/5004]  eta: 1 day, 8:45:16  lr: 0.000000  loss: 7.0389 (7.0389)  time: 23.5645  data: 2.3668  max mem: 24440
[15:06:35.250831] Epoch: [0]  [2000/5004]  eta: 0:23:27  lr: 0.000096  loss: 6.5945 (6.8059)  time: 0.4604  data: 0.0002  max mem: 24440
[15:21:48.744690] Epoch: [0]  [4000/5004]  eta: 0:07:44  lr: 0.000192  loss: 6.2447 (6.5816)  time: 0.4561  data: 0.0002  max mem: 24440
[15:29:27.050896] Epoch: [0]  [5003/5004]  eta: 0:00:00  lr: 0.000240  loss: 5.9660 (6.4782)  time: 0.4563  data: 0.0009  max mem: 24440
[15:29:27.538755] Epoch: [0] Total time: 0:38:29 (0.4616 s / it)
[15:29:27.540032] Averaged stats: lr: 0.000240  loss: 5.9660 (6.4777)
[15:29:30.246456] Test:  [   0/1563]  eta: 1:10:20  loss: 6.1437 (6.1437)  acc1: 0.0000 (0.0000)  acc5: 9.3750 (9.3750)  time: 2.7005  data: 2.0997  max mem: 24440
[15:30:54.035058] Test:  [ 500/1563]  eta: 0:03:03  loss: 5.2592 (5.3679)  acc1: 0.0000 (4.0045)  acc5: 0.0000 (14.0594)  time: 0.1675  data: 0.0002  max mem: 24440
[15:32:17.831135] Test:  [1000/1563]  eta: 0:01:35  loss: 5.6150 (5.4021)  acc1: 3.1250 (4.5236)  acc5: 12.5000 (14.9382)  time: 0.1674  data: 0.0002  max mem: 24440
[15:33:41.666474] Test:  [1500/1563]  eta: 0:00:10  loss: 4.5433 (5.3917)  acc1: 3.1250 (5.1299)  acc5: 34.3750 (16.1226)  time: 0.1679  data: 0.0002  max mem: 24440
[15:33:54.459687] Test:  [1562/1563]  eta: 0:00:00  loss: 5.0716 (5.3731)  acc1: 0.0000 (5.3680)  acc5: 6.2500 (16.4440)  time: 0.2874  data: 0.0001  max mem: 24440
[15:33:54.695462] Test: Total time: 0:04:27 (0.1709 s / it)
[15:33:54.697355] * Acc@1 5.368 Acc@5 16.443 loss 5.373
[15:33:54.697581] Accuracy of the network on the 50000 test images: 5.4%
[15:33:54.697625] Max accuracy: 5.37%
[15:33:55.001444] log_dir: ./output_dir_qkformer
[15:34:03.295824] Epoch: [1]  [   0/5004]  eta: 11:30:17  lr: 0.000240  loss: 5.8708 (5.8708)  time: 8.2769  data: 2.3504  max mem: 24440
[15:49:18.327238] Epoch: [1]  [2000/5004]  eta: 0:23:06  lr: 0.000336  loss: 5.6358 (5.8161)  time: 0.4555  data: 0.0002  max mem: 24440
[16:04:32.877057] Epoch: [1]  [4000/5004]  eta: 0:07:41  lr: 0.000432  loss: 5.3470 (5.6670)  time: 0.4550  data: 0.0002  max mem: 24440
[16:12:11.421607] Epoch: [1]  [5003/5004]  eta: 0:00:00  lr: 0.000480  loss: 5.2278 (5.6008)  time: 0.4583  data: 0.0009  max mem: 24440
[16:12:11.907214] Epoch: [1] Total time: 0:38:16 (0.4590 s / it)
[16:12:11.908698] Averaged stats: lr: 0.000480  loss: 5.2278 (5.5962)
[16:12:14.624138] Test:  [   0/1563]  eta: 1:10:34  loss: 3.0033 (3.0033)  acc1: 53.1250 (53.1250)  acc5: 71.8750 (71.8750)  time: 2.7094  data: 2.2074  max mem: 24440
[16:13:38.461377] Test:  [ 500/1563]  eta: 0:03:03  loss: 4.0441 (3.8382)  acc1: 9.3750 (17.8830)  acc5: 34.3750 (43.1574)  time: 0.1675  data: 0.0002  max mem: 24440
[16:15:02.317597] Test:  [1000/1563]  eta: 0:01:35  loss: 3.9266 (4.0783)  acc1: 3.1250 (16.9424)  acc5: 37.5000 (39.7540)  time: 0.1678  data: 0.0002  max mem: 24440
[16:16:26.169325] Test:  [1500/1563]  eta: 0:00:10  loss: 2.9990 (4.2089)  acc1: 34.3750 (15.7749)  acc5: 62.5000 (37.2918)  time: 0.1675  data: 0.0002  max mem: 24440
[16:16:36.489639] Test:  [1562/1563]  eta: 0:00:00  loss: 3.8750 (4.1918)  acc1: 12.5000 (16.1480)  acc5: 50.0000 (37.7060)  time: 0.1632  data: 0.0001  max mem: 24440
[16:16:36.602152] Test: Total time: 0:04:24 (0.1693 s / it)
[16:16:37.084638] * Acc@1 16.148 Acc@5 37.706 loss 4.192
[16:16:37.084794] Accuracy of the network on the 50000 test images: 16.1%
[16:16:37.084819] Max accuracy: 16.15%
[16:16:37.173801] log_dir: ./output_dir_qkformer
[16:16:40.059928] Epoch: [2]  [   0/5004]  eta: 4:00:34  lr: 0.000480  loss: 5.4274 (5.4274)  time: 2.8847  data: 2.0239  max mem: 24440
[16:31:55.783416] Epoch: [2]  [2000/5004]  eta: 0:22:58  lr: 0.000576  loss: 5.1327 (5.1536)  time: 0.4578  data: 0.0002  max mem: 24440
[16:47:10.878045] Epoch: [2]  [4000/5004]  eta: 0:07:40  lr: 0.000672  loss: 4.7047 (5.0789)  time: 0.4599  data: 0.0002  max mem: 24440
[16:54:49.883643] Epoch: [2]  [5003/5004]  eta: 0:00:00  lr: 0.000720  loss: 4.7044 (5.0382)  time: 0.4554  data: 0.0006  max mem: 24440
[16:54:50.329560] Epoch: [2] Total time: 0:38:13 (0.4583 s / it)
[16:54:50.340955] Averaged stats: lr: 0.000720  loss: 4.7044 (5.0431)
[16:54:52.231699] Test:  [   0/1563]  eta: 0:49:07  loss: 1.9649 (1.9649)  acc1: 65.6250 (65.6250)  acc5: 84.3750 (84.3750)  time: 1.8856  data: 1.5683  max mem: 24440
[16:56:16.187514] Test:  [ 500/1563]  eta: 0:03:02  loss: 3.4540 (3.0448)  acc1: 15.6250 (30.1023)  acc5: 53.1250 (59.5372)  time: 0.1680  data: 0.0002  max mem: 24440
[16:57:40.092501] Test:  [1000/1563]  eta: 0:01:35  loss: 3.1734 (3.2840)  acc1: 31.2500 (28.4715)  acc5: 59.3750 (55.7099)  time: 0.1680  data: 0.0002  max mem: 24440
[16:59:03.987420] Test:  [1500/1563]  eta: 0:00:10  loss: 2.4823 (3.4229)  acc1: 40.6250 (27.1444)  acc5: 71.8750 (53.4102)  time: 0.1678  data: 0.0002  max mem: 24440
[16:59:14.297597] Test:  [1562/1563]  eta: 0:00:00  loss: 2.0371 (3.3993)  acc1: 46.8750 (27.6020)  acc5: 78.1250 (53.8320)  time: 0.1633  data: 0.0001  max mem: 24440
[16:59:14.425639] Test: Total time: 0:04:24 (0.1690 s / it)
[16:59:14.975084] * Acc@1 27.602 Acc@5 53.832 loss 3.399
[16:59:14.975412] Accuracy of the network on the 50000 test images: 27.6%
[16:59:14.975465] Max accuracy: 27.60%
[16:59:15.044343] log_dir: ./output_dir_qkformer
[16:59:17.592492] Epoch: [3]  [   0/5004]  eta: 3:32:26  lr: 0.000720  loss: 5.0575 (5.0575)  time: 2.5473  data: 2.0347  max mem: 24440
[17:14:33.512537] Epoch: [3]  [2000/5004]  eta: 0:22:58  lr: 0.000816  loss: 4.6036 (4.7501)  time: 0.4556  data: 0.0003  max mem: 24440
[17:29:50.743283] Epoch: [3]  [4000/5004]  eta: 0:07:40  lr: 0.000912  loss: 4.4467 (4.6843)  time: 0.4566  data: 0.0002  max mem: 24440
[17:37:30.379597] Epoch: [3]  [5003/5004]  eta: 0:00:00  lr: 0.000960  loss: 4.6113 (4.6529)  time: 0.4547  data: 0.0006  max mem: 24440
[17:37:30.791244] Epoch: [3] Total time: 0:38:15 (0.4588 s / it)
[17:37:30.798651] Averaged stats: lr: 0.000960  loss: 4.6113 (4.6526)
[17:37:32.754565] Test:  [   0/1563]  eta: 0:50:48  loss: 3.0273 (3.0273)  acc1: 40.6250 (40.6250)  acc5: 71.8750 (71.8750)  time: 1.9502  data: 1.7749  max mem: 24440
[17:38:56.652349] Test:  [ 500/1563]  eta: 0:03:02  loss: 2.8846 (2.6860)  acc1: 28.1250 (36.7078)  acc5: 62.5000 (67.5025)  time: 0.1677  data: 0.0002  max mem: 24440
[17:40:20.568913] Test:  [1000/1563]  eta: 0:01:35  loss: 3.1026 (2.8576)  acc1: 31.2500 (35.7486)  acc5: 62.5000 (64.2826)  time: 0.1677  data: 0.0002  max mem: 24440
[17:41:44.446273] Test:  [1500/1563]  eta: 0:00:10  loss: 2.1572 (2.9671)  acc1: 50.0000 (34.2397)  acc5: 75.0000 (62.2418)  time: 0.1678  data: 0.0002  max mem: 24440
[17:41:54.767636] Test:  [1562/1563]  eta: 0:00:00  loss: 1.7987 (2.9512)  acc1: 53.1250 (34.6300)  acc5: 81.2500 (62.5520)  time: 0.1636  data: 0.0001  max mem: 24440
[17:41:54.881528] Test: Total time: 0:04:24 (0.1690 s / it)
[17:41:55.313797] * Acc@1 34.630 Acc@5 62.552 loss 2.951
[17:41:55.313974] Accuracy of the network on the 50000 test images: 34.6%
[17:41:55.313994] Max accuracy: 34.63%
[17:41:55.415566] log_dir: ./output_dir_qkformer
[17:41:58.148859] Epoch: [4]  [   0/5004]  eta: 3:47:47  lr: 0.000960  loss: 4.6090 (4.6090)  time: 2.7314  data: 2.0222  max mem: 24440
[17:57:16.424079] Epoch: [4]  [2000/5004]  eta: 0:23:02  lr: 0.001056  loss: 4.2549 (4.4392)  time: 0.4565  data: 0.0003  max mem: 24440
[18:12:34.812894] Epoch: [4]  [4000/5004]  eta: 0:07:41  lr: 0.001152  loss: 4.1854 (4.4033)  time: 0.4600  data: 0.0002  max mem: 24440
[18:20:14.686423] Epoch: [4]  [5003/5004]  eta: 0:00:00  lr: 0.001200  loss: 4.1639 (4.3827)  time: 0.4559  data: 0.0011  max mem: 24440
[18:20:15.147642] Epoch: [4] Total time: 0:38:19 (0.4596 s / it)
[18:20:15.184623] Averaged stats: lr: 0.001200  loss: 4.1639 (4.3821)
[18:20:17.458548] Test:  [   0/1563]  eta: 0:59:03  loss: 1.6130 (1.6130)  acc1: 62.5000 (62.5000)  acc5: 87.5000 (87.5000)  time: 2.2669  data: 2.0858  max mem: 24440
[18:21:41.386403] Test:  [ 500/1563]  eta: 0:03:02  loss: 2.6419 (2.4001)  acc1: 34.3750 (42.7520)  acc5: 68.7500 (72.8293)  time: 0.1678  data: 0.0003  max mem: 24440
[18:23:05.307921] Test:  [1000/1563]  eta: 0:01:35  loss: 2.7093 (2.6030)  acc1: 34.3750 (40.6312)  acc5: 68.7500 (69.2589)  time: 0.1679  data: 0.0002  max mem: 24440
[18:24:29.252941] Test:  [1500/1563]  eta: 0:00:10  loss: 1.8446 (2.7401)  acc1: 62.5000 (38.9386)  acc5: 81.2500 (66.7701)  time: 0.1679  data: 0.0002  max mem: 24440
[18:24:39.571164] Test:  [1562/1563]  eta: 0:00:00  loss: 1.4850 (2.7312)  acc1: 59.3750 (39.2420)  acc5: 87.5000 (66.9680)  time: 0.1635  data: 0.0001  max mem: 24440
[18:24:39.692512] Test: Total time: 0:04:24 (0.1692 s / it)
[18:24:40.232559] * Acc@1 39.240 Acc@5 66.968 loss 2.731
[18:24:40.232742] Accuracy of the network on the 50000 test images: 39.2%
[18:24:40.232768] Max accuracy: 39.24%
[18:24:40.366133] log_dir: ./output_dir_qkformer
[18:24:44.048507] Epoch: [5]  [   0/5004]  eta: 5:06:56  lr: 0.001200  loss: 4.3070 (4.3070)  time: 3.6803  data: 2.2048  max mem: 24440
[18:40:03.915116] Epoch: [5]  [2000/5004]  eta: 0:23:06  lr: 0.001200  loss: 4.1486 (4.2324)  time: 0.4591  data: 0.0002  max mem: 24440
[18:55:23.340367] Epoch: [5]  [4000/5004]  eta: 0:07:42  lr: 0.001200  loss: 4.1698 (4.1933)  time: 0.4585  data: 0.0002  max mem: 24440
[19:03:04.439495] Epoch: [5]  [5003/5004]  eta: 0:00:00  lr: 0.001200  loss: 3.8699 (4.1707)  time: 0.4557  data: 0.0005  max mem: 24440
[19:03:04.893318] Epoch: [5] Total time: 0:38:24 (0.4605 s / it)
[19:03:04.913321] Averaged stats: lr: 0.001200  loss: 3.8699 (4.1655)
[19:03:06.542383] Test:  [   0/1563]  eta: 0:42:19  loss: 1.3624 (1.3624)  acc1: 81.2500 (81.2500)  acc5: 90.6250 (90.6250)  time: 1.6247  data: 1.4507  max mem: 24440
[19:04:30.504819] Test:  [ 500/1563]  eta: 0:03:01  loss: 2.2801 (2.1319)  acc1: 40.6250 (48.8273)  acc5: 78.1250 (78.1687)  time: 0.1682  data: 0.0002  max mem: 24440
[19:05:54.444454] Test:  [1000/1563]  eta: 0:01:35  loss: 2.5151 (2.3297)  acc1: 40.6250 (46.3162)  acc5: 71.8750 (74.3444)  time: 0.1678  data: 0.0002  max mem: 24440
[19:07:18.333044] Test:  [1500/1563]  eta: 0:00:10  loss: 1.5590 (2.4596)  acc1: 68.7500 (44.5620)  acc5: 87.5000 (72.0478)  time: 0.1677  data: 0.0002  max mem: 24440
[19:07:28.654116] Test:  [1562/1563]  eta: 0:00:00  loss: 1.3924 (2.4478)  acc1: 65.6250 (44.7860)  acc5: 84.3750 (72.2380)  time: 0.1634  data: 0.0001  max mem: 24440
[19:07:28.767581] Test: Total time: 0:04:23 (0.1688 s / it)
[19:07:29.445032] * Acc@1 44.786 Acc@5 72.238 loss 2.448
[19:07:29.445184] Accuracy of the network on the 50000 test images: 44.8%
[19:07:29.445206] Max accuracy: 44.79%
[19:07:29.706207] log_dir: ./output_dir_qkformer
[19:07:33.195989] Epoch: [6]  [   0/5004]  eta: 4:50:30  lr: 0.001200  loss: 4.2202 (4.2202)  time: 3.4834  data: 2.7970  max mem: 24440
[19:22:53.990188] Epoch: [6]  [2000/5004]  eta: 0:23:07  lr: 0.001200  loss: 4.0080 (4.0248)  time: 0.4615  data: 0.0002  max mem: 24440
[19:38:13.784744] Epoch: [6]  [4000/5004]  eta: 0:07:42  lr: 0.001200  loss: 3.9888 (3.9988)  time: 0.4582  data: 0.0002  max mem: 24440
[19:45:54.956764] Epoch: [6]  [5003/5004]  eta: 0:00:00  lr: 0.001200  loss: 3.8860 (3.9838)  time: 0.4595  data: 0.0005  max mem: 24440
[19:45:55.419042] Epoch: [6] Total time: 0:38:25 (0.4608 s / it)
[19:45:55.423047] Averaged stats: lr: 0.001200  loss: 3.8860 (3.9846)
[19:45:57.276092] Test:  [   0/1563]  eta: 0:48:09  loss: 0.8829 (0.8829)  acc1: 84.3750 (84.3750)  acc5: 93.7500 (93.7500)  time: 1.8486  data: 1.5938  max mem: 24440
[19:47:21.254557] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.5325 (1.8705)  acc1: 50.0000 (53.8922)  acc5: 81.2500 (82.4102)  time: 0.1678  data: 0.0002  max mem: 24440
[19:48:45.223735] Test:  [1000/1563]  eta: 0:01:35  loss: 2.2902 (2.0888)  acc1: 40.6250 (50.8679)  acc5: 75.0000 (78.4309)  time: 0.1678  data: 0.0002  max mem: 24440
[19:50:09.187464] Test:  [1500/1563]  eta: 0:00:10  loss: 1.3240 (2.2011)  acc1: 65.6250 (49.1672)  acc5: 90.6250 (76.2846)  time: 0.1679  data: 0.0002  max mem: 24440
[19:50:19.507356] Test:  [1562/1563]  eta: 0:00:00  loss: 1.3240 (2.1965)  acc1: 71.8750 (49.2580)  acc5: 87.5000 (76.3720)  time: 0.1635  data: 0.0001  max mem: 24440
[19:50:19.627063] Test: Total time: 0:04:24 (0.1690 s / it)
[19:50:20.127597] * Acc@1 49.260 Acc@5 76.373 loss 2.196
[19:50:20.127768] Accuracy of the network on the 50000 test images: 49.3%
[19:50:20.127795] Max accuracy: 49.26%
[19:50:20.200371] log_dir: ./output_dir_qkformer
[19:50:22.934452] Epoch: [7]  [   0/5004]  eta: 3:47:57  lr: 0.001200  loss: 4.0111 (4.0111)  time: 2.7334  data: 2.0355  max mem: 24440
[20:05:43.103626] Epoch: [7]  [2000/5004]  eta: 0:23:05  lr: 0.001200  loss: 3.8957 (3.8713)  time: 0.4569  data: 0.0002  max mem: 24440
[20:21:03.243876] Epoch: [7]  [4000/5004]  eta: 0:07:42  lr: 0.001199  loss: 3.9064 (3.8546)  time: 0.4571  data: 0.0002  max mem: 24440
[20:28:44.084769] Epoch: [7]  [5003/5004]  eta: 0:00:00  lr: 0.001199  loss: 3.7683 (3.8428)  time: 0.4537  data: 0.0006  max mem: 24440
[20:28:44.487485] Epoch: [7] Total time: 0:38:24 (0.4605 s / it)
[20:28:44.490067] Averaged stats: lr: 0.001199  loss: 3.7683 (3.8504)
[20:28:46.081399] Test:  [   0/1563]  eta: 0:41:20  loss: 0.9836 (0.9836)  acc1: 81.2500 (81.2500)  acc5: 90.6250 (90.6250)  time: 1.5868  data: 1.4108  max mem: 24440
[20:30:10.071115] Test:  [ 500/1563]  eta: 0:03:01  loss: 2.0030 (1.6593)  acc1: 40.6250 (57.9653)  acc5: 78.1250 (85.2046)  time: 0.1680  data: 0.0002  max mem: 24440
[20:31:34.007594] Test:  [1000/1563]  eta: 0:01:35  loss: 1.9500 (1.8918)  acc1: 50.0000 (54.7047)  acc5: 78.1250 (81.1001)  time: 0.1680  data: 0.0002  max mem: 24440
[20:32:57.936065] Test:  [1500/1563]  eta: 0:00:10  loss: 1.4268 (2.0298)  acc1: 65.6250 (52.4421)  acc5: 84.3750 (78.7100)  time: 0.1677  data: 0.0002  max mem: 24440
[20:33:08.253470] Test:  [1562/1563]  eta: 0:00:00  loss: 1.0135 (2.0213)  acc1: 81.2500 (52.6540)  acc5: 90.6250 (78.8260)  time: 0.1635  data: 0.0001  max mem: 24440
[20:33:08.373421] Test: Total time: 0:04:23 (0.1688 s / it)
[20:33:09.032788] * Acc@1 52.656 Acc@5 78.826 loss 2.021
[20:33:09.032932] Accuracy of the network on the 50000 test images: 52.7%
[20:33:09.032953] Max accuracy: 52.66%
[20:33:09.121510] log_dir: ./output_dir_qkformer
[20:33:11.896883] Epoch: [8]  [   0/5004]  eta: 3:51:15  lr: 0.001199  loss: 3.9788 (3.9788)  time: 2.7729  data: 2.3163  max mem: 24440
[20:48:32.470370] Epoch: [8]  [2000/5004]  eta: 0:23:06  lr: 0.001199  loss: 3.8808 (3.7732)  time: 0.4657  data: 0.0003  max mem: 24440
[21:03:52.306286] Epoch: [8]  [4000/5004]  eta: 0:07:42  lr: 0.001199  loss: 3.7080 (3.7640)  time: 0.4583  data: 0.0002  max mem: 24440
[21:11:33.759168] Epoch: [8]  [5003/5004]  eta: 0:00:00  lr: 0.001199  loss: 3.8287 (3.7566)  time: 0.4544  data: 0.0009  max mem: 24440
[21:11:34.250823] Epoch: [8] Total time: 0:38:25 (0.4607 s / it)
[21:11:34.252273] Averaged stats: lr: 0.001199  loss: 3.8287 (3.7488)
[21:11:35.949816] Test:  [   0/1563]  eta: 0:44:05  loss: 0.6842 (0.6842)  acc1: 87.5000 (87.5000)  acc5: 96.8750 (96.8750)  time: 1.6927  data: 1.5173  max mem: 24440
[21:12:59.918163] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.3344 (1.5803)  acc1: 59.3750 (60.7473)  acc5: 90.6250 (86.6766)  time: 0.1688  data: 0.0009  max mem: 24440
[21:14:23.845185] Test:  [1000/1563]  eta: 0:01:35  loss: 2.0959 (1.8278)  acc1: 50.0000 (56.6028)  acc5: 81.2500 (82.2646)  time: 0.1681  data: 0.0002  max mem: 24440
[21:15:47.763081] Test:  [1500/1563]  eta: 0:00:10  loss: 1.3711 (1.9444)  acc1: 71.8750 (54.5595)  acc5: 84.3750 (80.2215)  time: 0.1677  data: 0.0002  max mem: 24440
[21:15:58.084902] Test:  [1562/1563]  eta: 0:00:00  loss: 0.8446 (1.9367)  acc1: 81.2500 (54.7520)  acc5: 93.7500 (80.3280)  time: 0.1634  data: 0.0001  max mem: 24440
[21:15:58.187141] Test: Total time: 0:04:23 (0.1689 s / it)
[21:15:58.956393] * Acc@1 54.751 Acc@5 80.328 loss 1.937
[21:15:58.956547] Accuracy of the network on the 50000 test images: 54.8%
[21:15:58.956571] Max accuracy: 54.75%
[21:15:59.069509] log_dir: ./output_dir_qkformer
[21:16:01.823527] Epoch: [9]  [   0/5004]  eta: 3:49:33  lr: 0.001199  loss: 4.0866 (4.0866)  time: 2.7525  data: 2.2374  max mem: 24440
[21:31:23.228678] Epoch: [9]  [2000/5004]  eta: 0:23:07  lr: 0.001198  loss: 3.5991 (3.6906)  time: 0.4580  data: 0.0002  max mem: 24440
[21:46:46.392470] Epoch: [9]  [4000/5004]  eta: 0:07:43  lr: 0.001198  loss: 3.7198 (3.6799)  time: 0.4616  data: 0.0002  max mem: 24440
[21:54:27.958677] Epoch: [9]  [5003/5004]  eta: 0:00:00  lr: 0.001198  loss: 3.5648 (3.6771)  time: 0.4554  data: 0.0009  max mem: 24440
[21:54:28.347398] Epoch: [9] Total time: 0:38:29 (0.4615 s / it)
[21:54:28.352586] Averaged stats: lr: 0.001198  loss: 3.5648 (3.6715)
[21:54:30.185087] Test:  [   0/1563]  eta: 0:47:34  loss: 0.7360 (0.7360)  acc1: 84.3750 (84.3750)  acc5: 93.7500 (93.7500)  time: 1.8265  data: 1.6488  max mem: 24440
[21:55:54.119507] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.6385 (1.5462)  acc1: 53.1250 (62.0135)  acc5: 87.5000 (87.4626)  time: 0.1677  data: 0.0002  max mem: 24440
[21:57:18.013152] Test:  [1000/1563]  eta: 0:01:35  loss: 2.0115 (1.7499)  acc1: 46.8750 (58.3604)  acc5: 81.2500 (83.6414)  time: 0.1677  data: 0.0002  max mem: 24440
[21:58:41.929760] Test:  [1500/1563]  eta: 0:00:10  loss: 0.9557 (1.8726)  acc1: 75.0000 (56.1292)  acc5: 93.7500 (81.4561)  time: 0.1677  data: 0.0002  max mem: 24440
[21:58:52.241310] Test:  [1562/1563]  eta: 0:00:00  loss: 1.0335 (1.8692)  acc1: 78.1250 (56.1480)  acc5: 93.7500 (81.5060)  time: 0.1634  data: 0.0001  max mem: 24440
[21:58:52.369068] Test: Total time: 0:04:24 (0.1689 s / it)
[21:58:52.920539] * Acc@1 56.150 Acc@5 81.506 loss 1.869
[21:58:52.920691] Accuracy of the network on the 50000 test images: 56.1%
[21:58:52.920715] Max accuracy: 56.15%
[21:58:52.975488] log_dir: ./output_dir_qkformer
[21:58:55.941398] Epoch: [10]  [   0/5004]  eta: 4:07:16  lr: 0.001198  loss: 3.4945 (3.4945)  time: 2.9649  data: 2.2447  max mem: 24440
[22:14:15.257154] Epoch: [10]  [2000/5004]  eta: 0:23:04  lr: 0.001198  loss: 3.6073 (3.6259)  time: 0.4606  data: 0.0002  max mem: 24440
[22:29:35.431710] Epoch: [10]  [4000/5004]  eta: 0:07:42  lr: 0.001197  loss: 3.5036 (3.6160)  time: 0.4573  data: 0.0002  max mem: 24440
[22:37:16.268767] Epoch: [10]  [5003/5004]  eta: 0:00:00  lr: 0.001197  loss: 3.4987 (3.6070)  time: 0.4536  data: 0.0009  max mem: 24440
[22:37:16.670385] Epoch: [10] Total time: 0:38:23 (0.4604 s / it)
[22:37:16.676608] Averaged stats: lr: 0.001197  loss: 3.4987 (3.6112)
[22:37:18.751502] Test:  [   0/1563]  eta: 0:53:55  loss: 0.7866 (0.7866)  acc1: 87.5000 (87.5000)  acc5: 93.7500 (93.7500)  time: 2.0702  data: 1.8955  max mem: 24440
[22:38:42.675031] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.4370 (1.4811)  acc1: 59.3750 (62.6497)  acc5: 90.6250 (87.9179)  time: 0.1677  data: 0.0002  max mem: 24440
[22:40:06.565275] Test:  [1000/1563]  eta: 0:01:35  loss: 1.8180 (1.6869)  acc1: 56.2500 (59.0222)  acc5: 84.3750 (84.4281)  time: 0.1679  data: 0.0002  max mem: 24440
[22:41:30.461299] Test:  [1500/1563]  eta: 0:00:10  loss: 1.2218 (1.8031)  acc1: 71.8750 (56.9391)  acc5: 90.6250 (82.5096)  time: 0.1679  data: 0.0002  max mem: 24440
[22:41:40.782066] Test:  [1562/1563]  eta: 0:00:00  loss: 0.9176 (1.7993)  acc1: 71.8750 (57.0380)  acc5: 90.6250 (82.5660)  time: 0.1634  data: 0.0001  max mem: 24440
[22:41:40.889602] Test: Total time: 0:04:24 (0.1690 s / it)
[22:41:41.572083] * Acc@1 57.036 Acc@5 82.566 loss 1.799
[22:41:41.572227] Accuracy of the network on the 50000 test images: 57.0%
[22:41:41.572248] Max accuracy: 57.04%
[22:41:41.679084] log_dir: ./output_dir_qkformer
[22:41:44.305702] Epoch: [11]  [   0/5004]  eta: 3:38:54  lr: 0.001197  loss: 3.6163 (3.6163)  time: 2.6248  data: 2.0583  max mem: 24440
[22:57:06.162819] Epoch: [11]  [2000/5004]  eta: 0:23:07  lr: 0.001197  loss: 3.6127 (3.5689)  time: 0.4597  data: 0.0002  max mem: 24440
[23:12:26.840527] Epoch: [11]  [4000/5004]  eta: 0:07:43  lr: 0.001196  loss: 3.6531 (3.5638)  time: 0.4644  data: 0.0002  max mem: 24440
[23:20:08.283162] Epoch: [11]  [5003/5004]  eta: 0:00:00  lr: 0.001196  loss: 3.5433 (3.5605)  time: 0.4573  data: 0.0005  max mem: 24440
[23:20:08.657890] Epoch: [11] Total time: 0:38:26 (0.4610 s / it)
[23:20:08.658771] Averaged stats: lr: 0.001196  loss: 3.5433 (3.5598)
[23:20:10.158135] Test:  [   0/1563]  eta: 0:38:57  loss: 0.7120 (0.7120)  acc1: 87.5000 (87.5000)  acc5: 93.7500 (93.7500)  time: 1.4955  data: 1.3142  max mem: 24440
[23:21:34.111868] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.5893 (1.4465)  acc1: 53.1250 (63.3733)  acc5: 87.5000 (88.9845)  time: 0.1680  data: 0.0002  max mem: 24440
[23:22:58.070033] Test:  [1000/1563]  eta: 0:01:35  loss: 1.8895 (1.6264)  acc1: 46.8750 (60.5488)  acc5: 78.1250 (85.6487)  time: 0.1677  data: 0.0002  max mem: 24440
[23:24:22.048812] Test:  [1500/1563]  eta: 0:00:10  loss: 0.9595 (1.7432)  acc1: 75.0000 (58.4777)  acc5: 93.7500 (83.5964)  time: 0.1679  data: 0.0002  max mem: 24440
[23:24:32.371606] Test:  [1562/1563]  eta: 0:00:00  loss: 0.8767 (1.7386)  acc1: 75.0000 (58.6140)  acc5: 93.7500 (83.6840)  time: 0.1634  data: 0.0001  max mem: 24440
[23:24:32.488017] Test: Total time: 0:04:23 (0.1688 s / it)
[23:24:33.275905] * Acc@1 58.615 Acc@5 83.686 loss 1.739
[23:24:33.276046] Accuracy of the network on the 50000 test images: 58.6%
[23:24:33.276069] Max accuracy: 58.62%
[23:24:33.349104] log_dir: ./output_dir_qkformer
[23:24:36.012334] Epoch: [12]  [   0/5004]  eta: 3:41:56  lr: 0.001196  loss: 3.5943 (3.5943)  time: 2.6611  data: 2.1601  max mem: 24440
[23:39:59.221734] Epoch: [12]  [2000/5004]  eta: 0:23:09  lr: 0.001196  loss: 3.4759 (3.5094)  time: 0.4619  data: 0.0003  max mem: 24440
[23:55:22.202189] Epoch: [12]  [4000/5004]  eta: 0:07:43  lr: 0.001195  loss: 3.4955 (3.5119)  time: 0.4609  data: 0.0002  max mem: 24440
[00:03:03.792911] Epoch: [12]  [5003/5004]  eta: 0:00:00  lr: 0.001195  loss: 3.4515 (3.5104)  time: 0.4536  data: 0.0005  max mem: 24440
[00:03:04.192007] Epoch: [12] Total time: 0:38:30 (0.4618 s / it)
[00:03:04.193420] Averaged stats: lr: 0.001195  loss: 3.4515 (3.5103)
[00:03:05.598504] Test:  [   0/1563]  eta: 0:36:30  loss: 0.5289 (0.5289)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 1.4017  data: 1.2202  max mem: 24440
[00:04:29.675304] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.5162 (1.3801)  acc1: 56.2500 (65.7622)  acc5: 87.5000 (89.8515)  time: 0.1677  data: 0.0002  max mem: 24440
[00:05:53.584767] Test:  [1000/1563]  eta: 0:01:35  loss: 1.8730 (1.5816)  acc1: 56.2500 (62.0817)  acc5: 87.5000 (86.2200)  time: 0.1677  data: 0.0002  max mem: 24440
[00:07:17.504319] Test:  [1500/1563]  eta: 0:00:10  loss: 1.0851 (1.7115)  acc1: 71.8750 (59.6332)  acc5: 90.6250 (84.0669)  time: 0.1677  data: 0.0002  max mem: 24440
[00:07:27.820862] Test:  [1562/1563]  eta: 0:00:00  loss: 1.1207 (1.7102)  acc1: 75.0000 (59.6660)  acc5: 93.7500 (84.1140)  time: 0.1635  data: 0.0001  max mem: 24440
[00:07:27.942150] Test: Total time: 0:04:23 (0.1687 s / it)
[00:07:28.587052] * Acc@1 59.662 Acc@5 84.115 loss 1.710
[00:07:28.587239] Accuracy of the network on the 50000 test images: 59.7%
[00:07:28.587260] Max accuracy: 59.66%
[00:07:28.726290] log_dir: ./output_dir_qkformer
[00:07:31.623022] Epoch: [13]  [   0/5004]  eta: 4:01:27  lr: 0.001195  loss: 3.2873 (3.2873)  time: 2.8951  data: 2.1612  max mem: 24440
[00:22:52.857042] Epoch: [13]  [2000/5004]  eta: 0:23:07  lr: 0.001195  loss: 3.3266 (3.4671)  time: 0.4596  data: 0.0002  max mem: 24440
[00:38:14.031126] Epoch: [13]  [4000/5004]  eta: 0:07:43  lr: 0.001194  loss: 3.5858 (3.4728)  time: 0.4583  data: 0.0002  max mem: 24440
[00:45:56.038316] Epoch: [13]  [5003/5004]  eta: 0:00:00  lr: 0.001194  loss: 3.3705 (3.4709)  time: 0.4542  data: 0.0005  max mem: 24440
[00:45:56.398787] Epoch: [13] Total time: 0:38:27 (0.4612 s / it)
[00:45:56.400743] Averaged stats: lr: 0.001194  loss: 3.3705 (3.4770)
[00:45:58.158917] Test:  [   0/1563]  eta: 0:45:41  loss: 0.3890 (0.3890)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 1.7540  data: 1.3975  max mem: 24440
[00:47:22.112422] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.1751 (1.4364)  acc1: 65.6250 (63.4419)  acc5: 90.6250 (88.3857)  time: 0.1677  data: 0.0002  max mem: 24440
[00:48:46.043537] Test:  [1000/1563]  eta: 0:01:35  loss: 2.2936 (1.6315)  acc1: 46.8750 (60.3116)  acc5: 78.1250 (84.9900)  time: 0.1679  data: 0.0004  max mem: 24440
[00:50:09.985714] Test:  [1500/1563]  eta: 0:00:10  loss: 0.9941 (1.7369)  acc1: 75.0000 (58.5922)  acc5: 90.6250 (83.3132)  time: 0.1680  data: 0.0002  max mem: 24440
[00:50:20.304806] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6888 (1.7308)  acc1: 84.3750 (58.7100)  acc5: 96.8750 (83.4160)  time: 0.1635  data: 0.0001  max mem: 24440
[00:50:20.421124] Test: Total time: 0:04:24 (0.1689 s / it)
[00:50:20.998600] * Acc@1 58.705 Acc@5 83.418 loss 1.731
[00:50:20.998747] Accuracy of the network on the 50000 test images: 58.7%
[00:50:20.998768] Max accuracy: 59.66%
[00:50:21.104422] log_dir: ./output_dir_qkformer
[00:50:23.865492] Epoch: [14]  [   0/5004]  eta: 3:49:57  lr: 0.001194  loss: 2.6773 (2.6773)  time: 2.7574  data: 2.2919  max mem: 24440
[01:05:46.578906] Epoch: [14]  [2000/5004]  eta: 0:23:09  lr: 0.001193  loss: 3.4690 (3.4440)  time: 0.4567  data: 0.0003  max mem: 24440
[01:21:08.419506] Epoch: [14]  [4000/5004]  eta: 0:07:43  lr: 0.001193  loss: 3.4581 (3.4426)  time: 0.4631  data: 0.0002  max mem: 24440
[01:28:50.291461] Epoch: [14]  [5003/5004]  eta: 0:00:00  lr: 0.001192  loss: 3.4110 (3.4410)  time: 0.4536  data: 0.0009  max mem: 24440
[01:28:50.678640] Epoch: [14] Total time: 0:38:29 (0.4615 s / it)
[01:28:50.679840] Averaged stats: lr: 0.001192  loss: 3.4110 (3.4416)
[01:28:52.236274] Test:  [   0/1563]  eta: 0:40:21  loss: 0.5989 (0.5989)  acc1: 87.5000 (87.5000)  acc5: 93.7500 (93.7500)  time: 1.5491  data: 1.3683  max mem: 24440
[01:30:16.283508] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.3731 (1.2910)  acc1: 59.3750 (67.0284)  acc5: 90.6250 (90.3505)  time: 0.1680  data: 0.0002  max mem: 24440
[01:31:40.272516] Test:  [1000/1563]  eta: 0:01:35  loss: 1.5599 (1.4959)  acc1: 56.2500 (63.6239)  acc5: 87.5000 (86.9131)  time: 0.1680  data: 0.0002  max mem: 24440
[01:33:04.250711] Test:  [1500/1563]  eta: 0:00:10  loss: 0.9650 (1.6177)  acc1: 81.2500 (61.3133)  acc5: 93.7500 (85.0058)  time: 0.1679  data: 0.0002  max mem: 24440
[01:33:14.580360] Test:  [1562/1563]  eta: 0:00:00  loss: 0.7427 (1.6145)  acc1: 84.3750 (61.3820)  acc5: 93.7500 (85.0540)  time: 0.1635  data: 0.0001  max mem: 24440
[01:33:14.700275] Test: Total time: 0:04:24 (0.1689 s / it)
[01:33:15.071679] * Acc@1 61.381 Acc@5 85.056 loss 1.615
[01:33:15.071957] Accuracy of the network on the 50000 test images: 61.4%
[01:33:15.072011] Max accuracy: 61.38%
[01:33:15.163817] log_dir: ./output_dir_qkformer
[01:33:17.784224] Epoch: [15]  [   0/5004]  eta: 3:38:20  lr: 0.001192  loss: 3.0686 (3.0686)  time: 2.6180  data: 2.0874  max mem: 24440
[01:48:39.742451] Epoch: [15]  [2000/5004]  eta: 0:23:07  lr: 0.001192  loss: 3.2851 (3.4218)  time: 0.4616  data: 0.0003  max mem: 24440
[02:04:00.379188] Epoch: [15]  [4000/5004]  eta: 0:07:43  lr: 0.001191  loss: 3.3661 (3.4196)  time: 0.4576  data: 0.0002  max mem: 24440
[02:11:41.969159] Epoch: [15]  [5003/5004]  eta: 0:00:00  lr: 0.001191  loss: 3.2659 (3.4210)  time: 0.4538  data: 0.0008  max mem: 24440
[02:11:42.378853] Epoch: [15] Total time: 0:38:27 (0.4611 s / it)
[02:11:42.674635] Averaged stats: lr: 0.001191  loss: 3.2659 (3.4172)
[02:11:44.769064] Test:  [   0/1563]  eta: 0:54:16  loss: 0.6384 (0.6384)  acc1: 87.5000 (87.5000)  acc5: 96.8750 (96.8750)  time: 2.0834  data: 1.9078  max mem: 24440
[02:13:08.694452] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.4326 (1.3108)  acc1: 62.5000 (66.8850)  acc5: 87.5000 (90.1135)  time: 0.1680  data: 0.0002  max mem: 24440
[02:14:32.623176] Test:  [1000/1563]  eta: 0:01:35  loss: 1.6588 (1.5075)  acc1: 56.2500 (63.4397)  acc5: 90.6250 (87.1316)  time: 0.1677  data: 0.0002  max mem: 24440
[02:15:56.567932] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7820 (1.6294)  acc1: 81.2500 (61.4028)  acc5: 93.7500 (84.9996)  time: 0.1677  data: 0.0002  max mem: 24440
[02:16:06.888179] Test:  [1562/1563]  eta: 0:00:00  loss: 0.8757 (1.6247)  acc1: 84.3750 (61.5020)  acc5: 93.7500 (85.0880)  time: 0.1635  data: 0.0001  max mem: 24440
[02:16:06.993161] Test: Total time: 0:04:24 (0.1691 s / it)
[02:16:07.820329] * Acc@1 61.505 Acc@5 85.086 loss 1.625
[02:16:07.820501] Accuracy of the network on the 50000 test images: 61.5%
[02:16:07.820532] Max accuracy: 61.51%
[02:16:07.882776] log_dir: ./output_dir_qkformer
[02:16:11.041876] Epoch: [16]  [   0/5004]  eta: 4:23:09  lr: 0.001191  loss: 2.8887 (2.8887)  time: 3.1553  data: 2.2257  max mem: 24440
[02:31:32.571900] Epoch: [16]  [2000/5004]  eta: 0:23:08  lr: 0.001190  loss: 3.2688 (3.3995)  time: 0.4577  data: 0.0002  max mem: 24440
[02:46:54.075474] Epoch: [16]  [4000/5004]  eta: 0:07:43  lr: 0.001189  loss: 3.3217 (3.3938)  time: 0.4590  data: 0.0003  max mem: 24440
[02:54:36.092487] Epoch: [16]  [5003/5004]  eta: 0:00:00  lr: 0.001189  loss: 3.4242 (3.3895)  time: 0.4546  data: 0.0008  max mem: 24440
[02:54:36.552459] Epoch: [16] Total time: 0:38:28 (0.4614 s / it)
[02:54:36.559817] Averaged stats: lr: 0.001189  loss: 3.4242 (3.3892)
[02:54:38.278283] Test:  [   0/1563]  eta: 0:44:38  loss: 0.4673 (0.4673)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 1.7139  data: 1.3871  max mem: 24440
[02:56:02.220636] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.3325 (1.2710)  acc1: 65.6250 (68.2136)  acc5: 90.6250 (91.0429)  time: 0.1679  data: 0.0002  max mem: 24440
[02:57:26.170724] Test:  [1000/1563]  eta: 0:01:35  loss: 1.7419 (1.4780)  acc1: 59.3750 (64.1515)  acc5: 84.3750 (87.7092)  time: 0.1678  data: 0.0002  max mem: 24440
[02:58:50.118318] Test:  [1500/1563]  eta: 0:00:10  loss: 1.1018 (1.5965)  acc1: 78.1250 (62.0274)  acc5: 90.6250 (85.7470)  time: 0.1681  data: 0.0002  max mem: 24440
[02:59:00.441698] Test:  [1562/1563]  eta: 0:00:00  loss: 0.9102 (1.5945)  acc1: 81.2500 (62.1240)  acc5: 93.7500 (85.7900)  time: 0.1634  data: 0.0001  max mem: 24440
[02:59:00.553964] Test: Total time: 0:04:23 (0.1689 s / it)
[02:59:01.125926] * Acc@1 62.123 Acc@5 85.792 loss 1.595
[02:59:01.126106] Accuracy of the network on the 50000 test images: 62.1%
[02:59:01.126128] Max accuracy: 62.12%
[02:59:01.243221] log_dir: ./output_dir_qkformer
[02:59:03.951958] Epoch: [17]  [   0/5004]  eta: 3:45:45  lr: 0.001189  loss: 3.5918 (3.5918)  time: 2.7069  data: 2.2450  max mem: 24440
[03:14:26.928684] Epoch: [17]  [2000/5004]  eta: 0:23:09  lr: 0.001188  loss: 3.3471 (3.3623)  time: 0.4618  data: 0.0003  max mem: 24440
[03:29:50.235681] Epoch: [17]  [4000/5004]  eta: 0:07:43  lr: 0.001187  loss: 3.3731 (3.3661)  time: 0.4601  data: 0.0002  max mem: 24440
[03:37:32.457499] Epoch: [17]  [5003/5004]  eta: 0:00:00  lr: 0.001187  loss: 3.4051 (3.3661)  time: 0.4542  data: 0.0007  max mem: 24440
[03:37:33.061741] Epoch: [17] Total time: 0:38:31 (0.4620 s / it)
[03:37:33.064551] Averaged stats: lr: 0.001187  loss: 3.4051 (3.3664)
[03:37:34.860992] Test:  [   0/1563]  eta: 0:46:39  loss: 0.6592 (0.6592)  acc1: 81.2500 (81.2500)  acc5: 93.7500 (93.7500)  time: 1.7911  data: 1.5584  max mem: 24440
[03:38:58.879923] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.4118 (1.3247)  acc1: 62.5000 (66.2300)  acc5: 87.5000 (90.2071)  time: 0.1678  data: 0.0002  max mem: 24440
[03:40:22.861281] Test:  [1000/1563]  eta: 0:01:35  loss: 1.8600 (1.4933)  acc1: 53.1250 (63.7331)  acc5: 84.3750 (87.3283)  time: 0.1678  data: 0.0002  max mem: 24440
[03:41:46.847308] Test:  [1500/1563]  eta: 0:00:10  loss: 1.0519 (1.6066)  acc1: 78.1250 (61.6506)  acc5: 90.6250 (85.4097)  time: 0.1682  data: 0.0002  max mem: 24440
[03:41:57.168800] Test:  [1562/1563]  eta: 0:00:00  loss: 0.7758 (1.6018)  acc1: 81.2500 (61.7500)  acc5: 93.7500 (85.5060)  time: 0.1636  data: 0.0001  max mem: 24440
[03:41:57.287194] Test: Total time: 0:04:24 (0.1690 s / it)
[03:41:57.751477] * Acc@1 61.752 Acc@5 85.505 loss 1.602
[03:41:57.751623] Accuracy of the network on the 50000 test images: 61.8%
[03:41:57.751646] Max accuracy: 62.12%
[03:41:57.825593] log_dir: ./output_dir_qkformer
[03:42:00.994188] Epoch: [18]  [   0/5004]  eta: 4:24:12  lr: 0.001187  loss: 3.7603 (3.7603)  time: 3.1679  data: 2.1398  max mem: 24440
[03:57:22.481002] Epoch: [18]  [2000/5004]  eta: 0:23:08  lr: 0.001186  loss: 3.1415 (3.3399)  time: 0.4618  data: 0.0002  max mem: 24440
[04:12:42.981741] Epoch: [18]  [4000/5004]  eta: 0:07:43  lr: 0.001185  loss: 3.4627 (3.3382)  time: 0.4587  data: 0.0002  max mem: 24440
[04:20:24.519244] Epoch: [18]  [5003/5004]  eta: 0:00:00  lr: 0.001185  loss: 3.1634 (3.3402)  time: 0.4561  data: 0.0008  max mem: 24440
[04:20:24.869658] Epoch: [18] Total time: 0:38:27 (0.4610 s / it)
[04:20:24.872648] Averaged stats: lr: 0.001185  loss: 3.1634 (3.3462)
[04:20:26.459015] Test:  [   0/1563]  eta: 0:41:11  loss: 0.6956 (0.6956)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 1.5816  data: 1.3461  max mem: 24440
[04:21:50.442736] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.5074 (1.2265)  acc1: 59.3750 (68.8748)  acc5: 90.6250 (91.7290)  time: 0.1681  data: 0.0002  max mem: 24440
[04:23:14.367345] Test:  [1000/1563]  eta: 0:01:35  loss: 1.6747 (1.4326)  acc1: 53.1250 (65.3784)  acc5: 81.2500 (88.2399)  time: 0.1677  data: 0.0002  max mem: 24440
[04:24:38.671332] Test:  [1500/1563]  eta: 0:00:10  loss: 1.1715 (1.5555)  acc1: 75.0000 (63.0309)  acc5: 90.6250 (86.3633)  time: 0.1698  data: 0.0007  max mem: 24440
[04:24:48.990357] Test:  [1562/1563]  eta: 0:00:00  loss: 0.7535 (1.5509)  acc1: 81.2500 (63.1260)  acc5: 90.6250 (86.4340)  time: 0.1635  data: 0.0001  max mem: 24440
[04:24:49.083629] Test: Total time: 0:04:24 (0.1690 s / it)
[04:24:49.841896] * Acc@1 63.124 Acc@5 86.433 loss 1.551
[04:24:49.842052] Accuracy of the network on the 50000 test images: 63.1%
[04:24:49.842074] Max accuracy: 63.12%
[04:24:50.035919] log_dir: ./output_dir_qkformer
[04:24:53.145622] Epoch: [19]  [   0/5004]  eta: 4:19:12  lr: 0.001185  loss: 2.9937 (2.9937)  time: 3.1081  data: 2.5070  max mem: 24440
[04:40:14.971166] Epoch: [19]  [2000/5004]  eta: 0:23:08  lr: 0.001184  loss: 3.4351 (3.3264)  time: 0.4558  data: 0.0003  max mem: 24440
[04:55:35.694952] Epoch: [19]  [4000/5004]  eta: 0:07:43  lr: 0.001183  loss: 3.2318 (3.3269)  time: 0.4605  data: 0.0002  max mem: 24440
[05:03:17.811310] Epoch: [19]  [5003/5004]  eta: 0:00:00  lr: 0.001183  loss: 3.1710 (3.3251)  time: 0.4539  data: 0.0005  max mem: 24440
[05:03:18.216851] Epoch: [19] Total time: 0:38:28 (0.4613 s / it)
[05:03:18.219618] Averaged stats: lr: 0.001183  loss: 3.1710 (3.3274)
[05:03:19.904576] Test:  [   0/1563]  eta: 0:43:44  loss: 1.0296 (1.0296)  acc1: 87.5000 (87.5000)  acc5: 90.6250 (90.6250)  time: 1.6791  data: 1.4992  max mem: 24440
[05:04:43.851698] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.3138 (1.2264)  acc1: 65.6250 (68.8124)  acc5: 90.6250 (91.4858)  time: 0.1677  data: 0.0002  max mem: 24440
[05:06:07.790104] Test:  [1000/1563]  eta: 0:01:35  loss: 1.9620 (1.4084)  acc1: 53.1250 (65.6000)  acc5: 87.5000 (88.5146)  time: 0.1683  data: 0.0002  max mem: 24440
[05:07:31.704867] Test:  [1500/1563]  eta: 0:00:10  loss: 0.9652 (1.5282)  acc1: 78.1250 (63.3411)  acc5: 93.7500 (86.6506)  time: 0.1679  data: 0.0002  max mem: 24440
[05:07:42.021964] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6635 (1.5285)  acc1: 84.3750 (63.3720)  acc5: 96.8750 (86.6320)  time: 0.1635  data: 0.0001  max mem: 24440
[05:07:42.128466] Test: Total time: 0:04:23 (0.1688 s / it)
[05:07:42.690545] * Acc@1 63.370 Acc@5 86.632 loss 1.528
[05:07:42.690705] Accuracy of the network on the 50000 test images: 63.4%
[05:07:42.690727] Max accuracy: 63.37%
[05:07:42.788949] log_dir: ./output_dir_qkformer
[05:07:45.575600] Epoch: [20]  [   0/5004]  eta: 3:52:20  lr: 0.001183  loss: 3.5370 (3.5370)  time: 2.7859  data: 2.2654  max mem: 24440
[05:23:06.367855] Epoch: [20]  [2000/5004]  eta: 0:23:06  lr: 0.001182  loss: 3.2095 (3.3096)  time: 0.4668  data: 0.0002  max mem: 24440
[05:38:36.682743] Epoch: [20]  [4000/5004]  eta: 0:07:45  lr: 0.001181  loss: 3.1409 (3.3126)  time: 0.4581  data: 0.0002  max mem: 24440
[05:46:18.289068] Epoch: [20]  [5003/5004]  eta: 0:00:00  lr: 0.001180  loss: 3.2569 (3.3146)  time: 0.4537  data: 0.0006  max mem: 24440
[05:46:18.730692] Epoch: [20] Total time: 0:38:35 (0.4628 s / it)
[05:46:18.739541] Averaged stats: lr: 0.001180  loss: 3.2569 (3.3126)
[05:46:21.183814] Test:  [   0/1563]  eta: 1:03:19  loss: 0.7269 (0.7269)  acc1: 81.2500 (81.2500)  acc5: 93.7500 (93.7500)  time: 2.4308  data: 2.0196  max mem: 24440
[05:47:45.212869] Test:  [ 500/1563]  eta: 0:03:03  loss: 1.4685 (1.1838)  acc1: 56.2500 (69.3301)  acc5: 87.5000 (91.8164)  time: 0.1680  data: 0.0002  max mem: 24440
[05:49:09.205079] Test:  [1000/1563]  eta: 0:01:35  loss: 1.7849 (1.3695)  acc1: 53.1250 (66.1932)  acc5: 87.5000 (88.7238)  time: 0.1679  data: 0.0002  max mem: 24440
[05:50:33.307867] Test:  [1500/1563]  eta: 0:00:10  loss: 0.9165 (1.4851)  acc1: 78.1250 (63.9261)  acc5: 93.7500 (86.8692)  time: 0.1681  data: 0.0002  max mem: 24440
[05:50:43.634398] Test:  [1562/1563]  eta: 0:00:00  loss: 0.8147 (1.4863)  acc1: 81.2500 (63.9280)  acc5: 93.7500 (86.8580)  time: 0.1635  data: 0.0001  max mem: 24440
[05:50:43.761849] Test: Total time: 0:04:25 (0.1696 s / it)
[05:50:44.353800] * Acc@1 63.926 Acc@5 86.856 loss 1.486
[05:50:44.353950] Accuracy of the network on the 50000 test images: 63.9%
[05:50:44.353972] Max accuracy: 63.93%
[05:50:44.627330] log_dir: ./output_dir_qkformer
[05:50:47.336759] Epoch: [21]  [   0/5004]  eta: 3:45:48  lr: 0.001180  loss: 2.8302 (2.8302)  time: 2.7075  data: 2.2181  max mem: 24440
[06:06:09.409154] Epoch: [21]  [2000/5004]  eta: 0:23:07  lr: 0.001179  loss: 3.2895 (3.2849)  time: 0.4626  data: 0.0002  max mem: 24440
[06:21:29.952677] Epoch: [21]  [4000/5004]  eta: 0:07:42  lr: 0.001178  loss: 3.1911 (3.2929)  time: 0.4595  data: 0.0002  max mem: 24440
[06:29:11.389292] Epoch: [21]  [5003/5004]  eta: 0:00:00  lr: 0.001178  loss: 3.2365 (3.2972)  time: 0.4544  data: 0.0010  max mem: 24440
[06:29:11.949108] Epoch: [21] Total time: 0:38:27 (0.4611 s / it)
[06:29:11.967786] Averaged stats: lr: 0.001178  loss: 3.2365 (3.2968)
[06:29:14.708327] Test:  [   0/1563]  eta: 1:11:13  loss: 0.8367 (0.8367)  acc1: 84.3750 (84.3750)  acc5: 93.7500 (93.7500)  time: 2.7345  data: 2.5235  max mem: 24440
[06:30:38.703339] Test:  [ 500/1563]  eta: 0:03:03  loss: 1.1375 (1.1653)  acc1: 75.0000 (70.4404)  acc5: 90.6250 (92.3590)  time: 0.1679  data: 0.0002  max mem: 24440
[06:32:02.703740] Test:  [1000/1563]  eta: 0:01:36  loss: 1.8586 (1.3607)  acc1: 46.8750 (66.7020)  acc5: 87.5000 (89.1327)  time: 0.1679  data: 0.0002  max mem: 24440
[06:33:26.730915] Test:  [1500/1563]  eta: 0:00:10  loss: 0.8996 (1.4862)  acc1: 81.2500 (64.3550)  acc5: 93.7500 (87.0690)  time: 0.1681  data: 0.0002  max mem: 24440
[06:33:37.051308] Test:  [1562/1563]  eta: 0:00:00  loss: 0.7074 (1.4868)  acc1: 81.2500 (64.3380)  acc5: 96.8750 (87.0440)  time: 0.1635  data: 0.0001  max mem: 24440
[06:33:37.248108] Test: Total time: 0:04:25 (0.1697 s / it)
[06:33:37.551270] * Acc@1 64.340 Acc@5 87.044 loss 1.487
[06:33:37.551549] Accuracy of the network on the 50000 test images: 64.3%
[06:33:37.551598] Max accuracy: 64.34%
[06:33:37.793225] log_dir: ./output_dir_qkformer
[06:33:42.822223] Epoch: [22]  [   0/5004]  eta: 6:59:18  lr: 0.001178  loss: 3.4584 (3.4584)  time: 5.0278  data: 2.7452  max mem: 24440
[06:49:03.762739] Epoch: [22]  [2000/5004]  eta: 0:23:10  lr: 0.001177  loss: 3.3893 (3.2824)  time: 0.4582  data: 0.0002  max mem: 24440
[07:04:23.020229] Epoch: [22]  [4000/5004]  eta: 0:07:43  lr: 0.001176  loss: 3.3623 (3.2855)  time: 0.4602  data: 0.0002  max mem: 24440
[07:12:04.315859] Epoch: [22]  [5003/5004]  eta: 0:00:00  lr: 0.001175  loss: 3.3558 (3.2854)  time: 0.4559  data: 0.0005  max mem: 24440
[07:12:05.113941] Epoch: [22] Total time: 0:38:27 (0.4611 s / it)
[07:12:05.183388] Averaged stats: lr: 0.001175  loss: 3.3558 (3.2829)
[07:12:08.095774] Test:  [   0/1563]  eta: 1:15:42  loss: 0.4523 (0.4523)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.9060  data: 2.6540  max mem: 24440
[07:13:32.082867] Test:  [ 500/1563]  eta: 0:03:04  loss: 1.0644 (1.2503)  acc1: 65.6250 (69.0307)  acc5: 93.7500 (91.3797)  time: 0.1678  data: 0.0002  max mem: 24440
[07:14:56.069382] Test:  [1000/1563]  eta: 0:01:36  loss: 1.4187 (1.4311)  acc1: 56.2500 (65.5220)  acc5: 90.6250 (88.4553)  time: 0.1679  data: 0.0002  max mem: 24440
[07:16:20.057184] Test:  [1500/1563]  eta: 0:00:10  loss: 0.9513 (1.5537)  acc1: 75.0000 (63.2537)  acc5: 93.7500 (86.4424)  time: 0.1679  data: 0.0002  max mem: 24440
[07:16:30.385464] Test:  [1562/1563]  eta: 0:00:00  loss: 0.8402 (1.5551)  acc1: 84.3750 (63.2840)  acc5: 93.7500 (86.4120)  time: 0.1635  data: 0.0001  max mem: 24440
[07:16:30.523046] Test: Total time: 0:04:25 (0.1698 s / it)
[07:16:31.158802] * Acc@1 63.284 Acc@5 86.408 loss 1.555
[07:16:31.158971] Accuracy of the network on the 50000 test images: 63.3%
[07:16:31.158997] Max accuracy: 64.34%
[07:16:31.293364] log_dir: ./output_dir_qkformer
[07:16:34.072991] Epoch: [23]  [   0/5004]  eta: 3:51:41  lr: 0.001175  loss: 2.9108 (2.9108)  time: 2.7780  data: 2.2325  max mem: 24440
[07:31:54.579567] Epoch: [23]  [2000/5004]  eta: 0:23:05  lr: 0.001174  loss: 3.2267 (3.2636)  time: 0.4645  data: 0.0002  max mem: 24440
[07:47:15.317175] Epoch: [23]  [4000/5004]  eta: 0:07:42  lr: 0.001173  loss: 3.3521 (3.2616)  time: 0.4597  data: 0.0002  max mem: 24440
[07:54:56.153062] Epoch: [23]  [5003/5004]  eta: 0:00:00  lr: 0.001172  loss: 3.2203 (3.2625)  time: 0.4579  data: 0.0009  max mem: 24440
[07:54:56.627951] Epoch: [23] Total time: 0:38:25 (0.4607 s / it)
[07:54:56.631909] Averaged stats: lr: 0.001172  loss: 3.2203 (3.2678)
[07:54:58.100999] Test:  [   0/1563]  eta: 0:38:09  loss: 1.6648 (1.6648)  acc1: 59.3750 (59.3750)  acc5: 87.5000 (87.5000)  time: 1.4647  data: 1.2884  max mem: 24440
[07:56:22.198428] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.2881 (1.1401)  acc1: 68.7500 (70.2283)  acc5: 93.7500 (92.7832)  time: 0.1678  data: 0.0002  max mem: 24440
[07:57:46.138846] Test:  [1000/1563]  eta: 0:01:35  loss: 1.6544 (1.3228)  acc1: 56.2500 (67.2421)  acc5: 87.5000 (89.5854)  time: 0.1681  data: 0.0002  max mem: 24440
[07:59:10.082640] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7731 (1.4420)  acc1: 81.2500 (65.0441)  acc5: 96.8750 (87.6770)  time: 0.1680  data: 0.0002  max mem: 24440
[07:59:20.410206] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5771 (1.4487)  acc1: 78.1250 (64.9280)  acc5: 93.7500 (87.5720)  time: 0.1634  data: 0.0001  max mem: 24440
[07:59:20.529995] Test: Total time: 0:04:23 (0.1688 s / it)
[07:59:21.170238] * Acc@1 64.933 Acc@5 87.573 loss 1.449
[07:59:21.170472] Accuracy of the network on the 50000 test images: 64.9%
[07:59:21.170507] Max accuracy: 64.93%
[07:59:21.238087] log_dir: ./output_dir_qkformer
[07:59:24.841268] Epoch: [24]  [   0/5004]  eta: 4:59:12  lr: 0.001172  loss: 2.7947 (2.7947)  time: 3.5876  data: 2.3360  max mem: 24440
[08:14:46.989316] Epoch: [24]  [2000/5004]  eta: 0:23:09  lr: 0.001171  loss: 3.2059 (3.2562)  time: 0.4612  data: 0.0003  max mem: 24440
[08:30:09.453306] Epoch: [24]  [4000/5004]  eta: 0:07:43  lr: 0.001170  loss: 3.2264 (3.2545)  time: 0.4624  data: 0.0003  max mem: 24440
[08:37:50.422700] Epoch: [24]  [5003/5004]  eta: 0:00:00  lr: 0.001169  loss: 3.2001 (3.2545)  time: 0.4533  data: 0.0006  max mem: 24440
[08:37:50.851459] Epoch: [24] Total time: 0:38:29 (0.4616 s / it)
[08:37:50.881451] Averaged stats: lr: 0.001169  loss: 3.2001 (3.2570)
[08:37:52.710272] Test:  [   0/1563]  eta: 0:47:29  loss: 0.4714 (0.4714)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.8228  data: 1.6488  max mem: 24440
[08:39:16.711571] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.9822 (1.2394)  acc1: 68.7500 (69.6794)  acc5: 90.6250 (91.3486)  time: 0.1679  data: 0.0002  max mem: 24440
[08:40:40.761705] Test:  [1000/1563]  eta: 0:01:35  loss: 1.5392 (1.3835)  acc1: 46.8750 (66.7832)  acc5: 84.3750 (88.9517)  time: 0.1678  data: 0.0002  max mem: 24440
[08:42:04.731358] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6585 (1.4899)  acc1: 81.2500 (64.6798)  acc5: 93.7500 (87.2314)  time: 0.1678  data: 0.0002  max mem: 24440
[08:42:15.054852] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5934 (1.4861)  acc1: 90.6250 (64.7760)  acc5: 93.7500 (87.3040)  time: 0.1635  data: 0.0001  max mem: 24440
[08:42:15.168156] Test: Total time: 0:04:24 (0.1691 s / it)
[08:42:15.548023] * Acc@1 64.775 Acc@5 87.305 loss 1.486
[08:42:15.548226] Accuracy of the network on the 50000 test images: 64.8%
[08:42:15.548266] Max accuracy: 64.93%
[08:42:15.649718] log_dir: ./output_dir_qkformer
[08:42:18.263708] Epoch: [25]  [   0/5004]  eta: 3:37:54  lr: 0.001169  loss: 3.4758 (3.4758)  time: 2.6129  data: 2.1378  max mem: 24440
[08:57:38.740435] Epoch: [25]  [2000/5004]  eta: 0:23:05  lr: 0.001168  loss: 3.1827 (3.2438)  time: 0.4603  data: 0.0002  max mem: 24440
[09:13:01.585971] Epoch: [25]  [4000/5004]  eta: 0:07:43  lr: 0.001167  loss: 3.3251 (3.2456)  time: 0.4560  data: 0.0002  max mem: 24440
[09:20:42.667880] Epoch: [25]  [5003/5004]  eta: 0:00:00  lr: 0.001166  loss: 3.3424 (3.2429)  time: 0.4529  data: 0.0009  max mem: 24440
[09:20:43.090848] Epoch: [25] Total time: 0:38:27 (0.4611 s / it)
[09:20:43.101213] Averaged stats: lr: 0.001166  loss: 3.3424 (3.2422)
[09:20:44.739371] Test:  [   0/1563]  eta: 0:42:34  loss: 0.6529 (0.6529)  acc1: 87.5000 (87.5000)  acc5: 96.8750 (96.8750)  time: 1.6342  data: 1.4411  max mem: 24440
[09:22:08.697574] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.2152 (1.1919)  acc1: 65.6250 (69.7979)  acc5: 90.6250 (92.1158)  time: 0.1677  data: 0.0002  max mem: 24440
[09:23:32.628346] Test:  [1000/1563]  eta: 0:01:35  loss: 1.7800 (1.3597)  acc1: 53.1250 (67.0017)  acc5: 84.3750 (89.2170)  time: 0.1678  data: 0.0002  max mem: 24440
[09:24:56.572585] Test:  [1500/1563]  eta: 0:00:10  loss: 1.1874 (1.4761)  acc1: 71.8750 (64.6527)  acc5: 90.6250 (87.4021)  time: 0.1677  data: 0.0002  max mem: 24440
[09:25:06.903313] Test:  [1562/1563]  eta: 0:00:00  loss: 0.8611 (1.4784)  acc1: 84.3750 (64.6680)  acc5: 96.8750 (87.3860)  time: 0.1635  data: 0.0001  max mem: 24440
[09:25:07.007704] Test: Total time: 0:04:23 (0.1688 s / it)
[09:25:07.536871] * Acc@1 64.669 Acc@5 87.388 loss 1.478
[09:25:07.537047] Accuracy of the network on the 50000 test images: 64.7%
[09:25:07.537070] Max accuracy: 64.93%
[09:25:07.612646] log_dir: ./output_dir_qkformer
[09:25:10.316890] Epoch: [26]  [   0/5004]  eta: 3:45:28  lr: 0.001166  loss: 3.1657 (3.1657)  time: 2.7036  data: 2.2047  max mem: 24440
[09:40:30.902987] Epoch: [26]  [2000/5004]  eta: 0:23:06  lr: 0.001165  loss: 3.1294 (3.2198)  time: 0.4583  data: 0.0003  max mem: 24440
[09:55:53.248384] Epoch: [26]  [4000/5004]  eta: 0:07:43  lr: 0.001163  loss: 3.2618 (3.2292)  time: 0.4578  data: 0.0002  max mem: 24440
[10:03:35.076273] Epoch: [26]  [5003/5004]  eta: 0:00:00  lr: 0.001163  loss: 3.1468 (3.2296)  time: 0.4541  data: 0.0010  max mem: 24440
[10:03:35.629665] Epoch: [26] Total time: 0:38:28 (0.4612 s / it)
[10:03:35.630978] Averaged stats: lr: 0.001163  loss: 3.1468 (3.2325)
[10:03:38.668715] Test:  [   0/1563]  eta: 1:18:56  loss: 0.2982 (0.2982)  acc1: 96.8750 (96.8750)  acc5: 100.0000 (100.0000)  time: 3.0306  data: 2.3871  max mem: 24440
[10:05:02.617641] Test:  [ 500/1563]  eta: 0:03:04  loss: 1.2055 (1.1587)  acc1: 65.6250 (70.3842)  acc5: 93.7500 (92.2343)  time: 0.1680  data: 0.0002  max mem: 24440
[10:06:26.544167] Test:  [1000/1563]  eta: 0:01:36  loss: 1.6967 (1.3438)  acc1: 50.0000 (67.1735)  acc5: 90.6250 (89.2576)  time: 0.1677  data: 0.0002  max mem: 24440
[10:07:50.519422] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7869 (1.4424)  acc1: 81.2500 (65.3106)  acc5: 93.7500 (87.7249)  time: 0.1678  data: 0.0002  max mem: 24440
[10:08:00.840379] Test:  [1562/1563]  eta: 0:00:00  loss: 0.8875 (1.4433)  acc1: 78.1250 (65.2840)  acc5: 96.8750 (87.7180)  time: 0.1635  data: 0.0001  max mem: 24440
[10:08:00.958468] Test: Total time: 0:04:25 (0.1698 s / it)
[10:08:01.694730] * Acc@1 65.283 Acc@5 87.716 loss 1.443
[10:08:01.694883] Accuracy of the network on the 50000 test images: 65.3%
[10:08:01.694905] Max accuracy: 65.28%
[10:08:01.796396] log_dir: ./output_dir_qkformer
[10:08:04.227909] Epoch: [27]  [   0/5004]  eta: 3:22:23  lr: 0.001163  loss: 3.9852 (3.9852)  time: 2.4267  data: 1.9585  max mem: 24440
[10:23:25.672380] Epoch: [27]  [2000/5004]  eta: 0:23:06  lr: 0.001161  loss: 3.3061 (3.2161)  time: 0.4650  data: 0.0002  max mem: 24440
[10:38:45.853536] Epoch: [27]  [4000/5004]  eta: 0:07:42  lr: 0.001160  loss: 3.1362 (3.2168)  time: 0.4607  data: 0.0003  max mem: 24440
[10:46:27.419558] Epoch: [27]  [5003/5004]  eta: 0:00:00  lr: 0.001159  loss: 3.2849 (3.2177)  time: 0.4540  data: 0.0005  max mem: 24440
[10:46:27.957242] Epoch: [27] Total time: 0:38:26 (0.4609 s / it)
[10:46:27.969868] Averaged stats: lr: 0.001159  loss: 3.2849 (3.2219)
[10:46:30.685542] Test:  [   0/1563]  eta: 1:10:34  loss: 0.4096 (0.4096)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.7094  data: 2.2875  max mem: 24440
[10:47:54.712659] Test:  [ 500/1563]  eta: 0:03:04  loss: 1.2455 (1.1972)  acc1: 62.5000 (70.3593)  acc5: 93.7500 (92.4838)  time: 0.1684  data: 0.0002  max mem: 24440
[10:49:18.736908] Test:  [1000/1563]  eta: 0:01:36  loss: 1.6276 (1.3536)  acc1: 59.3750 (67.7010)  acc5: 87.5000 (89.7915)  time: 0.1679  data: 0.0002  max mem: 24440
[10:50:42.785012] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7966 (1.4556)  acc1: 81.2500 (65.6104)  acc5: 93.7500 (88.1600)  time: 0.1678  data: 0.0002  max mem: 24440
[10:50:53.113238] Test:  [1562/1563]  eta: 0:00:00  loss: 0.7406 (1.4567)  acc1: 81.2500 (65.5900)  acc5: 93.7500 (88.1280)  time: 0.1635  data: 0.0001  max mem: 24440
[10:50:53.236775] Test: Total time: 0:04:25 (0.1697 s / it)
[10:50:53.494106] * Acc@1 65.592 Acc@5 88.126 loss 1.457
[10:50:53.494275] Accuracy of the network on the 50000 test images: 65.6%
[10:50:53.494301] Max accuracy: 65.59%
[10:50:53.597283] log_dir: ./output_dir_qkformer
[10:50:56.320633] Epoch: [28]  [   0/5004]  eta: 3:47:00  lr: 0.001159  loss: 3.3481 (3.3481)  time: 2.7218  data: 2.1682  max mem: 24440
[11:06:14.925676] Epoch: [28]  [2000/5004]  eta: 0:23:03  lr: 0.001158  loss: 3.1315 (3.2018)  time: 0.4606  data: 0.0002  max mem: 24440
[11:21:33.968660] Epoch: [28]  [4000/5004]  eta: 0:07:41  lr: 0.001156  loss: 3.1897 (3.2034)  time: 0.4587  data: 0.0002  max mem: 24440
[11:29:14.530267] Epoch: [28]  [5003/5004]  eta: 0:00:00  lr: 0.001156  loss: 3.4181 (3.2048)  time: 0.4531  data: 0.0007  max mem: 24440
[11:29:14.972628] Epoch: [28] Total time: 0:38:21 (0.4599 s / it)
[11:29:14.973738] Averaged stats: lr: 0.001156  loss: 3.4181 (3.2086)
[11:29:16.936363] Test:  [   0/1563]  eta: 0:51:02  loss: 0.7346 (0.7346)  acc1: 84.3750 (84.3750)  acc5: 90.6250 (90.6250)  time: 1.9594  data: 1.7848  max mem: 24440
[11:30:40.965693] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.2700 (1.1325)  acc1: 68.7500 (71.0267)  acc5: 90.6250 (92.3590)  time: 0.1682  data: 0.0002  max mem: 24440
[11:32:04.983927] Test:  [1000/1563]  eta: 0:01:35  loss: 1.2957 (1.2954)  acc1: 65.6250 (68.0882)  acc5: 93.7500 (89.8227)  time: 0.1680  data: 0.0002  max mem: 24440
[11:33:28.993138] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7741 (1.3816)  acc1: 81.2500 (66.4953)  acc5: 93.7500 (88.4723)  time: 0.1679  data: 0.0002  max mem: 24440
[11:33:39.320594] Test:  [1562/1563]  eta: 0:00:00  loss: 0.7333 (1.3801)  acc1: 87.5000 (66.5560)  acc5: 93.7500 (88.4940)  time: 0.1636  data: 0.0001  max mem: 24440
[11:33:39.447979] Test: Total time: 0:04:24 (0.1692 s / it)
[11:33:39.874304] * Acc@1 66.555 Acc@5 88.495 loss 1.380
[11:33:39.874468] Accuracy of the network on the 50000 test images: 66.6%
[11:33:39.874491] Max accuracy: 66.55%
[11:33:39.943022] log_dir: ./output_dir_qkformer
[11:33:42.553567] Epoch: [29]  [   0/5004]  eta: 3:37:38  lr: 0.001156  loss: 3.4099 (3.4099)  time: 2.6096  data: 2.1134  max mem: 24440
[11:49:02.902146] Epoch: [29]  [2000/5004]  eta: 0:23:05  lr: 0.001154  loss: 3.0570 (3.1863)  time: 0.4571  data: 0.0002  max mem: 24440
[12:04:29.476559] Epoch: [29]  [4000/5004]  eta: 0:07:44  lr: 0.001153  loss: 3.2781 (3.1948)  time: 0.4613  data: 0.0002  max mem: 24440
[12:12:11.018382] Epoch: [29]  [5003/5004]  eta: 0:00:00  lr: 0.001152  loss: 3.2333 (3.1998)  time: 0.4537  data: 0.0010  max mem: 24440
[12:12:11.512059] Epoch: [29] Total time: 0:38:31 (0.4619 s / it)
[12:12:11.528938] Averaged stats: lr: 0.001152  loss: 3.2333 (3.2016)
[12:12:13.942276] Test:  [   0/1563]  eta: 1:02:42  loss: 0.3471 (0.3471)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 2.4070  data: 2.0551  max mem: 24440
[12:13:37.922543] Test:  [ 500/1563]  eta: 0:03:03  loss: 1.0985 (1.1268)  acc1: 71.8750 (71.7814)  acc5: 93.7500 (92.6522)  time: 0.1680  data: 0.0002  max mem: 24440
[12:15:01.898906] Test:  [1000/1563]  eta: 0:01:35  loss: 1.5251 (1.3272)  acc1: 62.5000 (67.8041)  acc5: 90.6250 (89.5511)  time: 0.1679  data: 0.0002  max mem: 24440
[12:16:25.853909] Test:  [1500/1563]  eta: 0:00:10  loss: 0.8701 (1.4383)  acc1: 84.3750 (65.6708)  acc5: 93.7500 (87.8581)  time: 0.1680  data: 0.0002  max mem: 24440
[12:16:36.187833] Test:  [1562/1563]  eta: 0:00:00  loss: 0.9457 (1.4353)  acc1: 78.1250 (65.7360)  acc5: 93.7500 (87.9120)  time: 0.1635  data: 0.0001  max mem: 24440
[12:16:36.300003] Test: Total time: 0:04:24 (0.1694 s / it)
[12:16:36.833947] * Acc@1 65.740 Acc@5 87.913 loss 1.435
[12:16:36.834096] Accuracy of the network on the 50000 test images: 65.7%
[12:16:36.834118] Max accuracy: 66.55%
[12:16:36.928546] log_dir: ./output_dir_qkformer
[12:16:39.689686] Epoch: [30]  [   0/5004]  eta: 3:50:12  lr: 0.001152  loss: 2.6897 (2.6897)  time: 2.7603  data: 2.0852  max mem: 24440
[12:31:58.860389] Epoch: [30]  [2000/5004]  eta: 0:23:04  lr: 0.001151  loss: 3.2629 (3.1779)  time: 0.4590  data: 0.0003  max mem: 24440
[12:47:17.613051] Epoch: [30]  [4000/5004]  eta: 0:07:41  lr: 0.001149  loss: 3.1807 (3.1909)  time: 0.4615  data: 0.0003  max mem: 24440
[12:54:58.391440] Epoch: [30]  [5003/5004]  eta: 0:00:00  lr: 0.001148  loss: 3.1339 (3.1914)  time: 0.4539  data: 0.0009  max mem: 24440
[12:54:58.765605] Epoch: [30] Total time: 0:38:21 (0.4600 s / it)
[12:54:58.770117] Averaged stats: lr: 0.001148  loss: 3.1339 (3.1910)
[12:55:00.830317] Test:  [   0/1563]  eta: 0:53:34  loss: 0.6626 (0.6626)  acc1: 87.5000 (87.5000)  acc5: 93.7500 (93.7500)  time: 2.0564  data: 1.8714  max mem: 24440
[12:56:24.834124] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.0463 (1.1472)  acc1: 68.7500 (71.1951)  acc5: 93.7500 (92.5836)  time: 0.1681  data: 0.0002  max mem: 24440
[12:57:48.849211] Test:  [1000/1563]  eta: 0:01:35  loss: 1.9554 (1.3095)  acc1: 46.8750 (67.9758)  acc5: 81.2500 (89.8726)  time: 0.1680  data: 0.0002  max mem: 24440
[12:59:12.847048] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7956 (1.3966)  acc1: 81.2500 (66.3391)  acc5: 96.8750 (88.4598)  time: 0.1681  data: 0.0002  max mem: 24440
[12:59:23.174336] Test:  [1562/1563]  eta: 0:00:00  loss: 0.7317 (1.3976)  acc1: 84.3750 (66.3580)  acc5: 93.7500 (88.4360)  time: 0.1636  data: 0.0001  max mem: 24440
[12:59:23.296853] Test: Total time: 0:04:24 (0.1692 s / it)
[12:59:23.775448] * Acc@1 66.358 Acc@5 88.435 loss 1.398
[12:59:23.775602] Accuracy of the network on the 50000 test images: 66.4%
[12:59:23.775625] Max accuracy: 66.55%
[12:59:23.883642] log_dir: ./output_dir_qkformer
[12:59:26.725758] Epoch: [31]  [   0/5004]  eta: 3:56:50  lr: 0.001148  loss: 3.3611 (3.3611)  time: 2.8399  data: 2.3357  max mem: 24440
[13:14:46.822863] Epoch: [31]  [2000/5004]  eta: 0:23:05  lr: 0.001147  loss: 3.2999 (3.1869)  time: 0.4646  data: 0.0002  max mem: 24440
[13:30:07.733905] Epoch: [31]  [4000/5004]  eta: 0:07:42  lr: 0.001145  loss: 3.1510 (3.1935)  time: 0.4623  data: 0.0002  max mem: 24440
[13:37:48.644480] Epoch: [31]  [5003/5004]  eta: 0:00:00  lr: 0.001144  loss: 3.1278 (3.1904)  time: 0.4537  data: 0.0005  max mem: 24440
[13:37:49.084863] Epoch: [31] Total time: 0:38:25 (0.4607 s / it)
[13:37:49.100716] Averaged stats: lr: 0.001144  loss: 3.1278 (3.1836)
[13:37:50.565161] Test:  [   0/1563]  eta: 0:37:54  loss: 0.4279 (0.4279)  acc1: 87.5000 (87.5000)  acc5: 96.8750 (96.8750)  time: 1.4552  data: 1.2807  max mem: 24440
[13:39:14.784567] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.0775 (1.1284)  acc1: 75.0000 (71.5195)  acc5: 93.7500 (92.5337)  time: 0.1679  data: 0.0002  max mem: 24440
[13:40:38.798975] Test:  [1000/1563]  eta: 0:01:35  loss: 1.3636 (1.2740)  acc1: 62.5000 (68.8280)  acc5: 90.6250 (90.3097)  time: 0.1678  data: 0.0002  max mem: 24440
[13:42:02.781811] Test:  [1500/1563]  eta: 0:00:10  loss: 0.8654 (1.3938)  acc1: 81.2500 (66.4474)  acc5: 93.7500 (88.4639)  time: 0.1681  data: 0.0002  max mem: 24440
[13:42:13.111624] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4503 (1.3897)  acc1: 87.5000 (66.5220)  acc5: 100.0000 (88.5140)  time: 0.1635  data: 0.0001  max mem: 24440
[13:42:13.221854] Test: Total time: 0:04:24 (0.1690 s / it)
[13:42:13.734342] * Acc@1 66.529 Acc@5 88.513 loss 1.390
[13:42:13.734636] Accuracy of the network on the 50000 test images: 66.5%
[13:42:13.734694] Max accuracy: 66.55%
[13:42:13.863761] log_dir: ./output_dir_qkformer
[13:42:16.609950] Epoch: [32]  [   0/5004]  eta: 3:48:54  lr: 0.001144  loss: 2.7669 (2.7669)  time: 2.7446  data: 2.1575  max mem: 24440
[13:57:37.104345] Epoch: [32]  [2000/5004]  eta: 0:23:05  lr: 0.001143  loss: 3.3236 (3.1629)  time: 0.4568  data: 0.0002  max mem: 24440
[14:12:57.766404] Epoch: [32]  [4000/5004]  eta: 0:07:42  lr: 0.001141  loss: 3.2030 (3.1738)  time: 0.4592  data: 0.0002  max mem: 24440
[14:20:38.908914] Epoch: [32]  [5003/5004]  eta: 0:00:00  lr: 0.001140  loss: 3.1600 (3.1750)  time: 0.4540  data: 0.0009  max mem: 24440
[14:20:39.336240] Epoch: [32] Total time: 0:38:25 (0.4607 s / it)
[14:20:39.388319] Averaged stats: lr: 0.001140  loss: 3.1600 (3.1765)
[14:20:41.528554] Test:  [   0/1563]  eta: 0:55:38  loss: 0.3013 (0.3013)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 2.1357  data: 1.8101  max mem: 24440
[14:22:05.502443] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.3910 (1.1457)  acc1: 68.7500 (71.4508)  acc5: 93.7500 (92.5337)  time: 0.1681  data: 0.0002  max mem: 24440
[14:23:29.497212] Test:  [1000/1563]  eta: 0:01:35  loss: 1.7322 (1.2965)  acc1: 53.1250 (68.8000)  acc5: 87.5000 (90.1224)  time: 0.1680  data: 0.0002  max mem: 24440
[14:24:53.465322] Test:  [1500/1563]  eta: 0:00:10  loss: 0.8662 (1.3963)  acc1: 81.2500 (66.6014)  acc5: 93.7500 (88.5285)  time: 0.1679  data: 0.0002  max mem: 24440
[14:25:03.792232] Test:  [1562/1563]  eta: 0:00:00  loss: 0.7380 (1.3943)  acc1: 87.5000 (66.6520)  acc5: 96.8750 (88.5620)  time: 0.1635  data: 0.0001  max mem: 24440
[14:25:03.911930] Test: Total time: 0:04:24 (0.1692 s / it)
[14:25:04.435606] * Acc@1 66.652 Acc@5 88.561 loss 1.394
[14:25:04.435767] Accuracy of the network on the 50000 test images: 66.7%
[14:25:04.435789] Max accuracy: 66.65%
[14:25:04.525489] log_dir: ./output_dir_qkformer
[14:25:07.069745] Epoch: [33]  [   0/5004]  eta: 3:32:06  lr: 0.001140  loss: 3.1638 (3.1638)  time: 2.5432  data: 2.0758  max mem: 24440
[14:40:27.746880] Epoch: [33]  [2000/5004]  eta: 0:23:05  lr: 0.001138  loss: 3.1723 (3.1509)  time: 0.4601  data: 0.0002  max mem: 24440
[14:55:46.679694] Epoch: [33]  [4000/5004]  eta: 0:07:42  lr: 0.001137  loss: 3.1786 (3.1553)  time: 0.4603  data: 0.0002  max mem: 24440
[15:03:27.666535] Epoch: [33]  [5003/5004]  eta: 0:00:00  lr: 0.001136  loss: 3.2052 (3.1578)  time: 0.4532  data: 0.0008  max mem: 24440
[15:03:28.102289] Epoch: [33] Total time: 0:38:23 (0.4603 s / it)
[15:03:28.104279] Averaged stats: lr: 0.001136  loss: 3.2052 (3.1637)
[15:03:30.658432] Test:  [   0/1563]  eta: 1:06:22  loss: 0.5522 (0.5522)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 2.5481  data: 2.2497  max mem: 24440
[15:04:54.614400] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.9609 (1.1350)  acc1: 75.0000 (71.6629)  acc5: 93.7500 (92.6584)  time: 0.1678  data: 0.0002  max mem: 24440
[15:06:18.557246] Test:  [1000/1563]  eta: 0:01:35  loss: 1.4421 (1.2872)  acc1: 62.5000 (68.4971)  acc5: 87.5000 (90.2067)  time: 0.1682  data: 0.0002  max mem: 24440
[15:07:42.512497] Test:  [1500/1563]  eta: 0:00:10  loss: 0.8011 (1.3712)  acc1: 81.2500 (66.8492)  acc5: 93.7500 (88.8616)  time: 0.1677  data: 0.0002  max mem: 24440
[15:07:52.832806] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6655 (1.3661)  acc1: 84.3750 (66.9580)  acc5: 96.8750 (88.9360)  time: 0.1635  data: 0.0001  max mem: 24440
[15:07:52.942852] Test: Total time: 0:04:24 (0.1694 s / it)
[15:07:53.453029] * Acc@1 66.956 Acc@5 88.948 loss 1.366
[15:07:53.453219] Accuracy of the network on the 50000 test images: 67.0%
[15:07:53.453243] Max accuracy: 66.96%
[15:07:53.590711] log_dir: ./output_dir_qkformer
[15:07:56.387813] Epoch: [34]  [   0/5004]  eta: 3:52:58  lr: 0.001136  loss: 3.7913 (3.7913)  time: 2.7934  data: 2.2802  max mem: 24440
[15:23:16.437274] Epoch: [34]  [2000/5004]  eta: 0:23:05  lr: 0.001134  loss: 3.1243 (3.1547)  time: 0.4602  data: 0.0003  max mem: 24440
[15:38:35.544280] Epoch: [34]  [4000/5004]  eta: 0:07:42  lr: 0.001132  loss: 3.2484 (3.1506)  time: 0.4573  data: 0.0003  max mem: 24440
[15:46:16.001091] Epoch: [34]  [5003/5004]  eta: 0:00:00  lr: 0.001131  loss: 2.9967 (3.1511)  time: 0.4548  data: 0.0008  max mem: 24440
[15:46:16.740341] Epoch: [34] Total time: 0:38:23 (0.4603 s / it)
[15:46:16.770868] Averaged stats: lr: 0.001131  loss: 2.9967 (3.1585)
[15:46:18.948196] Test:  [   0/1563]  eta: 0:56:34  loss: 0.7258 (0.7258)  acc1: 84.3750 (84.3750)  acc5: 96.8750 (96.8750)  time: 2.1720  data: 1.8253  max mem: 24440
[15:47:42.930646] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.2609 (1.1266)  acc1: 65.6250 (71.0641)  acc5: 93.7500 (92.7769)  time: 0.1681  data: 0.0002  max mem: 24440
[15:49:06.856542] Test:  [1000/1563]  eta: 0:01:35  loss: 1.5056 (1.2713)  acc1: 59.3750 (68.6751)  acc5: 90.6250 (90.5126)  time: 0.1678  data: 0.0002  max mem: 24440
[15:50:30.834934] Test:  [1500/1563]  eta: 0:00:10  loss: 0.8848 (1.3849)  acc1: 81.2500 (66.4536)  acc5: 93.7500 (88.6950)  time: 0.1678  data: 0.0002  max mem: 24440
[15:50:41.164028] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6187 (1.3873)  acc1: 87.5000 (66.4780)  acc5: 93.7500 (88.6760)  time: 0.1634  data: 0.0001  max mem: 24440
[15:50:41.282949] Test: Total time: 0:04:24 (0.1692 s / it)
[15:50:41.860726] * Acc@1 66.478 Acc@5 88.678 loss 1.387
[15:50:41.860934] Accuracy of the network on the 50000 test images: 66.5%
[15:50:41.860984] Max accuracy: 66.96%
[15:50:41.941070] log_dir: ./output_dir_qkformer
[15:50:45.800757] Epoch: [35]  [   0/5004]  eta: 5:21:42  lr: 0.001131  loss: 3.4467 (3.4467)  time: 3.8574  data: 2.5230  max mem: 24440
[16:06:05.073473] Epoch: [35]  [2000/5004]  eta: 0:23:05  lr: 0.001130  loss: 3.1315 (3.1364)  time: 0.4571  data: 0.0002  max mem: 24440
[16:21:24.071989] Epoch: [35]  [4000/5004]  eta: 0:07:42  lr: 0.001128  loss: 3.1951 (3.1367)  time: 0.4567  data: 0.0002  max mem: 24440
[16:29:05.122305] Epoch: [35]  [5003/5004]  eta: 0:00:00  lr: 0.001127  loss: 3.0654 (3.1394)  time: 0.4539  data: 0.0009  max mem: 24440
[16:29:05.528238] Epoch: [35] Total time: 0:38:23 (0.4603 s / it)
[16:29:05.529104] Averaged stats: lr: 0.001127  loss: 3.0654 (3.1460)
[16:29:07.735696] Test:  [   0/1563]  eta: 0:57:22  loss: 0.5097 (0.5097)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 2.2027  data: 1.8328  max mem: 24440
[16:30:31.721836] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.1282 (1.1293)  acc1: 71.8750 (71.5132)  acc5: 93.7500 (92.9204)  time: 0.1679  data: 0.0002  max mem: 24440
[16:31:55.701631] Test:  [1000/1563]  eta: 0:01:35  loss: 1.9442 (1.2993)  acc1: 53.1250 (68.4316)  acc5: 84.3750 (90.0755)  time: 0.1682  data: 0.0002  max mem: 24440
[16:33:19.701455] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7128 (1.3872)  acc1: 81.2500 (66.8804)  acc5: 93.7500 (88.6825)  time: 0.1682  data: 0.0002  max mem: 24440
[16:33:30.032227] Test:  [1562/1563]  eta: 0:00:00  loss: 0.9354 (1.3897)  acc1: 78.1250 (66.7840)  acc5: 93.7500 (88.6520)  time: 0.1636  data: 0.0001  max mem: 24440
[16:33:30.173796] Test: Total time: 0:04:24 (0.1693 s / it)
[16:33:30.751285] * Acc@1 66.782 Acc@5 88.653 loss 1.390
[16:33:30.751445] Accuracy of the network on the 50000 test images: 66.8%
[16:33:30.751469] Max accuracy: 66.96%
[16:33:30.802915] log_dir: ./output_dir_qkformer
[16:33:33.359500] Epoch: [36]  [   0/5004]  eta: 3:33:08  lr: 0.001127  loss: 3.2224 (3.2224)  time: 2.5556  data: 2.0446  max mem: 24440
[16:48:53.764134] Epoch: [36]  [2000/5004]  eta: 0:23:05  lr: 0.001125  loss: 3.1638 (3.1351)  time: 0.4642  data: 0.0002  max mem: 24440
[17:04:13.519499] Epoch: [36]  [4000/5004]  eta: 0:07:42  lr: 0.001123  loss: 2.9989 (3.1396)  time: 0.4576  data: 0.0002  max mem: 24440
[17:11:54.257174] Epoch: [36]  [5003/5004]  eta: 0:00:00  lr: 0.001122  loss: 3.0045 (3.1435)  time: 0.4538  data: 0.0010  max mem: 24440
[17:11:54.915544] Epoch: [36] Total time: 0:38:24 (0.4605 s / it)
[17:11:54.920055] Averaged stats: lr: 0.001122  loss: 3.0045 (3.1409)
[17:11:56.731095] Test:  [   0/1563]  eta: 0:47:01  loss: 0.4183 (0.4183)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 1.8053  data: 1.6306  max mem: 24440
[17:13:20.668273] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.0097 (1.0914)  acc1: 71.8750 (71.8563)  acc5: 93.7500 (93.1450)  time: 0.1677  data: 0.0002  max mem: 24440
[17:14:44.598257] Test:  [1000/1563]  eta: 0:01:35  loss: 1.2679 (1.2291)  acc1: 62.5000 (69.2714)  acc5: 87.5000 (90.7811)  time: 0.1678  data: 0.0002  max mem: 24440
[17:16:09.703757] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6833 (1.3189)  acc1: 87.5000 (67.5758)  acc5: 96.8750 (89.3467)  time: 0.1677  data: 0.0002  max mem: 24440
[17:16:20.028213] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6279 (1.3199)  acc1: 87.5000 (67.5940)  acc5: 96.8750 (89.3180)  time: 0.1638  data: 0.0001  max mem: 24440
[17:16:20.138306] Test: Total time: 0:04:25 (0.1697 s / it)
[17:16:20.870497] * Acc@1 67.590 Acc@5 89.320 loss 1.320
[17:16:20.870650] Accuracy of the network on the 50000 test images: 67.6%
[17:16:20.870676] Max accuracy: 67.59%
[17:16:21.043380] log_dir: ./output_dir_qkformer
[17:16:23.789685] Epoch: [37]  [   0/5004]  eta: 3:48:56  lr: 0.001122  loss: 2.5416 (2.5416)  time: 2.7451  data: 2.1320  max mem: 24440
[17:31:44.163384] Epoch: [37]  [2000/5004]  eta: 0:23:05  lr: 0.001120  loss: 3.1475 (3.1339)  time: 0.4612  data: 0.0003  max mem: 24440
[17:47:04.660000] Epoch: [37]  [4000/5004]  eta: 0:07:42  lr: 0.001118  loss: 3.0811 (3.1364)  time: 0.4591  data: 0.0002  max mem: 24440
[17:54:46.017709] Epoch: [37]  [5003/5004]  eta: 0:00:00  lr: 0.001117  loss: 3.0955 (3.1386)  time: 0.4570  data: 0.0009  max mem: 24440
[17:54:46.426263] Epoch: [37] Total time: 0:38:25 (0.4607 s / it)
[17:54:46.434011] Averaged stats: lr: 0.001117  loss: 3.0955 (3.1345)
[17:54:47.974944] Test:  [   0/1563]  eta: 0:40:00  loss: 0.4813 (0.4813)  acc1: 87.5000 (87.5000)  acc5: 96.8750 (96.8750)  time: 1.5359  data: 1.3635  max mem: 24440
[17:56:12.032504] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.7752 (1.0437)  acc1: 75.0000 (72.9603)  acc5: 96.8750 (93.7500)  time: 0.1679  data: 0.0002  max mem: 24440
[17:57:36.027658] Test:  [1000/1563]  eta: 0:01:35  loss: 1.3519 (1.2000)  acc1: 65.6250 (70.2516)  acc5: 90.6250 (91.2556)  time: 0.1681  data: 0.0002  max mem: 24440
[17:59:00.158239] Test:  [1500/1563]  eta: 0:00:10  loss: 0.9044 (1.3007)  acc1: 81.2500 (68.2212)  acc5: 90.6250 (89.7423)  time: 0.1678  data: 0.0002  max mem: 24440
[17:59:10.480254] Test:  [1562/1563]  eta: 0:00:00  loss: 0.8879 (1.2992)  acc1: 78.1250 (68.2280)  acc5: 93.7500 (89.7620)  time: 0.1635  data: 0.0001  max mem: 24440
[17:59:10.593293] Test: Total time: 0:04:24 (0.1690 s / it)
[17:59:11.002414] * Acc@1 68.233 Acc@5 89.763 loss 1.299
[17:59:11.002563] Accuracy of the network on the 50000 test images: 68.2%
[17:59:11.002586] Max accuracy: 68.23%
[17:59:11.077415] log_dir: ./output_dir_qkformer
[17:59:13.817004] Epoch: [38]  [   0/5004]  eta: 3:48:19  lr: 0.001117  loss: 3.3117 (3.3117)  time: 2.7377  data: 2.0291  max mem: 24440
[18:14:34.797755] Epoch: [38]  [2000/5004]  eta: 0:23:06  lr: 0.001115  loss: 3.1840 (3.1217)  time: 0.4646  data: 0.0002  max mem: 24440
[18:29:56.823496] Epoch: [38]  [4000/5004]  eta: 0:07:43  lr: 0.001113  loss: 3.1002 (3.1210)  time: 0.4583  data: 0.0002  max mem: 24440
[18:37:39.011451] Epoch: [38]  [5003/5004]  eta: 0:00:00  lr: 0.001112  loss: 3.0149 (3.1231)  time: 0.4565  data: 0.0007  max mem: 24440
[18:37:39.503012] Epoch: [38] Total time: 0:38:28 (0.4613 s / it)
[18:37:39.536758] Averaged stats: lr: 0.001112  loss: 3.0149 (3.1257)
[18:37:42.120417] Test:  [   0/1563]  eta: 1:07:06  loss: 0.3214 (0.3214)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.5764  data: 2.3901  max mem: 24440
[18:39:06.110616] Test:  [ 500/1563]  eta: 0:03:03  loss: 1.3735 (1.0956)  acc1: 65.6250 (73.0788)  acc5: 93.7500 (93.3134)  time: 0.1678  data: 0.0002  max mem: 24440
[18:40:30.051773] Test:  [1000/1563]  eta: 0:01:35  loss: 1.4966 (1.2434)  acc1: 59.3750 (70.4951)  acc5: 90.6250 (90.8716)  time: 0.1679  data: 0.0002  max mem: 24440
[18:41:54.050206] Test:  [1500/1563]  eta: 0:00:10  loss: 0.9402 (1.3583)  acc1: 81.2500 (68.0775)  acc5: 93.7500 (89.1822)  time: 0.1679  data: 0.0002  max mem: 24440
[18:42:04.375743] Test:  [1562/1563]  eta: 0:00:00  loss: 0.9806 (1.3557)  acc1: 78.1250 (68.1060)  acc5: 96.8750 (89.2660)  time: 0.1635  data: 0.0001  max mem: 24440
[18:42:04.503555] Test: Total time: 0:04:24 (0.1695 s / it)
[18:42:04.998493] * Acc@1 68.105 Acc@5 89.265 loss 1.356
[18:42:04.998642] Accuracy of the network on the 50000 test images: 68.1%
[18:42:04.998664] Max accuracy: 68.23%
[18:42:05.099217] log_dir: ./output_dir_qkformer
[18:42:07.848396] Epoch: [39]  [   0/5004]  eta: 3:49:02  lr: 0.001112  loss: 3.3114 (3.3114)  time: 2.7463  data: 2.2480  max mem: 24440
[18:57:30.594179] Epoch: [39]  [2000/5004]  eta: 0:23:09  lr: 0.001110  loss: 3.1522 (3.1223)  time: 0.4578  data: 0.0003  max mem: 24440
[19:12:52.465383] Epoch: [39]  [4000/5004]  eta: 0:07:43  lr: 0.001108  loss: 2.9581 (3.1256)  time: 0.4645  data: 0.0003  max mem: 24440
[19:20:34.480308] Epoch: [39]  [5003/5004]  eta: 0:00:00  lr: 0.001107  loss: 3.2081 (3.1242)  time: 0.4545  data: 0.0005  max mem: 24440
[19:20:34.920605] Epoch: [39] Total time: 0:38:29 (0.4616 s / it)
[19:20:34.926653] Averaged stats: lr: 0.001107  loss: 3.2081 (3.1230)
[19:20:36.855603] Test:  [   0/1563]  eta: 0:50:03  loss: 0.7781 (0.7781)  acc1: 84.3750 (84.3750)  acc5: 93.7500 (93.7500)  time: 1.9218  data: 1.7488  max mem: 24440
[19:22:00.903722] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.1056 (1.0546)  acc1: 68.7500 (72.3802)  acc5: 93.7500 (93.7500)  time: 0.1681  data: 0.0002  max mem: 24440
[19:23:24.888867] Test:  [1000/1563]  eta: 0:01:35  loss: 1.2529 (1.2219)  acc1: 65.6250 (69.5242)  acc5: 90.6250 (90.9903)  time: 0.1681  data: 0.0002  max mem: 24440
[19:24:51.550058] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6891 (1.3273)  acc1: 84.3750 (67.5820)  acc5: 96.8750 (89.3571)  time: 0.1680  data: 0.0002  max mem: 24440
[19:25:01.877142] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5119 (1.3306)  acc1: 90.6250 (67.5380)  acc5: 96.8750 (89.3040)  time: 0.1637  data: 0.0001  max mem: 24440
[19:25:01.988222] Test: Total time: 0:04:27 (0.1709 s / it)
[19:25:02.429905] * Acc@1 67.540 Acc@5 89.302 loss 1.331
[19:25:02.430162] Accuracy of the network on the 50000 test images: 67.5%
[19:25:02.430216] Max accuracy: 68.23%
[19:25:02.654961] log_dir: ./output_dir_qkformer
[19:25:06.709545] Epoch: [40]  [   0/5004]  eta: 5:38:00  lr: 0.001107  loss: 3.5059 (3.5059)  time: 4.0529  data: 2.4354  max mem: 24440
[19:40:27.425485] Epoch: [40]  [2000/5004]  eta: 0:23:08  lr: 0.001105  loss: 3.1892 (3.0996)  time: 0.4646  data: 0.0003  max mem: 24440
[19:55:47.854107] Epoch: [40]  [4000/5004]  eta: 0:07:43  lr: 0.001103  loss: 2.9271 (3.1091)  time: 0.4571  data: 0.0002  max mem: 24440
[20:03:29.399187] Epoch: [40]  [5003/5004]  eta: 0:00:00  lr: 0.001102  loss: 3.0241 (3.1180)  time: 0.4539  data: 0.0010  max mem: 24440
[20:03:29.930003] Epoch: [40] Total time: 0:38:27 (0.4611 s / it)
[20:03:29.942096] Averaged stats: lr: 0.001102  loss: 3.0241 (3.1146)
[20:03:32.697699] Test:  [   0/1563]  eta: 1:11:31  loss: 0.7357 (0.7357)  acc1: 84.3750 (84.3750)  acc5: 93.7500 (93.7500)  time: 2.7459  data: 2.3032  max mem: 24440
[20:04:58.027289] Test:  [ 500/1563]  eta: 0:03:06  loss: 0.9890 (1.0814)  acc1: 65.6250 (71.6629)  acc5: 93.7500 (93.0639)  time: 0.1678  data: 0.0002  max mem: 24440
[20:06:22.220473] Test:  [1000/1563]  eta: 0:01:36  loss: 2.0570 (1.2524)  acc1: 43.7500 (68.5814)  acc5: 84.3750 (90.5313)  time: 0.1678  data: 0.0002  max mem: 24440
[20:07:46.175400] Test:  [1500/1563]  eta: 0:00:10  loss: 0.8063 (1.3464)  acc1: 81.2500 (66.9241)  acc5: 96.8750 (89.0344)  time: 0.1678  data: 0.0002  max mem: 24440
[20:07:56.499856] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5466 (1.3471)  acc1: 87.5000 (66.9760)  acc5: 96.8750 (89.0180)  time: 0.1635  data: 0.0001  max mem: 24440
[20:07:56.627268] Test: Total time: 0:04:26 (0.1706 s / it)
[20:07:57.254294] * Acc@1 66.978 Acc@5 89.018 loss 1.347
[20:07:57.254517] Accuracy of the network on the 50000 test images: 67.0%
[20:07:57.254557] Max accuracy: 68.23%
[20:07:57.406472] log_dir: ./output_dir_qkformer
[20:07:59.999049] Epoch: [41]  [   0/5004]  eta: 3:35:59  lr: 0.001102  loss: 2.9635 (2.9635)  time: 2.5899  data: 2.0500  max mem: 24440
[20:23:21.805856] Epoch: [41]  [2000/5004]  eta: 0:23:07  lr: 0.001100  loss: 2.9616 (3.0797)  time: 0.4624  data: 0.0002  max mem: 24440
[20:38:43.110586] Epoch: [41]  [4000/5004]  eta: 0:07:43  lr: 0.001098  loss: 3.0699 (3.0947)  time: 0.4592  data: 0.0002  max mem: 24440
[20:46:25.470036] Epoch: [41]  [5003/5004]  eta: 0:00:00  lr: 0.001097  loss: 3.0961 (3.0993)  time: 0.4535  data: 0.0006  max mem: 24440
[20:46:25.969528] Epoch: [41] Total time: 0:38:28 (0.4613 s / it)
[20:46:26.008583] Averaged stats: lr: 0.001097  loss: 3.0961 (3.1057)
[20:46:28.297283] Test:  [   0/1563]  eta: 0:59:27  loss: 0.5899 (0.5899)  acc1: 87.5000 (87.5000)  acc5: 96.8750 (96.8750)  time: 2.2824  data: 1.9807  max mem: 24440
[20:47:52.321315] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.8240 (1.0593)  acc1: 78.1250 (73.2161)  acc5: 93.7500 (93.8061)  time: 0.1683  data: 0.0006  max mem: 24440
[20:49:16.350655] Test:  [1000/1563]  eta: 0:01:35  loss: 1.2895 (1.2133)  acc1: 65.6250 (70.4733)  acc5: 90.6250 (91.3586)  time: 0.1679  data: 0.0002  max mem: 24440
[20:50:40.357113] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7692 (1.3074)  acc1: 81.2500 (68.6397)  acc5: 93.7500 (89.8880)  time: 0.1679  data: 0.0002  max mem: 24440
[20:50:50.679487] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6023 (1.3102)  acc1: 84.3750 (68.6120)  acc5: 100.0000 (89.8400)  time: 0.1635  data: 0.0001  max mem: 24440
[20:50:50.792358] Test: Total time: 0:04:24 (0.1694 s / it)
[20:50:51.168246] * Acc@1 68.614 Acc@5 89.841 loss 1.310
[20:50:51.168432] Accuracy of the network on the 50000 test images: 68.6%
[20:50:51.168460] Max accuracy: 68.61%
[20:50:51.254996] log_dir: ./output_dir_qkformer
[20:50:53.878039] Epoch: [42]  [   0/5004]  eta: 3:38:38  lr: 0.001097  loss: 2.8615 (2.8615)  time: 2.6216  data: 2.1039  max mem: 24440
[21:06:15.255217] Epoch: [42]  [2000/5004]  eta: 0:23:06  lr: 0.001094  loss: 2.9595 (3.0908)  time: 0.4595  data: 0.0002  max mem: 24440
[21:21:36.180171] Epoch: [42]  [4000/5004]  eta: 0:07:42  lr: 0.001092  loss: 2.9676 (3.0973)  time: 0.4636  data: 0.0003  max mem: 24440
[21:29:17.839484] Epoch: [42]  [5003/5004]  eta: 0:00:00  lr: 0.001091  loss: 3.1266 (3.1005)  time: 0.4536  data: 0.0005  max mem: 24440
[21:29:18.254947] Epoch: [42] Total time: 0:38:26 (0.4610 s / it)
[21:29:18.326456] Averaged stats: lr: 0.001091  loss: 3.1266 (3.0992)
[21:29:20.563788] Test:  [   0/1563]  eta: 0:58:07  loss: 0.4266 (0.4266)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 2.2316  data: 1.8927  max mem: 24440
[21:30:44.542028] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.9827 (1.0580)  acc1: 75.0000 (73.1911)  acc5: 96.8750 (93.6502)  time: 0.1679  data: 0.0002  max mem: 24440
[21:32:08.528632] Test:  [1000/1563]  eta: 0:01:35  loss: 1.4419 (1.2214)  acc1: 53.1250 (69.9863)  acc5: 93.7500 (90.9715)  time: 0.1680  data: 0.0002  max mem: 24440
[21:33:32.496171] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6603 (1.2970)  acc1: 84.3750 (68.6063)  acc5: 93.7500 (89.7943)  time: 0.1679  data: 0.0002  max mem: 24440
[21:33:42.820783] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5863 (1.2920)  acc1: 84.3750 (68.7260)  acc5: 96.8750 (89.8800)  time: 0.1635  data: 0.0001  max mem: 24440
[21:33:42.936321] Test: Total time: 0:04:24 (0.1693 s / it)
[21:33:43.557621] * Acc@1 68.727 Acc@5 89.885 loss 1.292
[21:33:43.557769] Accuracy of the network on the 50000 test images: 68.7%
[21:33:43.557790] Max accuracy: 68.73%
[21:33:43.691134] log_dir: ./output_dir_qkformer
[21:33:46.493651] Epoch: [43]  [   0/5004]  eta: 3:53:36  lr: 0.001091  loss: 3.0376 (3.0376)  time: 2.8010  data: 2.2581  max mem: 24440
[21:49:07.114036] Epoch: [43]  [2000/5004]  eta: 0:23:06  lr: 0.001089  loss: 2.8973 (3.0869)  time: 0.4627  data: 0.0002  max mem: 24440
[22:04:27.200355] Epoch: [43]  [4000/5004]  eta: 0:07:42  lr: 0.001087  loss: 3.0735 (3.0917)  time: 0.4616  data: 0.0002  max mem: 24440
[22:12:08.594192] Epoch: [43]  [5003/5004]  eta: 0:00:00  lr: 0.001086  loss: 3.0475 (3.0921)  time: 0.4535  data: 0.0005  max mem: 24440
[22:12:09.061055] Epoch: [43] Total time: 0:38:25 (0.4607 s / it)
[22:12:09.061957] Averaged stats: lr: 0.001086  loss: 3.0475 (3.0932)
[22:12:11.450744] Test:  [   0/1563]  eta: 1:02:08  loss: 0.3808 (0.3808)  acc1: 93.7500 (93.7500)  acc5: 100.0000 (100.0000)  time: 2.3852  data: 2.1929  max mem: 24440
[22:13:35.511032] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.9865 (1.0924)  acc1: 75.0000 (72.1869)  acc5: 93.7500 (93.0888)  time: 0.1679  data: 0.0002  max mem: 24440
[22:14:59.550681] Test:  [1000/1563]  eta: 0:01:35  loss: 1.2704 (1.2354)  acc1: 56.2500 (69.6210)  acc5: 90.6250 (90.7093)  time: 0.1678  data: 0.0002  max mem: 24440
[22:16:24.087823] Test:  [1500/1563]  eta: 0:00:10  loss: 0.9738 (1.3297)  acc1: 78.1250 (67.7423)  acc5: 93.7500 (89.4508)  time: 0.1683  data: 0.0002  max mem: 24440
[22:16:34.423927] Test:  [1562/1563]  eta: 0:00:00  loss: 0.8652 (1.3304)  acc1: 81.2500 (67.7780)  acc5: 93.7500 (89.4580)  time: 0.1638  data: 0.0001  max mem: 24440
[22:16:34.530549] Test: Total time: 0:04:25 (0.1698 s / it)
[22:16:34.786712] * Acc@1 67.778 Acc@5 89.457 loss 1.330
[22:16:34.786863] Accuracy of the network on the 50000 test images: 67.8%
[22:16:34.786883] Max accuracy: 68.73%
[22:16:34.906022] log_dir: ./output_dir_qkformer
[22:16:37.576548] Epoch: [44]  [   0/5004]  eta: 3:42:36  lr: 0.001086  loss: 2.7672 (2.7672)  time: 2.6693  data: 2.1548  max mem: 24440
[22:31:57.664731] Epoch: [44]  [2000/5004]  eta: 0:23:05  lr: 0.001083  loss: 3.1444 (3.0746)  time: 0.4592  data: 0.0002  max mem: 24440
[22:47:17.234813] Epoch: [44]  [4000/5004]  eta: 0:07:42  lr: 0.001081  loss: 3.0722 (3.0786)  time: 0.4580  data: 0.0002  max mem: 24440
[22:54:57.904343] Epoch: [44]  [5003/5004]  eta: 0:00:00  lr: 0.001080  loss: 3.1126 (3.0790)  time: 0.4530  data: 0.0008  max mem: 24440
[22:54:58.297075] Epoch: [44] Total time: 0:38:23 (0.4603 s / it)
[22:54:58.299134] Averaged stats: lr: 0.001080  loss: 3.1126 (3.0863)
[22:55:00.111005] Test:  [   0/1563]  eta: 0:47:05  loss: 0.5056 (0.5056)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 1.8080  data: 1.6320  max mem: 24440
[22:56:24.043140] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.0558 (1.0153)  acc1: 75.0000 (74.4573)  acc5: 93.7500 (93.9496)  time: 0.1678  data: 0.0002  max mem: 24440
[22:57:48.017530] Test:  [1000/1563]  eta: 0:01:35  loss: 1.2829 (1.1812)  acc1: 71.8750 (71.2350)  acc5: 90.6250 (91.3774)  time: 0.1680  data: 0.0002  max mem: 24440
[22:59:11.983105] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6597 (1.2909)  acc1: 84.3750 (68.8187)  acc5: 93.7500 (89.7131)  time: 0.1678  data: 0.0002  max mem: 24440
[22:59:22.308850] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6127 (1.2929)  acc1: 84.3750 (68.7960)  acc5: 96.8750 (89.6980)  time: 0.1634  data: 0.0001  max mem: 24440
[22:59:22.431553] Test: Total time: 0:04:24 (0.1690 s / it)
[22:59:22.915777] * Acc@1 68.794 Acc@5 89.699 loss 1.293
[22:59:22.915966] Accuracy of the network on the 50000 test images: 68.8%
[22:59:22.915989] Max accuracy: 68.79%
[22:59:22.990211] log_dir: ./output_dir_qkformer
[22:59:25.658848] Epoch: [45]  [   0/5004]  eta: 3:42:27  lr: 0.001080  loss: 2.7460 (2.7460)  time: 2.6674  data: 2.1541  max mem: 24440
[23:14:48.517928] Epoch: [45]  [2000/5004]  eta: 0:23:09  lr: 0.001077  loss: 3.1577 (3.0704)  time: 0.4645  data: 0.0002  max mem: 24440
[23:30:11.468795] Epoch: [45]  [4000/5004]  eta: 0:07:43  lr: 0.001075  loss: 3.1694 (3.0758)  time: 0.4593  data: 0.0003  max mem: 24440
[23:37:53.465470] Epoch: [45]  [5003/5004]  eta: 0:00:00  lr: 0.001074  loss: 3.0168 (3.0801)  time: 0.4543  data: 0.0008  max mem: 24440
[23:37:53.941636] Epoch: [45] Total time: 0:38:30 (0.4618 s / it)
[23:37:53.945774] Averaged stats: lr: 0.001074  loss: 3.0168 (3.0796)
[23:37:55.660614] Test:  [   0/1563]  eta: 0:44:33  loss: 0.5829 (0.5829)  acc1: 87.5000 (87.5000)  acc5: 96.8750 (96.8750)  time: 1.7103  data: 1.5357  max mem: 24440
[23:39:19.611353] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.0980 (1.0575)  acc1: 75.0000 (73.3845)  acc5: 93.7500 (93.7188)  time: 0.1677  data: 0.0002  max mem: 24440
[23:40:43.559903] Test:  [1000/1563]  eta: 0:01:35  loss: 1.5778 (1.2186)  acc1: 56.2500 (70.0737)  acc5: 84.3750 (91.2712)  time: 0.1678  data: 0.0002  max mem: 24440
[23:42:07.499212] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7589 (1.3212)  acc1: 78.1250 (68.2253)  acc5: 93.7500 (89.7485)  time: 0.1677  data: 0.0002  max mem: 24440
[23:42:17.816033] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6430 (1.3212)  acc1: 87.5000 (68.2080)  acc5: 96.8750 (89.7820)  time: 0.1634  data: 0.0001  max mem: 24440
[23:42:17.959140] Test: Total time: 0:04:24 (0.1689 s / it)
[23:42:18.609768] * Acc@1 68.206 Acc@5 89.784 loss 1.321
[23:42:18.610089] Accuracy of the network on the 50000 test images: 68.2%
[23:42:18.610145] Max accuracy: 68.79%
[23:42:18.735600] log_dir: ./output_dir_qkformer
[23:42:21.610311] Epoch: [46]  [   0/5004]  eta: 3:59:30  lr: 0.001074  loss: 2.6759 (2.6759)  time: 2.8718  data: 2.3576  max mem: 24440
[23:57:41.684597] Epoch: [46]  [2000/5004]  eta: 0:23:05  lr: 0.001072  loss: 3.0951 (3.0753)  time: 0.4599  data: 0.0002  max mem: 24440
[00:13:02.072347] Epoch: [46]  [4000/5004]  eta: 0:07:42  lr: 0.001069  loss: 3.0345 (3.0765)  time: 0.4577  data: 0.0002  max mem: 24440
[00:20:42.965809] Epoch: [46]  [5003/5004]  eta: 0:00:00  lr: 0.001068  loss: 3.0741 (3.0817)  time: 0.4559  data: 0.0009  max mem: 24440
[00:20:43.413599] Epoch: [46] Total time: 0:38:24 (0.4606 s / it)
[00:20:43.428618] Averaged stats: lr: 0.001068  loss: 3.0741 (3.0765)
[00:20:45.486049] Test:  [   0/1563]  eta: 0:53:28  loss: 0.5612 (0.5612)  acc1: 87.5000 (87.5000)  acc5: 96.8750 (96.8750)  time: 2.0528  data: 1.7812  max mem: 24440
[00:22:09.493716] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.9298 (0.9920)  acc1: 75.0000 (75.1123)  acc5: 96.8750 (94.1679)  time: 0.1685  data: 0.0002  max mem: 24440
[00:23:33.476561] Test:  [1000/1563]  eta: 0:01:35  loss: 1.4480 (1.1550)  acc1: 59.3750 (71.6034)  acc5: 90.6250 (91.7676)  time: 0.1679  data: 0.0002  max mem: 24440
[00:24:57.455825] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6827 (1.2626)  acc1: 87.5000 (69.2684)  acc5: 93.7500 (90.1732)  time: 0.1680  data: 0.0002  max mem: 24440
[00:25:07.782448] Test:  [1562/1563]  eta: 0:00:00  loss: 0.7894 (1.2656)  acc1: 81.2500 (69.1800)  acc5: 93.7500 (90.1340)  time: 0.1636  data: 0.0001  max mem: 24440
[00:25:07.904138] Test: Total time: 0:04:24 (0.1692 s / it)
[00:25:08.451163] * Acc@1 69.180 Acc@5 90.133 loss 1.266
[00:25:08.451463] Accuracy of the network on the 50000 test images: 69.2%
[00:25:08.451521] Max accuracy: 69.18%
[00:25:08.548472] log_dir: ./output_dir_qkformer
[00:25:11.406569] Epoch: [47]  [   0/5004]  eta: 3:58:13  lr: 0.001068  loss: 3.7756 (3.7756)  time: 2.8565  data: 2.3874  max mem: 24440
[00:40:32.412962] Epoch: [47]  [2000/5004]  eta: 0:23:06  lr: 0.001065  loss: 3.0395 (3.0619)  time: 0.4604  data: 0.0003  max mem: 24440
[00:55:53.144409] Epoch: [47]  [4000/5004]  eta: 0:07:42  lr: 0.001063  loss: 3.0505 (3.0736)  time: 0.4631  data: 0.0002  max mem: 24440
[01:03:34.902628] Epoch: [47]  [5003/5004]  eta: 0:00:00  lr: 0.001062  loss: 3.1604 (3.0754)  time: 0.4591  data: 0.0010  max mem: 24440
[01:03:35.434479] Epoch: [47] Total time: 0:38:26 (0.4610 s / it)
[01:03:35.439266] Averaged stats: lr: 0.001062  loss: 3.1604 (3.0698)
[01:03:36.958544] Test:  [   0/1563]  eta: 0:39:26  loss: 0.4268 (0.4268)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.5143  data: 1.3354  max mem: 24440
[01:05:00.964567] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.2887 (1.0468)  acc1: 65.6250 (73.3658)  acc5: 93.7500 (93.4506)  time: 0.1684  data: 0.0005  max mem: 24440
[01:06:24.937423] Test:  [1000/1563]  eta: 0:01:35  loss: 1.3851 (1.2053)  acc1: 59.3750 (70.5232)  acc5: 93.7500 (91.0870)  time: 0.1677  data: 0.0002  max mem: 24440
[01:07:48.874357] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6455 (1.3003)  acc1: 81.2500 (68.7583)  acc5: 93.7500 (89.7235)  time: 0.1678  data: 0.0002  max mem: 24440
[01:07:59.191549] Test:  [1562/1563]  eta: 0:00:00  loss: 0.7680 (1.2999)  acc1: 84.3750 (68.7800)  acc5: 96.8750 (89.7640)  time: 0.1634  data: 0.0001  max mem: 24440
[01:07:59.312687] Test: Total time: 0:04:23 (0.1688 s / it)
[01:07:59.898455] * Acc@1 68.780 Acc@5 89.763 loss 1.300
[01:07:59.898604] Accuracy of the network on the 50000 test images: 68.8%
[01:07:59.898627] Max accuracy: 69.18%
[01:08:00.041083] log_dir: ./output_dir_qkformer
[01:08:02.809927] Epoch: [48]  [   0/5004]  eta: 3:50:49  lr: 0.001062  loss: 2.7101 (2.7101)  time: 2.7676  data: 2.2920  max mem: 24440
[01:23:22.910262] Epoch: [48]  [2000/5004]  eta: 0:23:05  lr: 0.001059  loss: 3.0369 (3.0453)  time: 0.4619  data: 0.0002  max mem: 24440
[01:38:42.887117] Epoch: [48]  [4000/5004]  eta: 0:07:42  lr: 0.001057  loss: 2.9176 (3.0566)  time: 0.4592  data: 0.0002  max mem: 24440
[01:46:24.437036] Epoch: [48]  [5003/5004]  eta: 0:00:00  lr: 0.001056  loss: 3.2083 (3.0585)  time: 0.4537  data: 0.0009  max mem: 24440
[01:46:24.839682] Epoch: [48] Total time: 0:38:24 (0.4606 s / it)
[01:46:24.892200] Averaged stats: lr: 0.001056  loss: 3.2083 (3.0614)
[01:46:26.930332] Test:  [   0/1563]  eta: 0:52:53  loss: 0.1874 (0.1874)  acc1: 100.0000 (100.0000)  acc5: 100.0000 (100.0000)  time: 2.0303  data: 1.7186  max mem: 24440
[01:47:50.876924] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.9646 (0.9991)  acc1: 78.1250 (74.9563)  acc5: 93.7500 (94.4923)  time: 0.1677  data: 0.0002  max mem: 24440
[01:49:14.824566] Test:  [1000/1563]  eta: 0:01:35  loss: 1.3202 (1.1716)  acc1: 71.8750 (71.3380)  acc5: 93.7500 (91.7770)  time: 0.1678  data: 0.0002  max mem: 24440
[01:50:38.759927] Test:  [1500/1563]  eta: 0:00:10  loss: 0.8846 (1.2723)  acc1: 78.1250 (69.4620)  acc5: 93.7500 (90.2607)  time: 0.1678  data: 0.0002  max mem: 24440
[01:50:49.088546] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6996 (1.2752)  acc1: 87.5000 (69.4200)  acc5: 96.8750 (90.2360)  time: 0.1635  data: 0.0001  max mem: 24440
[01:50:49.195336] Test: Total time: 0:04:24 (0.1691 s / it)
[01:50:49.766063] * Acc@1 69.415 Acc@5 90.232 loss 1.275
[01:50:49.766216] Accuracy of the network on the 50000 test images: 69.4%
[01:50:49.766239] Max accuracy: 69.42%
[01:50:49.872130] log_dir: ./output_dir_qkformer
[01:50:52.641178] Epoch: [49]  [   0/5004]  eta: 3:50:53  lr: 0.001056  loss: 2.3303 (2.3303)  time: 2.7684  data: 2.2667  max mem: 24440
[02:06:14.635660] Epoch: [49]  [2000/5004]  eta: 0:23:08  lr: 0.001053  loss: 2.8938 (3.0476)  time: 0.4615  data: 0.0002  max mem: 24440
[02:21:35.644643] Epoch: [49]  [4000/5004]  eta: 0:07:43  lr: 0.001051  loss: 3.0766 (3.0562)  time: 0.4621  data: 0.0002  max mem: 24440
[02:29:17.330065] Epoch: [49]  [5003/5004]  eta: 0:00:00  lr: 0.001049  loss: 2.9715 (3.0567)  time: 0.4528  data: 0.0009  max mem: 24440
[02:29:17.779436] Epoch: [49] Total time: 0:38:27 (0.4612 s / it)
[02:29:17.806258] Averaged stats: lr: 0.001049  loss: 2.9715 (3.0549)
[02:29:20.143021] Test:  [   0/1563]  eta: 1:00:44  loss: 0.4234 (0.4234)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 2.3317  data: 1.9620  max mem: 24440
[02:30:44.139032] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.9719 (1.0166)  acc1: 71.8750 (73.4281)  acc5: 93.7500 (94.0619)  time: 0.1681  data: 0.0002  max mem: 24440
[02:32:08.094913] Test:  [1000/1563]  eta: 0:01:35  loss: 1.2964 (1.1686)  acc1: 68.7500 (70.5763)  acc5: 90.6250 (91.7333)  time: 0.1679  data: 0.0004  max mem: 24440
[02:33:32.042794] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6173 (1.2701)  acc1: 87.5000 (68.8645)  acc5: 93.7500 (90.1066)  time: 0.1678  data: 0.0002  max mem: 24440
[02:33:42.370637] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5989 (1.2684)  acc1: 87.5000 (68.9520)  acc5: 93.7500 (90.1200)  time: 0.1635  data: 0.0001  max mem: 24440
[02:33:42.473484] Test: Total time: 0:04:24 (0.1693 s / it)
[02:33:43.100427] * Acc@1 68.954 Acc@5 90.121 loss 1.268
[02:33:43.100576] Accuracy of the network on the 50000 test images: 69.0%
[02:33:43.100598] Max accuracy: 69.42%
[02:33:43.174660] log_dir: ./output_dir_qkformer
[02:33:45.932817] Epoch: [50]  [   0/5004]  eta: 3:49:51  lr: 0.001049  loss: 3.1176 (3.1176)  time: 2.7562  data: 2.2765  max mem: 24440
[02:49:08.485533] Epoch: [50]  [2000/5004]  eta: 0:23:09  lr: 0.001047  loss: 2.9823 (3.0369)  time: 0.4608  data: 0.0002  max mem: 24440
[03:04:30.056171] Epoch: [50]  [4000/5004]  eta: 0:07:43  lr: 0.001044  loss: 3.1855 (3.0512)  time: 0.4568  data: 0.0002  max mem: 24440
[03:12:11.774545] Epoch: [50]  [5003/5004]  eta: 0:00:00  lr: 0.001043  loss: 3.1789 (3.0551)  time: 0.4533  data: 0.0005  max mem: 24440
[03:12:12.221015] Epoch: [50] Total time: 0:38:29 (0.4614 s / it)
[03:12:12.240345] Averaged stats: lr: 0.001043  loss: 3.1789 (3.0511)
[03:12:14.846300] Test:  [   0/1563]  eta: 1:07:41  loss: 0.3954 (0.3954)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 2.5988  data: 2.3393  max mem: 24440
[03:13:38.866889] Test:  [ 500/1563]  eta: 0:03:03  loss: 1.2331 (1.0507)  acc1: 65.6250 (73.7712)  acc5: 93.7500 (94.0182)  time: 0.1681  data: 0.0002  max mem: 24440
[03:15:02.920212] Test:  [1000/1563]  eta: 0:01:35  loss: 1.4255 (1.1829)  acc1: 56.2500 (71.0571)  acc5: 93.7500 (91.8113)  time: 0.1680  data: 0.0002  max mem: 24440
[03:16:26.950990] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6514 (1.2806)  acc1: 84.3750 (69.1726)  acc5: 96.8750 (90.2856)  time: 0.1683  data: 0.0002  max mem: 24440
[03:16:37.273542] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6707 (1.2806)  acc1: 87.5000 (69.1860)  acc5: 93.7500 (90.2500)  time: 0.1636  data: 0.0001  max mem: 24440
[03:16:37.408924] Test: Total time: 0:04:25 (0.1697 s / it)
[03:16:37.837926] * Acc@1 69.183 Acc@5 90.251 loss 1.281
[03:16:37.838103] Accuracy of the network on the 50000 test images: 69.2%
[03:16:37.838129] Max accuracy: 69.42%
[03:16:37.908631] log_dir: ./output_dir_qkformer
[03:16:40.603436] Epoch: [51]  [   0/5004]  eta: 3:44:35  lr: 0.001043  loss: 2.9639 (2.9639)  time: 2.6930  data: 2.1239  max mem: 24440
[03:32:03.563356] Epoch: [51]  [2000/5004]  eta: 0:23:09  lr: 0.001040  loss: 3.0827 (3.0410)  time: 0.4578  data: 0.0002  max mem: 24440
[03:47:26.232571] Epoch: [51]  [4000/5004]  eta: 0:07:43  lr: 0.001038  loss: 2.9033 (3.0381)  time: 0.4654  data: 0.0003  max mem: 24440
[03:55:08.446529] Epoch: [51]  [5003/5004]  eta: 0:00:00  lr: 0.001036  loss: 2.8792 (3.0383)  time: 0.4579  data: 0.0008  max mem: 24440
[03:55:08.736781] Epoch: [51] Total time: 0:38:30 (0.4618 s / it)
[03:55:08.920803] Averaged stats: lr: 0.001036  loss: 2.8792 (3.0445)
[03:55:10.985391] Test:  [   0/1563]  eta: 0:53:41  loss: 0.3723 (0.3723)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 2.0611  data: 1.8205  max mem: 24440
[03:56:34.988267] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.0891 (1.0668)  acc1: 78.1250 (74.3950)  acc5: 93.7500 (93.7937)  time: 0.1678  data: 0.0002  max mem: 24440
[03:57:58.939534] Test:  [1000/1563]  eta: 0:01:35  loss: 1.2584 (1.2193)  acc1: 68.7500 (71.0383)  acc5: 93.7500 (91.3087)  time: 0.1677  data: 0.0002  max mem: 24440
[03:59:22.912006] Test:  [1500/1563]  eta: 0:00:10  loss: 0.9283 (1.3052)  acc1: 84.3750 (69.2580)  acc5: 93.7500 (89.9088)  time: 0.1683  data: 0.0002  max mem: 24440
[03:59:33.228008] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6122 (1.3033)  acc1: 87.5000 (69.2600)  acc5: 96.8750 (89.9160)  time: 0.1635  data: 0.0001  max mem: 24440
[03:59:33.337686] Test: Total time: 0:04:24 (0.1692 s / it)
[03:59:34.061650] * Acc@1 69.260 Acc@5 89.917 loss 1.303
[03:59:34.061809] Accuracy of the network on the 50000 test images: 69.3%
[03:59:34.061831] Max accuracy: 69.42%
[03:59:34.112343] log_dir: ./output_dir_qkformer
[03:59:36.843577] Epoch: [52]  [   0/5004]  eta: 3:47:43  lr: 0.001036  loss: 2.7342 (2.7342)  time: 2.7304  data: 2.2533  max mem: 24440
[04:14:57.340008] Epoch: [52]  [2000/5004]  eta: 0:23:05  lr: 0.001034  loss: 3.0728 (3.0259)  time: 0.4599  data: 0.0002  max mem: 24440
[04:30:17.316893] Epoch: [52]  [4000/5004]  eta: 0:07:42  lr: 0.001031  loss: 3.2591 (3.0375)  time: 0.4575  data: 0.0002  max mem: 24440
[04:37:58.540538] Epoch: [52]  [5003/5004]  eta: 0:00:00  lr: 0.001030  loss: 2.9979 (3.0374)  time: 0.4539  data: 0.0008  max mem: 24440
[04:37:58.995650] Epoch: [52] Total time: 0:38:24 (0.4606 s / it)
[04:37:58.997544] Averaged stats: lr: 0.001030  loss: 2.9979 (3.0413)
[04:38:01.720129] Test:  [   0/1563]  eta: 1:10:50  loss: 0.6437 (0.6437)  acc1: 84.3750 (84.3750)  acc5: 96.8750 (96.8750)  time: 2.7191  data: 2.5145  max mem: 24440
[04:39:25.666219] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.8810 (0.9792)  acc1: 71.8750 (75.2682)  acc5: 93.7500 (94.4548)  time: 0.1678  data: 0.0002  max mem: 24440
[04:40:49.639836] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1043 (1.1189)  acc1: 71.8750 (72.4994)  acc5: 90.6250 (92.0267)  time: 0.1680  data: 0.0002  max mem: 24440
[04:42:13.595005] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7757 (1.2297)  acc1: 81.2500 (70.0429)  acc5: 93.7500 (90.4126)  time: 0.1679  data: 0.0002  max mem: 24440
[04:42:23.926092] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5449 (1.2301)  acc1: 87.5000 (70.0040)  acc5: 96.8750 (90.3940)  time: 0.1635  data: 0.0001  max mem: 24440
[04:42:24.048659] Test: Total time: 0:04:25 (0.1696 s / it)
[04:42:24.682601] * Acc@1 70.001 Acc@5 90.401 loss 1.230
[04:42:24.682765] Accuracy of the network on the 50000 test images: 70.0%
[04:42:24.682786] Max accuracy: 70.00%
[04:42:24.712158] log_dir: ./output_dir_qkformer
[04:42:27.406863] Epoch: [53]  [   0/5004]  eta: 3:44:30  lr: 0.001030  loss: 3.3525 (3.3525)  time: 2.6919  data: 2.2280  max mem: 24440
[04:57:49.038229] Epoch: [53]  [2000/5004]  eta: 0:23:07  lr: 0.001027  loss: 2.9163 (3.0302)  time: 0.4606  data: 0.0002  max mem: 24440
[05:13:09.808995] Epoch: [53]  [4000/5004]  eta: 0:07:42  lr: 0.001024  loss: 3.2090 (3.0337)  time: 0.4577  data: 0.0002  max mem: 24440
[05:20:51.446212] Epoch: [53]  [5003/5004]  eta: 0:00:00  lr: 0.001023  loss: 3.0087 (3.0353)  time: 0.4544  data: 0.0005  max mem: 24440
[05:20:51.876166] Epoch: [53] Total time: 0:38:27 (0.4611 s / it)
[05:20:51.878677] Averaged stats: lr: 0.001023  loss: 3.0087 (3.0348)
[05:20:53.570501] Test:  [   0/1563]  eta: 0:43:55  loss: 0.4625 (0.4625)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 1.6863  data: 1.5104  max mem: 24440
[05:22:17.739887] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.9893 (1.0148)  acc1: 75.0000 (74.3575)  acc5: 93.7500 (94.4174)  time: 0.1678  data: 0.0002  max mem: 24440
[05:23:41.714893] Test:  [1000/1563]  eta: 0:01:35  loss: 1.4955 (1.1659)  acc1: 62.5000 (71.2537)  acc5: 90.6250 (91.8613)  time: 0.1677  data: 0.0002  max mem: 24440
[05:25:05.674915] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7317 (1.2587)  acc1: 87.5000 (69.4808)  acc5: 93.7500 (90.5084)  time: 0.1677  data: 0.0002  max mem: 24440
[05:25:15.996832] Test:  [1562/1563]  eta: 0:00:00  loss: 0.7752 (1.2609)  acc1: 81.2500 (69.4340)  acc5: 96.8750 (90.5040)  time: 0.1635  data: 0.0001  max mem: 24440
[05:25:16.129278] Test: Total time: 0:04:24 (0.1691 s / it)
[05:25:16.783230] * Acc@1 69.429 Acc@5 90.504 loss 1.261
[05:25:16.783384] Accuracy of the network on the 50000 test images: 69.4%
[05:25:16.783408] Max accuracy: 70.00%
[05:25:16.873844] log_dir: ./output_dir_qkformer
[05:25:19.543785] Epoch: [54]  [   0/5004]  eta: 3:42:37  lr: 0.001023  loss: 3.7762 (3.7762)  time: 2.6693  data: 2.1936  max mem: 24440
[05:40:40.962265] Epoch: [54]  [2000/5004]  eta: 0:23:07  lr: 0.001020  loss: 3.0491 (3.0325)  time: 0.4624  data: 0.0002  max mem: 24440
[05:56:08.371036] Epoch: [54]  [4000/5004]  eta: 0:07:44  lr: 0.001017  loss: 2.9618 (3.0311)  time: 0.4585  data: 0.0002  max mem: 24440
[06:03:49.229071] Epoch: [54]  [5003/5004]  eta: 0:00:00  lr: 0.001016  loss: 3.0970 (3.0327)  time: 0.4529  data: 0.0006  max mem: 24440
[06:03:49.697102] Epoch: [54] Total time: 0:38:32 (0.4622 s / it)
[06:03:49.752746] Averaged stats: lr: 0.001016  loss: 3.0970 (3.0316)
[06:03:51.794618] Test:  [   0/1563]  eta: 0:53:05  loss: 0.5378 (0.5378)  acc1: 93.7500 (93.7500)  acc5: 93.7500 (93.7500)  time: 2.0380  data: 1.8294  max mem: 24440
[06:05:15.742998] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.8326 (0.9811)  acc1: 78.1250 (74.6756)  acc5: 96.8750 (94.2241)  time: 0.1680  data: 0.0002  max mem: 24440
[06:06:39.710307] Test:  [1000/1563]  eta: 0:01:35  loss: 1.4982 (1.1104)  acc1: 56.2500 (72.1747)  acc5: 90.6250 (92.2858)  time: 0.1677  data: 0.0002  max mem: 24440
[06:08:03.664084] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6741 (1.2053)  acc1: 84.3750 (70.3073)  acc5: 93.7500 (90.8311)  time: 0.1679  data: 0.0002  max mem: 24440
[06:08:14.058685] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6018 (1.2077)  acc1: 84.3750 (70.2860)  acc5: 93.7500 (90.7960)  time: 0.1670  data: 0.0001  max mem: 24440
[06:08:14.226986] Test: Total time: 0:04:24 (0.1692 s / it)
[06:08:14.867561] * Acc@1 70.288 Acc@5 90.795 loss 1.208
[06:08:14.867710] Accuracy of the network on the 50000 test images: 70.3%
[06:08:14.867730] Max accuracy: 70.29%
[06:08:15.019825] log_dir: ./output_dir_qkformer
[06:08:17.926988] Epoch: [55]  [   0/5004]  eta: 4:02:15  lr: 0.001016  loss: 3.1867 (3.1867)  time: 2.9048  data: 2.4393  max mem: 24440
[06:23:39.080488] Epoch: [55]  [2000/5004]  eta: 0:23:07  lr: 0.001013  loss: 3.0125 (3.0055)  time: 0.4638  data: 0.0002  max mem: 24440
[06:38:59.760733] Epoch: [55]  [4000/5004]  eta: 0:07:42  lr: 0.001010  loss: 2.9701 (3.0107)  time: 0.4598  data: 0.0003  max mem: 24440
[06:46:41.360578] Epoch: [55]  [5003/5004]  eta: 0:00:00  lr: 0.001009  loss: 2.9670 (3.0111)  time: 0.4580  data: 0.0005  max mem: 24440
[06:46:41.746624] Epoch: [55] Total time: 0:38:26 (0.4610 s / it)
[06:46:41.751844] Averaged stats: lr: 0.001009  loss: 2.9670 (3.0224)
[06:46:43.234517] Test:  [   0/1563]  eta: 0:38:32  loss: 0.5103 (0.5103)  acc1: 90.6250 (90.6250)  acc5: 93.7500 (93.7500)  time: 1.4792  data: 1.3033  max mem: 24440
[06:48:07.307444] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.8794 (1.0012)  acc1: 75.0000 (73.8211)  acc5: 96.8750 (94.4112)  time: 0.1678  data: 0.0002  max mem: 24440
[06:49:31.270435] Test:  [1000/1563]  eta: 0:01:35  loss: 1.4470 (1.1460)  acc1: 59.3750 (71.3099)  acc5: 87.5000 (91.9705)  time: 0.1678  data: 0.0002  max mem: 24440
[06:50:55.224690] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6784 (1.2347)  acc1: 87.5000 (69.4808)  acc5: 96.8750 (90.6646)  time: 0.1678  data: 0.0002  max mem: 24440
[06:51:05.546902] Test:  [1562/1563]  eta: 0:00:00  loss: 0.9452 (1.2380)  acc1: 78.1250 (69.3940)  acc5: 96.8750 (90.6240)  time: 0.1635  data: 0.0001  max mem: 24440
[06:51:05.652530] Test: Total time: 0:04:23 (0.1688 s / it)
[06:51:06.259027] * Acc@1 69.385 Acc@5 90.627 loss 1.238
[06:51:06.259185] Accuracy of the network on the 50000 test images: 69.4%
[06:51:06.259206] Max accuracy: 70.29%
[06:51:06.352101] log_dir: ./output_dir_qkformer
[06:51:08.904521] Epoch: [56]  [   0/5004]  eta: 3:32:39  lr: 0.001009  loss: 3.5495 (3.5495)  time: 2.5498  data: 2.0741  max mem: 24440
[07:06:30.818137] Epoch: [56]  [2000/5004]  eta: 0:23:07  lr: 0.001006  loss: 2.9466 (3.0009)  time: 0.4603  data: 0.0003  max mem: 24440
[07:21:51.253813] Epoch: [56]  [4000/5004]  eta: 0:07:42  lr: 0.001003  loss: 2.8882 (3.0071)  time: 0.4622  data: 0.0002  max mem: 24440
[07:29:33.059676] Epoch: [56]  [5003/5004]  eta: 0:00:00  lr: 0.001002  loss: 2.9329 (3.0172)  time: 0.4529  data: 0.0005  max mem: 24440
[07:29:33.569623] Epoch: [56] Total time: 0:38:27 (0.4611 s / it)
[07:29:33.571683] Averaged stats: lr: 0.001002  loss: 2.9329 (3.0148)
[07:29:35.805436] Test:  [   0/1563]  eta: 0:58:00  loss: 0.4215 (0.4215)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.2267  data: 1.7985  max mem: 24440
[07:30:59.828770] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.0505 (1.0032)  acc1: 75.0000 (74.9938)  acc5: 93.7500 (94.3550)  time: 0.1678  data: 0.0002  max mem: 24440
[07:32:23.859842] Test:  [1000/1563]  eta: 0:01:35  loss: 1.4650 (1.1356)  acc1: 59.3750 (72.1310)  acc5: 90.6250 (92.2609)  time: 0.1679  data: 0.0002  max mem: 24440
[07:33:47.885430] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6529 (1.2121)  acc1: 84.3750 (70.6029)  acc5: 96.8750 (91.1059)  time: 0.1679  data: 0.0002  max mem: 24440
[07:33:58.209415] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4336 (1.2118)  acc1: 87.5000 (70.5820)  acc5: 100.0000 (91.1140)  time: 0.1635  data: 0.0001  max mem: 24440
[07:33:58.372539] Test: Total time: 0:04:24 (0.1694 s / it)
[07:33:58.731693] * Acc@1 70.592 Acc@5 91.115 loss 1.212
[07:33:58.731977] Accuracy of the network on the 50000 test images: 70.6%
[07:33:58.732037] Max accuracy: 70.59%
[07:33:58.790330] log_dir: ./output_dir_qkformer
[07:34:01.465143] Epoch: [57]  [   0/5004]  eta: 3:43:00  lr: 0.001002  loss: 3.2725 (3.2725)  time: 2.6741  data: 2.1896  max mem: 24440
[07:49:22.132155] Epoch: [57]  [2000/5004]  eta: 0:23:06  lr: 0.000999  loss: 3.0137 (3.0004)  time: 0.4595  data: 0.0002  max mem: 24440
[08:04:43.305901] Epoch: [57]  [4000/5004]  eta: 0:07:42  lr: 0.000996  loss: 3.1434 (3.0090)  time: 0.4575  data: 0.0002  max mem: 24440
[08:12:25.051730] Epoch: [57]  [5003/5004]  eta: 0:00:00  lr: 0.000994  loss: 2.9985 (3.0133)  time: 0.4541  data: 0.0008  max mem: 24440
[08:12:25.442592] Epoch: [57] Total time: 0:38:26 (0.4610 s / it)
[08:12:25.448532] Averaged stats: lr: 0.000994  loss: 2.9985 (3.0146)
[08:12:27.462397] Test:  [   0/1563]  eta: 0:52:16  loss: 0.4114 (0.4114)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 2.0068  data: 1.8162  max mem: 24440
[08:13:51.497834] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.1954 (0.9835)  acc1: 75.0000 (75.1060)  acc5: 90.6250 (94.3051)  time: 0.1679  data: 0.0002  max mem: 24440
[08:15:15.515009] Test:  [1000/1563]  eta: 0:01:35  loss: 1.3666 (1.1193)  acc1: 59.3750 (72.4307)  acc5: 93.7500 (92.1641)  time: 0.1679  data: 0.0002  max mem: 24440
[08:16:39.582375] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6425 (1.2080)  acc1: 84.3750 (70.5092)  acc5: 93.7500 (90.8644)  time: 0.1684  data: 0.0002  max mem: 24440
[08:16:49.904283] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6283 (1.2084)  acc1: 87.5000 (70.4880)  acc5: 93.7500 (90.8540)  time: 0.1635  data: 0.0001  max mem: 24440
[08:16:50.025995] Test: Total time: 0:04:24 (0.1693 s / it)
[08:16:50.259443] * Acc@1 70.485 Acc@5 90.854 loss 1.208
[08:16:50.259735] Accuracy of the network on the 50000 test images: 70.5%
[08:16:50.259790] Max accuracy: 70.59%
[08:16:50.375440] log_dir: ./output_dir_qkformer
[08:16:53.168998] Epoch: [58]  [   0/5004]  eta: 3:52:51  lr: 0.000994  loss: 2.4916 (2.4916)  time: 2.7921  data: 2.3213  max mem: 24440
[08:32:13.640462] Epoch: [58]  [2000/5004]  eta: 0:23:05  lr: 0.000991  loss: 3.1201 (2.9980)  time: 0.4598  data: 0.0002  max mem: 24440
[08:47:35.079150] Epoch: [58]  [4000/5004]  eta: 0:07:42  lr: 0.000989  loss: 3.0139 (3.0033)  time: 0.4611  data: 0.0003  max mem: 24440
[08:55:15.915406] Epoch: [58]  [5003/5004]  eta: 0:00:00  lr: 0.000987  loss: 3.1539 (3.0058)  time: 0.4551  data: 0.0006  max mem: 24440
[08:55:16.348639] Epoch: [58] Total time: 0:38:25 (0.4608 s / it)
[08:55:16.371834] Averaged stats: lr: 0.000987  loss: 3.1539 (3.0074)
[08:55:18.458405] Test:  [   0/1563]  eta: 0:54:11  loss: 0.2826 (0.2826)  acc1: 93.7500 (93.7500)  acc5: 100.0000 (100.0000)  time: 2.0802  data: 1.8993  max mem: 24440
[08:56:42.496580] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.0311 (0.9481)  acc1: 75.0000 (75.9731)  acc5: 93.7500 (94.5609)  time: 0.1683  data: 0.0002  max mem: 24440
[08:58:06.493520] Test:  [1000/1563]  eta: 0:01:35  loss: 1.4974 (1.1126)  acc1: 59.3750 (72.7741)  acc5: 93.7500 (92.2858)  time: 0.1680  data: 0.0002  max mem: 24440
[08:59:30.533855] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5499 (1.2043)  acc1: 87.5000 (70.7487)  acc5: 93.7500 (90.9415)  time: 0.1681  data: 0.0002  max mem: 24440
[08:59:40.860827] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6283 (1.2042)  acc1: 87.5000 (70.7640)  acc5: 96.8750 (90.9400)  time: 0.1636  data: 0.0001  max mem: 24440
[08:59:40.985495] Test: Total time: 0:04:24 (0.1693 s / it)
[08:59:41.315583] * Acc@1 70.764 Acc@5 90.939 loss 1.204
[08:59:41.315776] Accuracy of the network on the 50000 test images: 70.8%
[08:59:41.315800] Max accuracy: 70.76%
[08:59:41.402919] log_dir: ./output_dir_qkformer
[08:59:44.132942] Epoch: [59]  [   0/5004]  eta: 3:47:36  lr: 0.000987  loss: 3.5897 (3.5897)  time: 2.7291  data: 2.0790  max mem: 24440
[09:15:05.640460] Epoch: [59]  [2000/5004]  eta: 0:23:07  lr: 0.000984  loss: 2.8785 (2.9983)  time: 0.4623  data: 0.0002  max mem: 24440
[09:30:26.697954] Epoch: [59]  [4000/5004]  eta: 0:07:43  lr: 0.000981  loss: 2.8861 (3.0022)  time: 0.4611  data: 0.0003  max mem: 24440
[09:38:08.406561] Epoch: [59]  [5003/5004]  eta: 0:00:00  lr: 0.000980  loss: 2.8834 (3.0036)  time: 0.4542  data: 0.0008  max mem: 24440
[09:38:08.829775] Epoch: [59] Total time: 0:38:27 (0.4611 s / it)
[09:38:08.847232] Averaged stats: lr: 0.000980  loss: 2.8834 (3.0043)
[09:38:10.798435] Test:  [   0/1563]  eta: 0:50:41  loss: 0.5188 (0.5188)  acc1: 87.5000 (87.5000)  acc5: 96.8750 (96.8750)  time: 1.9460  data: 1.7723  max mem: 24440
[09:39:34.848752] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.0278 (0.9908)  acc1: 75.0000 (75.0062)  acc5: 93.7500 (94.1305)  time: 0.1681  data: 0.0002  max mem: 24440
[09:40:58.865638] Test:  [1000/1563]  eta: 0:01:35  loss: 1.3918 (1.1323)  acc1: 65.6250 (72.3839)  acc5: 90.6250 (92.0954)  time: 0.1679  data: 0.0002  max mem: 24440
[09:42:22.906892] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7380 (1.2304)  acc1: 81.2500 (70.3781)  acc5: 96.8750 (90.7687)  time: 0.1683  data: 0.0005  max mem: 24440
[09:42:33.239621] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5251 (1.2294)  acc1: 90.6250 (70.4120)  acc5: 96.8750 (90.7920)  time: 0.1635  data: 0.0001  max mem: 24440
[09:42:33.364208] Test: Total time: 0:04:24 (0.1692 s / it)
[09:42:33.709728] * Acc@1 70.412 Acc@5 90.789 loss 1.229
[09:42:33.709885] Accuracy of the network on the 50000 test images: 70.4%
[09:42:33.709908] Max accuracy: 70.76%
[09:42:33.792517] log_dir: ./output_dir_qkformer
[09:42:36.908769] Epoch: [60]  [   0/5004]  eta: 4:19:50  lr: 0.000980  loss: 2.7400 (2.7400)  time: 3.1156  data: 2.3154  max mem: 24440
[09:57:58.339342] Epoch: [60]  [2000/5004]  eta: 0:23:07  lr: 0.000977  loss: 2.9218 (2.9945)  time: 0.4592  data: 0.0002  max mem: 24440
[10:13:20.045614] Epoch: [60]  [4000/5004]  eta: 0:07:43  lr: 0.000974  loss: 2.9760 (2.9993)  time: 0.4598  data: 0.0003  max mem: 24440
[10:21:02.027824] Epoch: [60]  [5003/5004]  eta: 0:00:00  lr: 0.000972  loss: 3.0206 (3.0020)  time: 0.4538  data: 0.0005  max mem: 24440
[10:21:02.514996] Epoch: [60] Total time: 0:38:28 (0.4614 s / it)
[10:21:02.526955] Averaged stats: lr: 0.000972  loss: 3.0206 (2.9960)
[10:21:04.806060] Test:  [   0/1563]  eta: 0:59:12  loss: 0.4357 (0.4357)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 2.2726  data: 2.0791  max mem: 24440
[10:22:28.840249] Test:  [ 500/1563]  eta: 0:03:03  loss: 1.1214 (1.0260)  acc1: 71.8750 (74.2390)  acc5: 93.7500 (93.7313)  time: 0.1685  data: 0.0002  max mem: 24440
[10:23:52.918224] Test:  [1000/1563]  eta: 0:01:35  loss: 1.5472 (1.1443)  acc1: 59.3750 (71.8875)  acc5: 90.6250 (92.0298)  time: 0.1681  data: 0.0002  max mem: 24440
[10:25:16.979171] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7217 (1.2194)  acc1: 81.2500 (70.2698)  acc5: 93.7500 (90.8644)  time: 0.1679  data: 0.0002  max mem: 24440
[10:25:27.311451] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5224 (1.2178)  acc1: 87.5000 (70.3000)  acc5: 93.7500 (90.8600)  time: 0.1636  data: 0.0001  max mem: 24440
[10:25:27.425608] Test: Total time: 0:04:24 (0.1695 s / it)
[10:25:27.883147] * Acc@1 70.301 Acc@5 90.862 loss 1.218
[10:25:27.883308] Accuracy of the network on the 50000 test images: 70.3%
[10:25:27.883329] Max accuracy: 70.76%
[10:25:27.946456] log_dir: ./output_dir_qkformer
[10:25:30.716660] Epoch: [61]  [   0/5004]  eta: 3:50:52  lr: 0.000972  loss: 3.3457 (3.3457)  time: 2.7683  data: 2.1935  max mem: 24440
[10:40:51.753303] Epoch: [61]  [2000/5004]  eta: 0:23:06  lr: 0.000969  loss: 2.9996 (2.9965)  time: 0.4617  data: 0.0002  max mem: 24440
[10:56:12.187080] Epoch: [61]  [4000/5004]  eta: 0:07:42  lr: 0.000966  loss: 2.9233 (2.9978)  time: 0.4631  data: 0.0003  max mem: 24440
[11:03:53.137747] Epoch: [61]  [5003/5004]  eta: 0:00:00  lr: 0.000964  loss: 3.0470 (3.0002)  time: 0.4531  data: 0.0005  max mem: 24440
[11:03:53.522272] Epoch: [61] Total time: 0:38:25 (0.4607 s / it)
[11:03:53.575805] Averaged stats: lr: 0.000964  loss: 3.0470 (2.9928)
[11:03:55.628687] Test:  [   0/1563]  eta: 0:53:19  loss: 0.5319 (0.5319)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 2.0471  data: 1.8308  max mem: 24440
[11:05:19.832993] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.7968 (0.9638)  acc1: 75.0000 (75.4304)  acc5: 93.7500 (94.5921)  time: 0.1680  data: 0.0002  max mem: 24440
[11:06:43.861711] Test:  [1000/1563]  eta: 0:01:35  loss: 1.2346 (1.1066)  acc1: 62.5000 (72.4744)  acc5: 93.7500 (92.2671)  time: 0.1681  data: 0.0002  max mem: 24440
[11:08:07.887177] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6774 (1.1832)  acc1: 87.5000 (70.9381)  acc5: 96.8750 (91.1538)  time: 0.1679  data: 0.0002  max mem: 24440
[11:08:18.211134] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6169 (1.1850)  acc1: 87.5000 (70.9340)  acc5: 96.8750 (91.1280)  time: 0.1636  data: 0.0001  max mem: 24440
[11:08:18.341747] Test: Total time: 0:04:24 (0.1694 s / it)
[11:08:18.749643] * Acc@1 70.938 Acc@5 91.126 loss 1.185
[11:08:18.749792] Accuracy of the network on the 50000 test images: 70.9%
[11:08:18.749816] Max accuracy: 70.94%
[11:08:18.806159] log_dir: ./output_dir_qkformer
[11:08:21.481467] Epoch: [62]  [   0/5004]  eta: 3:43:00  lr: 0.000964  loss: 3.1164 (3.1164)  time: 2.6740  data: 2.1390  max mem: 24440
[11:23:44.042064] Epoch: [62]  [2000/5004]  eta: 0:23:08  lr: 0.000961  loss: 3.0961 (2.9901)  time: 0.4640  data: 0.0002  max mem: 24440
[11:39:04.599709] Epoch: [62]  [4000/5004]  eta: 0:07:43  lr: 0.000958  loss: 2.9637 (2.9906)  time: 0.4602  data: 0.0002  max mem: 24440
[11:46:46.176075] Epoch: [62]  [5003/5004]  eta: 0:00:00  lr: 0.000957  loss: 3.0974 (2.9920)  time: 0.4541  data: 0.0005  max mem: 24440
[11:46:46.608243] Epoch: [62] Total time: 0:38:27 (0.4612 s / it)
[11:46:46.611401] Averaged stats: lr: 0.000957  loss: 3.0974 (2.9856)
[11:46:48.573840] Test:  [   0/1563]  eta: 0:51:01  loss: 0.4725 (0.4725)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 1.9589  data: 1.7828  max mem: 24440
[11:48:12.531631] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.9083 (0.9648)  acc1: 68.7500 (75.1123)  acc5: 93.7500 (94.2989)  time: 0.1679  data: 0.0002  max mem: 24440
[11:49:36.512627] Test:  [1000/1563]  eta: 0:01:35  loss: 1.3443 (1.1113)  acc1: 62.5000 (72.3277)  acc5: 90.6250 (92.0829)  time: 0.1680  data: 0.0002  max mem: 24440
[11:51:00.478363] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5995 (1.1903)  acc1: 84.3750 (70.8777)  acc5: 96.8750 (90.8478)  time: 0.1679  data: 0.0002  max mem: 24440
[11:51:10.804217] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6542 (1.1915)  acc1: 87.5000 (70.8460)  acc5: 96.8750 (90.8600)  time: 0.1635  data: 0.0001  max mem: 24440
[11:51:10.927471] Test: Total time: 0:04:24 (0.1691 s / it)
[11:51:11.403760] * Acc@1 70.840 Acc@5 90.859 loss 1.192
[11:51:11.404023] Accuracy of the network on the 50000 test images: 70.8%
[11:51:11.404074] Max accuracy: 70.94%
[11:51:11.491404] log_dir: ./output_dir_qkformer
[11:51:14.329618] Epoch: [63]  [   0/5004]  eta: 3:56:23  lr: 0.000957  loss: 3.0889 (3.0889)  time: 2.8345  data: 2.3291  max mem: 24440
[12:06:34.719280] Epoch: [63]  [2000/5004]  eta: 0:23:05  lr: 0.000954  loss: 2.9211 (2.9740)  time: 0.4568  data: 0.0002  max mem: 24440
[12:21:54.932035] Epoch: [63]  [4000/5004]  eta: 0:07:42  lr: 0.000951  loss: 3.0500 (2.9684)  time: 0.4582  data: 0.0003  max mem: 24440
[12:29:35.965203] Epoch: [63]  [5003/5004]  eta: 0:00:00  lr: 0.000949  loss: 3.0198 (2.9720)  time: 0.4531  data: 0.0009  max mem: 24440
[12:29:36.370504] Epoch: [63] Total time: 0:38:24 (0.4606 s / it)
[12:29:36.371496] Averaged stats: lr: 0.000949  loss: 3.0198 (2.9821)
[12:29:38.002857] Test:  [   0/1563]  eta: 0:42:22  loss: 0.5104 (0.5104)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.6268  data: 1.4506  max mem: 24440
[12:31:01.975661] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.9538 (0.9778)  acc1: 78.1250 (75.7859)  acc5: 93.7500 (94.5983)  time: 0.1680  data: 0.0002  max mem: 24440
[12:32:25.931022] Test:  [1000/1563]  eta: 0:01:35  loss: 1.5370 (1.1275)  acc1: 62.5000 (72.9458)  acc5: 90.6250 (92.1610)  time: 0.1678  data: 0.0002  max mem: 24440
[12:33:49.905361] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7525 (1.2136)  acc1: 84.3750 (70.9652)  acc5: 96.8750 (90.8249)  time: 0.1677  data: 0.0002  max mem: 24440
[12:34:00.228655] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6073 (1.2127)  acc1: 90.6250 (71.0480)  acc5: 96.8750 (90.8300)  time: 0.1635  data: 0.0001  max mem: 24440
[12:34:00.335673] Test: Total time: 0:04:23 (0.1689 s / it)
[12:34:01.151075] * Acc@1 71.053 Acc@5 90.830 loss 1.213
[12:34:01.151223] Accuracy of the network on the 50000 test images: 71.1%
[12:34:01.151243] Max accuracy: 71.05%
[12:34:01.269412] log_dir: ./output_dir_qkformer
[12:34:04.877008] Epoch: [64]  [   0/5004]  eta: 5:00:47  lr: 0.000949  loss: 2.9286 (2.9286)  time: 3.6065  data: 2.1294  max mem: 24440
[12:49:25.674606] Epoch: [64]  [2000/5004]  eta: 0:23:07  lr: 0.000946  loss: 2.9053 (2.9665)  time: 0.4624  data: 0.0002  max mem: 24440
[13:04:47.521831] Epoch: [64]  [4000/5004]  eta: 0:07:43  lr: 0.000943  loss: 2.8130 (2.9699)  time: 0.4584  data: 0.0002  max mem: 24440
[13:12:28.609619] Epoch: [64]  [5003/5004]  eta: 0:00:00  lr: 0.000941  loss: 3.0548 (2.9697)  time: 0.4543  data: 0.0009  max mem: 24440
[13:12:29.385378] Epoch: [64] Total time: 0:38:28 (0.4613 s / it)
[13:12:29.501863] Averaged stats: lr: 0.000941  loss: 3.0548 (2.9758)
[13:12:33.080767] Test:  [   0/1563]  eta: 1:33:05  loss: 0.5186 (0.5186)  acc1: 90.6250 (90.6250)  acc5: 93.7500 (93.7500)  time: 3.5734  data: 3.2515  max mem: 24440
[13:13:57.052945] Test:  [ 500/1563]  eta: 0:03:05  loss: 0.9917 (0.9479)  acc1: 71.8750 (75.7859)  acc5: 93.7500 (94.7917)  time: 0.1680  data: 0.0002  max mem: 24440
[13:15:21.025114] Test:  [1000/1563]  eta: 0:01:36  loss: 1.4300 (1.1047)  acc1: 59.3750 (72.6523)  acc5: 90.6250 (92.5418)  time: 0.1685  data: 0.0002  max mem: 24440
[13:16:45.027007] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6338 (1.1917)  acc1: 84.3750 (70.9735)  acc5: 96.8750 (91.3016)  time: 0.1681  data: 0.0002  max mem: 24440
[13:16:55.351762] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5559 (1.1905)  acc1: 87.5000 (70.9780)  acc5: 96.8750 (91.2960)  time: 0.1635  data: 0.0001  max mem: 24440
[13:16:55.574440] Test: Total time: 0:04:26 (0.1702 s / it)
[13:16:56.003332] * Acc@1 70.972 Acc@5 91.296 loss 1.191
[13:16:56.003499] Accuracy of the network on the 50000 test images: 71.0%
[13:16:56.003525] Max accuracy: 71.05%
[13:16:56.145904] log_dir: ./output_dir_qkformer
[13:17:03.852049] Epoch: [65]  [   0/5004]  eta: 10:41:48  lr: 0.000941  loss: 2.8467 (2.8467)  time: 7.6956  data: 2.3580  max mem: 24440
[13:32:26.143201] Epoch: [65]  [2000/5004]  eta: 0:23:16  lr: 0.000938  loss: 2.7523 (2.9619)  time: 0.4632  data: 0.0003  max mem: 24440
[13:47:49.451875] Epoch: [65]  [4000/5004]  eta: 0:07:45  lr: 0.000935  loss: 2.9902 (2.9667)  time: 0.4607  data: 0.0002  max mem: 24440
[13:55:31.818770] Epoch: [65]  [5003/5004]  eta: 0:00:00  lr: 0.000933  loss: 2.9900 (2.9694)  time: 0.4557  data: 0.0011  max mem: 24440
[13:55:32.445139] Epoch: [65] Total time: 0:38:36 (0.4629 s / it)
[13:55:32.494033] Averaged stats: lr: 0.000933  loss: 2.9900 (2.9712)
[13:55:36.029833] Test:  [   0/1563]  eta: 1:31:56  loss: 0.3576 (0.3576)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 3.5293  data: 3.1602  max mem: 24440
[13:57:00.024038] Test:  [ 500/1563]  eta: 0:03:05  loss: 0.7519 (0.9691)  acc1: 81.2500 (75.0686)  acc5: 96.8750 (94.5796)  time: 0.1678  data: 0.0002  max mem: 24440
[13:58:24.007439] Test:  [1000/1563]  eta: 0:01:36  loss: 1.2983 (1.1047)  acc1: 62.5000 (72.4026)  acc5: 90.6250 (92.4700)  time: 0.1680  data: 0.0002  max mem: 24440
[13:59:47.981749] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7307 (1.1893)  acc1: 84.3750 (70.6425)  acc5: 93.7500 (91.1288)  time: 0.1678  data: 0.0002  max mem: 24440
[13:59:58.301856] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6401 (1.1892)  acc1: 87.5000 (70.6880)  acc5: 96.8750 (91.1580)  time: 0.1635  data: 0.0001  max mem: 24440
[13:59:58.418729] Test: Total time: 0:04:25 (0.1701 s / it)
[13:59:58.942962] * Acc@1 70.692 Acc@5 91.158 loss 1.189
[13:59:58.943112] Accuracy of the network on the 50000 test images: 70.7%
[13:59:58.943133] Max accuracy: 71.05%
[13:59:58.991196] log_dir: ./output_dir_qkformer
[14:00:01.667357] Epoch: [66]  [   0/5004]  eta: 3:42:56  lr: 0.000933  loss: 2.5489 (2.5489)  time: 2.6731  data: 2.1261  max mem: 24440
[14:15:23.621656] Epoch: [66]  [2000/5004]  eta: 0:23:07  lr: 0.000930  loss: 3.0710 (2.9666)  time: 0.4619  data: 0.0003  max mem: 24440
[14:30:45.189868] Epoch: [66]  [4000/5004]  eta: 0:07:43  lr: 0.000927  loss: 3.0156 (2.9714)  time: 0.4599  data: 0.0002  max mem: 24440
[14:38:27.677789] Epoch: [66]  [5003/5004]  eta: 0:00:00  lr: 0.000925  loss: 2.7930 (2.9724)  time: 0.4547  data: 0.0005  max mem: 24440
[14:38:28.147484] Epoch: [66] Total time: 0:38:29 (0.4615 s / it)
[14:38:28.148407] Averaged stats: lr: 0.000925  loss: 2.7930 (2.9624)
[14:38:30.178134] Test:  [   0/1563]  eta: 0:52:45  loss: 0.3661 (0.3661)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 2.0255  data: 1.8527  max mem: 24440
[14:39:54.154220] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.8777 (0.9174)  acc1: 75.0000 (75.9543)  acc5: 96.8750 (94.6919)  time: 0.1678  data: 0.0002  max mem: 24440
[14:41:18.117846] Test:  [1000/1563]  eta: 0:01:35  loss: 1.2005 (1.0357)  acc1: 65.6250 (73.6045)  acc5: 90.6250 (93.0694)  time: 0.1678  data: 0.0002  max mem: 24440
[14:42:42.066048] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5983 (1.1278)  acc1: 84.3750 (71.8396)  acc5: 96.8750 (91.7305)  time: 0.1678  data: 0.0002  max mem: 24440
[14:42:52.380177] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5429 (1.1328)  acc1: 87.5000 (71.7280)  acc5: 96.8750 (91.6920)  time: 0.1634  data: 0.0001  max mem: 24440
[14:42:52.520754] Test: Total time: 0:04:24 (0.1691 s / it)
[14:42:53.115528] * Acc@1 71.726 Acc@5 91.694 loss 1.133
[14:42:53.115726] Accuracy of the network on the 50000 test images: 71.7%
[14:42:53.115751] Max accuracy: 71.73%
[14:42:53.217965] log_dir: ./output_dir_qkformer
[14:42:55.996102] Epoch: [67]  [   0/5004]  eta: 3:51:36  lr: 0.000925  loss: 3.3769 (3.3769)  time: 2.7771  data: 2.2564  max mem: 24440
[14:58:17.982142] Epoch: [67]  [2000/5004]  eta: 0:23:07  lr: 0.000922  loss: 3.0511 (2.9514)  time: 0.4612  data: 0.0002  max mem: 24440
[15:13:38.993798] Epoch: [67]  [4000/5004]  eta: 0:07:43  lr: 0.000918  loss: 3.0021 (2.9648)  time: 0.4589  data: 0.0002  max mem: 24440
[15:21:20.658282] Epoch: [67]  [5003/5004]  eta: 0:00:00  lr: 0.000917  loss: 2.9662 (2.9701)  time: 0.4542  data: 0.0005  max mem: 24440
[15:21:21.059913] Epoch: [67] Total time: 0:38:27 (0.4612 s / it)
[15:21:21.174159] Averaged stats: lr: 0.000917  loss: 2.9662 (2.9617)
[15:21:23.936122] Test:  [   0/1563]  eta: 1:11:46  loss: 0.2329 (0.2329)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.7551  data: 2.1498  max mem: 24440
[15:22:47.963275] Test:  [ 500/1563]  eta: 0:03:04  loss: 1.0075 (0.9097)  acc1: 71.8750 (76.7465)  acc5: 93.7500 (95.3156)  time: 0.1680  data: 0.0002  max mem: 24440
[15:24:11.964656] Test:  [1000/1563]  eta: 0:01:36  loss: 1.3944 (1.0791)  acc1: 56.2500 (73.2018)  acc5: 90.6250 (92.8478)  time: 0.1683  data: 0.0002  max mem: 24440
[15:25:35.907795] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6996 (1.1723)  acc1: 84.3750 (71.4982)  acc5: 96.8750 (91.4661)  time: 0.1679  data: 0.0002  max mem: 24440
[15:25:46.241907] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4497 (1.1732)  acc1: 90.6250 (71.5040)  acc5: 96.8750 (91.4400)  time: 0.1635  data: 0.0001  max mem: 24440
[15:25:46.366392] Test: Total time: 0:04:25 (0.1697 s / it)
[15:25:46.804292] * Acc@1 71.502 Acc@5 91.440 loss 1.173
[15:25:46.804447] Accuracy of the network on the 50000 test images: 71.5%
[15:25:46.804467] Max accuracy: 71.73%
[15:25:46.890045] log_dir: ./output_dir_qkformer
[15:25:50.025377] Epoch: [68]  [   0/5004]  eta: 4:21:25  lr: 0.000917  loss: 2.6236 (2.6236)  time: 3.1347  data: 2.4081  max mem: 24440
[15:41:11.634953] Epoch: [68]  [2000/5004]  eta: 0:23:07  lr: 0.000914  loss: 2.8829 (2.9286)  time: 0.4601  data: 0.0002  max mem: 24440
[15:56:31.722098] Epoch: [68]  [4000/5004]  eta: 0:07:42  lr: 0.000910  loss: 3.0188 (2.9393)  time: 0.4617  data: 0.0002  max mem: 24440
[16:04:13.304321] Epoch: [68]  [5003/5004]  eta: 0:00:00  lr: 0.000909  loss: 3.0195 (2.9428)  time: 0.4532  data: 0.0009  max mem: 24440
[16:04:13.775031] Epoch: [68] Total time: 0:38:26 (0.4610 s / it)
[16:04:13.796072] Averaged stats: lr: 0.000909  loss: 3.0195 (2.9542)
[16:04:15.980250] Test:  [   0/1563]  eta: 0:56:38  loss: 0.2696 (0.2696)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 2.1745  data: 1.7662  max mem: 24440
[16:05:39.898030] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.0291 (0.9427)  acc1: 75.0000 (76.1352)  acc5: 96.8750 (94.6981)  time: 0.1679  data: 0.0002  max mem: 24440
[16:07:03.870203] Test:  [1000/1563]  eta: 0:01:35  loss: 1.3593 (1.0876)  acc1: 68.7500 (73.2705)  acc5: 90.6250 (92.4326)  time: 0.1680  data: 0.0002  max mem: 24440
[16:08:27.806416] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6711 (1.1676)  acc1: 84.3750 (71.8209)  acc5: 96.8750 (91.3183)  time: 0.1680  data: 0.0002  max mem: 24440
[16:08:38.133481] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5581 (1.1685)  acc1: 87.5000 (71.8020)  acc5: 96.8750 (91.3340)  time: 0.1634  data: 0.0001  max mem: 24440
[16:08:38.250567] Test: Total time: 0:04:24 (0.1692 s / it)
[16:08:38.852408] * Acc@1 71.798 Acc@5 91.335 loss 1.169
[16:08:38.852581] Accuracy of the network on the 50000 test images: 71.8%
[16:08:38.852608] Max accuracy: 71.80%
[16:08:38.977724] log_dir: ./output_dir_qkformer
[16:08:41.906344] Epoch: [69]  [   0/5004]  eta: 4:04:06  lr: 0.000909  loss: 2.4894 (2.4894)  time: 2.9270  data: 2.4622  max mem: 24440
[16:24:04.270704] Epoch: [69]  [2000/5004]  eta: 0:23:08  lr: 0.000905  loss: 2.8234 (2.9400)  time: 0.4634  data: 0.0002  max mem: 24440
[16:39:25.788519] Epoch: [69]  [4000/5004]  eta: 0:07:43  lr: 0.000902  loss: 3.0394 (2.9525)  time: 0.4617  data: 0.0002  max mem: 24440
[16:47:07.318984] Epoch: [69]  [5003/5004]  eta: 0:00:00  lr: 0.000900  loss: 2.9554 (2.9504)  time: 0.4542  data: 0.0008  max mem: 24440
[16:47:07.802212] Epoch: [69] Total time: 0:38:28 (0.4614 s / it)
[16:47:07.803410] Averaged stats: lr: 0.000900  loss: 2.9554 (2.9487)
[16:47:10.355752] Test:  [   0/1563]  eta: 1:06:19  loss: 0.4340 (0.4340)  acc1: 87.5000 (87.5000)  acc5: 100.0000 (100.0000)  time: 2.5462  data: 2.2158  max mem: 24440
[16:48:34.383376] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.8533 (0.9519)  acc1: 75.0000 (75.5302)  acc5: 93.7500 (94.4798)  time: 0.1679  data: 0.0002  max mem: 24440
[16:49:58.432073] Test:  [1000/1563]  eta: 0:01:35  loss: 1.2277 (1.0822)  acc1: 56.2500 (73.0488)  acc5: 93.7500 (92.4825)  time: 0.1681  data: 0.0002  max mem: 24440
[16:51:22.430164] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7204 (1.1629)  acc1: 81.2500 (71.2150)  acc5: 93.7500 (91.4286)  time: 0.1683  data: 0.0002  max mem: 24440
[16:51:32.758366] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5235 (1.1667)  acc1: 87.5000 (71.1720)  acc5: 96.8750 (91.4060)  time: 0.1636  data: 0.0001  max mem: 24440
[16:51:32.864769] Test: Total time: 0:04:25 (0.1696 s / it)
[16:51:33.216449] * Acc@1 71.177 Acc@5 91.407 loss 1.167
[16:51:33.216661] Accuracy of the network on the 50000 test images: 71.2%
[16:51:33.216696] Max accuracy: 71.80%
[16:51:33.366715] log_dir: ./output_dir_qkformer
[16:51:36.047477] Epoch: [70]  [   0/5004]  eta: 3:43:30  lr: 0.000900  loss: 2.8541 (2.8541)  time: 2.6800  data: 2.1943  max mem: 24440
[17:06:57.002335] Epoch: [70]  [2000/5004]  eta: 0:23:06  lr: 0.000897  loss: 2.8068 (2.9345)  time: 0.4598  data: 0.0002  max mem: 24440
[17:22:17.881021] Epoch: [70]  [4000/5004]  eta: 0:07:42  lr: 0.000894  loss: 2.7377 (2.9429)  time: 0.4616  data: 0.0002  max mem: 24440
[17:29:59.179638] Epoch: [70]  [5003/5004]  eta: 0:00:00  lr: 0.000892  loss: 2.8247 (2.9447)  time: 0.4571  data: 0.0005  max mem: 24440
[17:29:59.590920] Epoch: [70] Total time: 0:38:26 (0.4609 s / it)
[17:29:59.592077] Averaged stats: lr: 0.000892  loss: 2.8247 (2.9432)
[17:30:01.172210] Test:  [   0/1563]  eta: 0:41:02  loss: 0.6727 (0.6727)  acc1: 87.5000 (87.5000)  acc5: 93.7500 (93.7500)  time: 1.5758  data: 1.3993  max mem: 24440
[17:31:25.182145] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.8913 (0.9318)  acc1: 81.2500 (76.2163)  acc5: 93.7500 (94.9663)  time: 0.1679  data: 0.0002  max mem: 24440
[17:32:49.177855] Test:  [1000/1563]  eta: 0:01:35  loss: 1.0989 (1.0792)  acc1: 78.1250 (73.1862)  acc5: 93.7500 (92.7167)  time: 0.1678  data: 0.0002  max mem: 24440
[17:34:13.188298] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6546 (1.1621)  acc1: 84.3750 (71.5814)  acc5: 96.8750 (91.5744)  time: 0.1681  data: 0.0002  max mem: 24440
[17:34:23.529283] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6855 (1.1654)  acc1: 84.3750 (71.4820)  acc5: 96.8750 (91.5520)  time: 0.1638  data: 0.0001  max mem: 24440
[17:34:23.659971] Test: Total time: 0:04:24 (0.1689 s / it)
[17:34:24.063233] * Acc@1 71.485 Acc@5 91.555 loss 1.165
[17:34:24.063379] Accuracy of the network on the 50000 test images: 71.5%
[17:34:24.063404] Max accuracy: 71.80%
[17:34:24.181209] log_dir: ./output_dir_qkformer
[17:34:26.945833] Epoch: [71]  [   0/5004]  eta: 3:50:24  lr: 0.000892  loss: 2.9969 (2.9969)  time: 2.7627  data: 2.2532  max mem: 24440
[17:49:48.663057] Epoch: [71]  [2000/5004]  eta: 0:23:07  lr: 0.000888  loss: 2.8864 (2.9333)  time: 0.4617  data: 0.0002  max mem: 24440
[18:05:09.265667] Epoch: [71]  [4000/5004]  eta: 0:07:42  lr: 0.000885  loss: 2.8602 (2.9385)  time: 0.4631  data: 0.0003  max mem: 24440
[18:12:50.364290] Epoch: [71]  [5003/5004]  eta: 0:00:00  lr: 0.000883  loss: 2.8600 (2.9416)  time: 0.4542  data: 0.0008  max mem: 24440
[18:12:50.753797] Epoch: [71] Total time: 0:38:26 (0.4609 s / it)
[18:12:50.762087] Averaged stats: lr: 0.000883  loss: 2.8600 (2.9385)
[18:12:52.626550] Test:  [   0/1563]  eta: 0:48:27  loss: 0.1922 (0.1922)  acc1: 96.8750 (96.8750)  acc5: 100.0000 (100.0000)  time: 1.8603  data: 1.3572  max mem: 24440
[18:14:16.723657] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.8708 (0.9394)  acc1: 78.1250 (75.9918)  acc5: 96.8750 (94.8915)  time: 0.1679  data: 0.0002  max mem: 24440
[18:15:40.750941] Test:  [1000/1563]  eta: 0:01:35  loss: 1.0618 (1.0880)  acc1: 65.6250 (73.2674)  acc5: 90.6250 (92.6823)  time: 0.1678  data: 0.0002  max mem: 24440
[18:17:04.749016] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6302 (1.1723)  acc1: 81.2500 (71.4919)  acc5: 93.7500 (91.4495)  time: 0.1678  data: 0.0002  max mem: 24440
[18:17:15.076238] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5837 (1.1678)  acc1: 87.5000 (71.5720)  acc5: 96.8750 (91.5020)  time: 0.1635  data: 0.0001  max mem: 24440
[18:17:15.196606] Test: Total time: 0:04:24 (0.1692 s / it)
[18:17:15.638448] * Acc@1 71.572 Acc@5 91.497 loss 1.168
[18:17:15.638600] Accuracy of the network on the 50000 test images: 71.6%
[18:17:15.638620] Max accuracy: 71.80%
[18:17:15.747215] log_dir: ./output_dir_qkformer
[18:17:18.596311] Epoch: [72]  [   0/5004]  eta: 3:57:23  lr: 0.000883  loss: 3.6969 (3.6969)  time: 2.8465  data: 2.3770  max mem: 24440
[18:32:38.568477] Epoch: [72]  [2000/5004]  eta: 0:23:05  lr: 0.000880  loss: 2.8312 (2.9168)  time: 0.4597  data: 0.0002  max mem: 24440
[18:47:58.498788] Epoch: [72]  [4000/5004]  eta: 0:07:42  lr: 0.000877  loss: 2.7546 (2.9303)  time: 0.4557  data: 0.0002  max mem: 24440
[18:55:39.730343] Epoch: [72]  [5003/5004]  eta: 0:00:00  lr: 0.000875  loss: 3.0326 (2.9319)  time: 0.4541  data: 0.0007  max mem: 24440
[18:55:40.261157] Epoch: [72] Total time: 0:38:24 (0.4605 s / it)
[18:55:40.262223] Averaged stats: lr: 0.000875  loss: 3.0326 (2.9326)
[18:55:41.829725] Test:  [   0/1563]  eta: 0:40:41  loss: 0.3712 (0.3712)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 1.5620  data: 1.3686  max mem: 24440
[18:57:05.833468] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.0189 (0.8951)  acc1: 68.7500 (77.1457)  acc5: 93.7500 (95.2720)  time: 0.1683  data: 0.0005  max mem: 24440
[18:58:29.830266] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1825 (1.0465)  acc1: 62.5000 (73.9791)  acc5: 90.6250 (93.1537)  time: 0.1678  data: 0.0002  max mem: 24440
[18:59:53.786805] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7256 (1.1289)  acc1: 84.3750 (72.3955)  acc5: 96.8750 (92.0157)  time: 0.1679  data: 0.0002  max mem: 24440
[19:00:04.107986] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4573 (1.1289)  acc1: 87.5000 (72.3960)  acc5: 96.8750 (91.9980)  time: 0.1635  data: 0.0001  max mem: 24440
[19:00:04.217335] Test: Total time: 0:04:23 (0.1689 s / it)
[19:00:04.884289] * Acc@1 72.397 Acc@5 91.998 loss 1.129
[19:00:04.884446] Accuracy of the network on the 50000 test images: 72.4%
[19:00:04.884467] Max accuracy: 72.40%
[19:00:05.013190] log_dir: ./output_dir_qkformer
[19:00:07.768315] Epoch: [73]  [   0/5004]  eta: 3:49:34  lr: 0.000875  loss: 3.0341 (3.0341)  time: 2.7528  data: 2.1596  max mem: 24440
[19:15:27.833442] Epoch: [73]  [2000/5004]  eta: 0:23:05  lr: 0.000871  loss: 2.8912 (2.9222)  time: 0.4576  data: 0.0002  max mem: 24440
[19:30:49.091965] Epoch: [73]  [4000/5004]  eta: 0:07:42  lr: 0.000868  loss: 3.0372 (2.9253)  time: 0.4607  data: 0.0003  max mem: 24440
[19:38:30.549039] Epoch: [73]  [5003/5004]  eta: 0:00:00  lr: 0.000866  loss: 2.8497 (2.9253)  time: 0.4535  data: 0.0005  max mem: 24440
[19:38:30.968502] Epoch: [73] Total time: 0:38:25 (0.4608 s / it)
[19:38:30.969794] Averaged stats: lr: 0.000866  loss: 2.8497 (2.9288)
[19:38:32.833599] Test:  [   0/1563]  eta: 0:48:21  loss: 0.5419 (0.5419)  acc1: 87.5000 (87.5000)  acc5: 96.8750 (96.8750)  time: 1.8565  data: 1.6321  max mem: 24440
[19:39:56.856677] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.7874 (0.8823)  acc1: 75.0000 (76.9212)  acc5: 96.8750 (95.1472)  time: 0.1681  data: 0.0002  max mem: 24440
[19:41:20.880367] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1966 (1.0261)  acc1: 59.3750 (74.0697)  acc5: 90.6250 (92.9758)  time: 0.1679  data: 0.0002  max mem: 24440
[19:42:44.853057] Test:  [1500/1563]  eta: 0:00:10  loss: 0.8033 (1.1204)  acc1: 84.3750 (72.3518)  acc5: 93.7500 (91.6951)  time: 0.1679  data: 0.0002  max mem: 24440
[19:42:55.562186] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5785 (1.1210)  acc1: 87.5000 (72.3640)  acc5: 96.8750 (91.6840)  time: 0.1824  data: 0.0001  max mem: 24440
[19:42:55.789880] Test: Total time: 0:04:24 (0.1694 s / it)
[19:42:55.791867] * Acc@1 72.372 Acc@5 91.689 loss 1.121
[19:42:55.792108] Accuracy of the network on the 50000 test images: 72.4%
[19:42:55.792150] Max accuracy: 72.40%
[19:42:55.921694] log_dir: ./output_dir_qkformer
[19:42:58.550628] Epoch: [74]  [   0/5004]  eta: 3:39:10  lr: 0.000866  loss: 2.9063 (2.9063)  time: 2.6279  data: 2.1503  max mem: 24440
[19:58:19.864206] Epoch: [74]  [2000/5004]  eta: 0:23:06  lr: 0.000863  loss: 2.8574 (2.9161)  time: 0.4586  data: 0.0002  max mem: 24440
[20:13:39.854251] Epoch: [74]  [4000/5004]  eta: 0:07:42  lr: 0.000859  loss: 2.9079 (2.9226)  time: 0.4594  data: 0.0002  max mem: 24440
[20:21:21.146808] Epoch: [74]  [5003/5004]  eta: 0:00:00  lr: 0.000858  loss: 2.9055 (2.9233)  time: 0.4540  data: 0.0007  max mem: 24440
[20:21:21.584535] Epoch: [74] Total time: 0:38:25 (0.4608 s / it)
[20:21:21.643905] Averaged stats: lr: 0.000858  loss: 2.9055 (2.9237)
[20:21:23.212946] Test:  [   0/1563]  eta: 0:40:46  loss: 0.5497 (0.5497)  acc1: 93.7500 (93.7500)  acc5: 93.7500 (93.7500)  time: 1.5655  data: 1.3810  max mem: 24440
[20:22:47.666345] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.0742 (0.9539)  acc1: 75.0000 (75.7547)  acc5: 96.8750 (94.6732)  time: 0.1682  data: 0.0002  max mem: 24440
[20:24:11.718955] Test:  [1000/1563]  eta: 0:01:35  loss: 1.4344 (1.0961)  acc1: 62.5000 (72.7616)  acc5: 90.6250 (92.5356)  time: 0.1682  data: 0.0002  max mem: 24440
[20:25:35.737995] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6492 (1.1681)  acc1: 81.2500 (71.2733)  acc5: 96.8750 (91.4203)  time: 0.1680  data: 0.0002  max mem: 24440
[20:25:46.069168] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5731 (1.1651)  acc1: 87.5000 (71.3460)  acc5: 96.8750 (91.4480)  time: 0.1636  data: 0.0001  max mem: 24440
[20:25:46.191322] Test: Total time: 0:04:24 (0.1693 s / it)
[20:25:46.193444] * Acc@1 71.339 Acc@5 91.450 loss 1.165
[20:25:46.193756] Accuracy of the network on the 50000 test images: 71.3%
[20:25:46.193808] Max accuracy: 72.40%
[20:25:46.278542] log_dir: ./output_dir_qkformer
[20:25:49.115975] Epoch: [75]  [   0/5004]  eta: 3:56:18  lr: 0.000858  loss: 3.3076 (3.3076)  time: 2.8334  data: 2.0847  max mem: 24440
[20:41:09.541061] Epoch: [75]  [2000/5004]  eta: 0:23:05  lr: 0.000854  loss: 2.8899 (2.9078)  time: 0.4600  data: 0.0002  max mem: 24440
[20:56:29.621079] Epoch: [75]  [4000/5004]  eta: 0:07:42  lr: 0.000851  loss: 2.8684 (2.9087)  time: 0.4586  data: 0.0002  max mem: 24440
[21:04:11.126153] Epoch: [75]  [5003/5004]  eta: 0:00:00  lr: 0.000849  loss: 2.7274 (2.9131)  time: 0.4592  data: 0.0005  max mem: 24440
[21:04:11.445894] Epoch: [75] Total time: 0:38:25 (0.4607 s / it)
[21:04:11.636178] Averaged stats: lr: 0.000849  loss: 2.7274 (2.9207)
[21:04:13.588285] Test:  [   0/1563]  eta: 0:50:45  loss: 0.6455 (0.6455)  acc1: 84.3750 (84.3750)  acc5: 96.8750 (96.8750)  time: 1.9482  data: 1.7725  max mem: 24440
[21:05:37.656258] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.0327 (0.9303)  acc1: 71.8750 (76.2475)  acc5: 96.8750 (94.7917)  time: 0.1679  data: 0.0002  max mem: 24440
[21:07:01.679827] Test:  [1000/1563]  eta: 0:01:35  loss: 1.2920 (1.0499)  acc1: 71.8750 (73.7013)  acc5: 90.6250 (92.9289)  time: 0.1679  data: 0.0002  max mem: 24440
[21:08:25.706397] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7357 (1.1346)  acc1: 81.2500 (72.0561)  acc5: 96.8750 (91.5577)  time: 0.1678  data: 0.0002  max mem: 24440
[21:08:36.028824] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3773 (1.1331)  acc1: 90.6250 (72.0760)  acc5: 96.8750 (91.5660)  time: 0.1635  data: 0.0001  max mem: 24440
[21:08:36.149402] Test: Total time: 0:04:24 (0.1692 s / it)
[21:08:36.529070] * Acc@1 72.075 Acc@5 91.567 loss 1.133
[21:08:36.529339] Accuracy of the network on the 50000 test images: 72.1%
[21:08:36.529391] Max accuracy: 72.40%
[21:08:36.601796] log_dir: ./output_dir_qkformer
[21:08:39.229352] Epoch: [76]  [   0/5004]  eta: 3:38:56  lr: 0.000849  loss: 3.4113 (3.4113)  time: 2.6251  data: 2.1252  max mem: 24440
[21:24:00.262360] Epoch: [76]  [2000/5004]  eta: 0:23:06  lr: 0.000845  loss: 2.9501 (2.9122)  time: 0.4604  data: 0.0003  max mem: 24440
[21:39:21.713615] Epoch: [76]  [4000/5004]  eta: 0:07:42  lr: 0.000842  loss: 2.8096 (2.9196)  time: 0.4587  data: 0.0002  max mem: 24440
[21:47:02.585836] Epoch: [76]  [5003/5004]  eta: 0:00:00  lr: 0.000840  loss: 2.8788 (2.9189)  time: 0.4586  data: 0.0009  max mem: 24440
[21:47:03.041820] Epoch: [76] Total time: 0:38:26 (0.4609 s / it)
[21:47:03.045925] Averaged stats: lr: 0.000840  loss: 2.8788 (2.9140)
[21:47:05.158176] Test:  [   0/1563]  eta: 0:54:55  loss: 0.2074 (0.2074)  acc1: 93.7500 (93.7500)  acc5: 100.0000 (100.0000)  time: 2.1084  data: 1.9268  max mem: 24440
[21:48:29.115510] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.8857 (0.8977)  acc1: 78.1250 (77.3827)  acc5: 93.7500 (95.2408)  time: 0.1678  data: 0.0002  max mem: 24440
[21:49:53.077398] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1882 (1.0423)  acc1: 71.8750 (74.5848)  acc5: 90.6250 (93.1412)  time: 0.1679  data: 0.0002  max mem: 24440
[21:51:17.017975] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7346 (1.1254)  acc1: 84.3750 (72.7369)  acc5: 93.7500 (91.9658)  time: 0.1677  data: 0.0002  max mem: 24440
[21:51:27.336595] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4613 (1.1256)  acc1: 87.5000 (72.7380)  acc5: 93.7500 (91.9680)  time: 0.1635  data: 0.0001  max mem: 24440
[21:51:27.445971] Test: Total time: 0:04:24 (0.1692 s / it)
[21:51:28.094209] * Acc@1 72.730 Acc@5 91.973 loss 1.125
[21:51:28.094371] Accuracy of the network on the 50000 test images: 72.7%
[21:51:28.094393] Max accuracy: 72.73%
[21:51:28.176931] log_dir: ./output_dir_qkformer
[21:51:30.860017] Epoch: [77]  [   0/5004]  eta: 3:43:24  lr: 0.000840  loss: 2.4049 (2.4049)  time: 2.6787  data: 2.1659  max mem: 24440
[22:06:52.364646] Epoch: [77]  [2000/5004]  eta: 0:23:07  lr: 0.000836  loss: 2.9376 (2.9014)  time: 0.4598  data: 0.0003  max mem: 24440
[22:22:13.441722] Epoch: [77]  [4000/5004]  eta: 0:07:42  lr: 0.000833  loss: 3.0121 (2.9011)  time: 0.4588  data: 0.0002  max mem: 24440
[22:29:55.471008] Epoch: [77]  [5003/5004]  eta: 0:00:00  lr: 0.000831  loss: 2.9063 (2.9101)  time: 0.4551  data: 0.0009  max mem: 24440
[22:29:55.949663] Epoch: [77] Total time: 0:38:27 (0.4612 s / it)
[22:29:55.951614] Averaged stats: lr: 0.000831  loss: 2.9063 (2.9112)
[22:29:57.668759] Test:  [   0/1563]  eta: 0:44:33  loss: 0.4497 (0.4497)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 1.7104  data: 1.5359  max mem: 24440
[22:31:21.663551] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.8806 (0.9409)  acc1: 78.1250 (75.5926)  acc5: 93.7500 (94.9538)  time: 0.1680  data: 0.0002  max mem: 24440
[22:32:45.621497] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1301 (1.0428)  acc1: 68.7500 (73.8636)  acc5: 90.6250 (93.2349)  time: 0.1677  data: 0.0002  max mem: 24440
[22:34:09.570099] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6186 (1.1131)  acc1: 84.3750 (72.4663)  acc5: 96.8750 (92.1698)  time: 0.1679  data: 0.0002  max mem: 24440
[22:34:19.900654] Test:  [1562/1563]  eta: 0:00:00  loss: 0.7073 (1.1154)  acc1: 84.3750 (72.4500)  acc5: 96.8750 (92.1460)  time: 0.1635  data: 0.0001  max mem: 24440
[22:34:20.004087] Test: Total time: 0:04:24 (0.1689 s / it)
[22:34:20.661514] * Acc@1 72.445 Acc@5 92.147 loss 1.115
[22:34:20.661710] Accuracy of the network on the 50000 test images: 72.4%
[22:34:20.661733] Max accuracy: 72.73%
[22:34:20.754694] log_dir: ./output_dir_qkformer
[22:34:23.293362] Epoch: [78]  [   0/5004]  eta: 3:31:36  lr: 0.000831  loss: 3.3987 (3.3987)  time: 2.5373  data: 2.0282  max mem: 24440
[22:49:45.075226] Epoch: [78]  [2000/5004]  eta: 0:23:07  lr: 0.000827  loss: 2.8710 (2.9019)  time: 0.4661  data: 0.0002  max mem: 24440
[23:05:05.451058] Epoch: [78]  [4000/5004]  eta: 0:07:42  lr: 0.000824  loss: 2.9737 (2.9055)  time: 0.4587  data: 0.0003  max mem: 24440
[23:12:46.829970] Epoch: [78]  [5003/5004]  eta: 0:00:00  lr: 0.000822  loss: 2.9364 (2.9040)  time: 0.4531  data: 0.0005  max mem: 24440
[23:12:47.198647] Epoch: [78] Total time: 0:38:26 (0.4609 s / it)
[23:12:47.247926] Averaged stats: lr: 0.000822  loss: 2.9364 (2.9030)
[23:12:49.133883] Test:  [   0/1563]  eta: 0:49:02  loss: 0.1713 (0.1713)  acc1: 93.7500 (93.7500)  acc5: 100.0000 (100.0000)  time: 1.8827  data: 1.7104  max mem: 24440
[23:14:13.128573] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.8441 (0.8680)  acc1: 78.1250 (77.8069)  acc5: 93.7500 (95.3343)  time: 0.1679  data: 0.0002  max mem: 24440
[23:15:37.130263] Test:  [1000/1563]  eta: 0:01:35  loss: 1.2095 (1.0166)  acc1: 62.5000 (74.6379)  acc5: 90.6250 (93.5034)  time: 0.1681  data: 0.0002  max mem: 24440
[23:17:01.157830] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6075 (1.0867)  acc1: 87.5000 (73.1471)  acc5: 96.8750 (92.5383)  time: 0.1680  data: 0.0002  max mem: 24440
[23:17:11.492872] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5922 (1.0883)  acc1: 84.3750 (73.0960)  acc5: 96.8750 (92.5080)  time: 0.1636  data: 0.0001  max mem: 24440
[23:17:11.617033] Test: Total time: 0:04:24 (0.1691 s / it)
[23:17:12.163261] * Acc@1 73.094 Acc@5 92.505 loss 1.088
[23:17:12.163407] Accuracy of the network on the 50000 test images: 73.1%
[23:17:12.163427] Max accuracy: 73.09%
[23:17:12.240546] log_dir: ./output_dir_qkformer
[23:17:14.934558] Epoch: [79]  [   0/5004]  eta: 3:44:25  lr: 0.000822  loss: 3.0395 (3.0395)  time: 2.6909  data: 2.1893  max mem: 24440
[23:32:36.738836] Epoch: [79]  [2000/5004]  eta: 0:23:07  lr: 0.000818  loss: 2.7482 (2.8923)  time: 0.4604  data: 0.0002  max mem: 24440
[23:47:57.550048] Epoch: [79]  [4000/5004]  eta: 0:07:43  lr: 0.000815  loss: 2.8235 (2.8936)  time: 0.4583  data: 0.0002  max mem: 24440
[23:55:38.927432] Epoch: [79]  [5003/5004]  eta: 0:00:00  lr: 0.000813  loss: 2.7839 (2.8957)  time: 0.4569  data: 0.0005  max mem: 24440
[23:55:39.519193] Epoch: [79] Total time: 0:38:27 (0.4611 s / it)
[23:55:39.542386] Averaged stats: lr: 0.000813  loss: 2.7839 (2.8988)
[23:55:41.789332] Test:  [   0/1563]  eta: 0:58:20  loss: 0.1939 (0.1939)  acc1: 96.8750 (96.8750)  acc5: 100.0000 (100.0000)  time: 2.2396  data: 1.8973  max mem: 24440
[23:57:05.776465] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.8083 (0.8979)  acc1: 78.1250 (77.4326)  acc5: 96.8750 (95.1659)  time: 0.1678  data: 0.0002  max mem: 24440
[23:58:29.797511] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1802 (1.0202)  acc1: 65.6250 (74.7222)  acc5: 90.6250 (93.3535)  time: 0.1681  data: 0.0002  max mem: 24440
[23:59:53.793836] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7212 (1.0968)  acc1: 84.3750 (72.9743)  acc5: 96.8750 (92.1448)  time: 0.1679  data: 0.0002  max mem: 24440
[00:00:04.121113] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5323 (1.0986)  acc1: 87.5000 (72.9300)  acc5: 96.8750 (92.1180)  time: 0.1635  data: 0.0001  max mem: 24440
[00:00:04.254464] Test: Total time: 0:04:24 (0.1694 s / it)
[00:00:04.665945] * Acc@1 72.931 Acc@5 92.121 loss 1.098
[00:00:04.666093] Accuracy of the network on the 50000 test images: 72.9%
[00:00:04.666114] Max accuracy: 73.09%
[00:00:04.722781] log_dir: ./output_dir_qkformer
[00:00:07.922075] Epoch: [80]  [   0/5004]  eta: 4:26:26  lr: 0.000813  loss: 2.2387 (2.2387)  time: 3.1948  data: 2.7234  max mem: 24440
[00:15:31.116372] Epoch: [80]  [2000/5004]  eta: 0:23:10  lr: 0.000809  loss: 2.7066 (2.8670)  time: 0.4606  data: 0.0002  max mem: 24440
[00:30:52.769411] Epoch: [80]  [4000/5004]  eta: 0:07:43  lr: 0.000806  loss: 2.8716 (2.8778)  time: 0.4583  data: 0.0002  max mem: 24440
[00:38:34.859334] Epoch: [80]  [5003/5004]  eta: 0:00:00  lr: 0.000804  loss: 2.9389 (2.8813)  time: 0.4588  data: 0.0009  max mem: 24440
[00:38:35.190337] Epoch: [80] Total time: 0:38:30 (0.4617 s / it)
[00:38:35.372185] Averaged stats: lr: 0.000804  loss: 2.9389 (2.8923)
[00:38:37.850727] Test:  [   0/1563]  eta: 1:04:27  loss: 0.2477 (0.2477)  acc1: 96.8750 (96.8750)  acc5: 100.0000 (100.0000)  time: 2.4744  data: 2.0843  max mem: 24440
[00:40:01.847212] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.9334 (0.8980)  acc1: 75.0000 (77.7320)  acc5: 96.8750 (95.4029)  time: 0.1678  data: 0.0002  max mem: 24440
[00:41:25.880627] Test:  [1000/1563]  eta: 0:01:35  loss: 1.2825 (1.0234)  acc1: 65.6250 (74.7908)  acc5: 93.7500 (93.5190)  time: 0.1680  data: 0.0002  max mem: 24440
[00:42:49.866228] Test:  [1500/1563]  eta: 0:00:10  loss: 0.7636 (1.1013)  acc1: 81.2500 (73.1700)  acc5: 96.8750 (92.3176)  time: 0.1678  data: 0.0002  max mem: 24440
[00:43:00.189320] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5047 (1.1036)  acc1: 87.5000 (73.1280)  acc5: 100.0000 (92.3040)  time: 0.1635  data: 0.0001  max mem: 24440
[00:43:00.319306] Test: Total time: 0:04:24 (0.1695 s / it)
[00:43:00.723009] * Acc@1 73.132 Acc@5 92.307 loss 1.104
[00:43:00.723212] Accuracy of the network on the 50000 test images: 73.1%
[00:43:00.723238] Max accuracy: 73.13%
[00:43:00.800506] log_dir: ./output_dir_qkformer
[00:43:03.578455] Epoch: [81]  [   0/5004]  eta: 3:51:33  lr: 0.000804  loss: 2.9095 (2.9095)  time: 2.7765  data: 2.2033  max mem: 24440
[00:58:25.273053] Epoch: [81]  [2000/5004]  eta: 0:23:07  lr: 0.000800  loss: 2.7636 (2.8890)  time: 0.4584  data: 0.0002  max mem: 24440
[01:13:45.985173] Epoch: [81]  [4000/5004]  eta: 0:07:42  lr: 0.000797  loss: 2.8076 (2.8897)  time: 0.4568  data: 0.0003  max mem: 24440
[01:21:27.164425] Epoch: [81]  [5003/5004]  eta: 0:00:00  lr: 0.000795  loss: 2.8146 (2.8906)  time: 0.4542  data: 0.0009  max mem: 24440
[01:21:27.601046] Epoch: [81] Total time: 0:38:26 (0.4610 s / it)
[01:21:27.603620] Averaged stats: lr: 0.000795  loss: 2.8146 (2.8856)
[01:21:30.163831] Test:  [   0/1563]  eta: 1:06:34  loss: 0.3006 (0.3006)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.5555  data: 2.3079  max mem: 24440
[01:22:54.314205] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.8884 (0.8449)  acc1: 75.0000 (77.9878)  acc5: 96.8750 (95.2470)  time: 0.1680  data: 0.0002  max mem: 24440
[01:24:18.369041] Test:  [1000/1563]  eta: 0:01:36  loss: 1.2789 (0.9791)  acc1: 68.7500 (75.2029)  acc5: 93.7500 (93.4846)  time: 0.1678  data: 0.0002  max mem: 24440
[01:25:42.375477] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6726 (1.0652)  acc1: 87.5000 (73.4406)  acc5: 96.8750 (92.4300)  time: 0.1678  data: 0.0002  max mem: 24440
[01:25:52.705903] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5110 (1.0699)  acc1: 87.5000 (73.2920)  acc5: 96.8750 (92.3900)  time: 0.1636  data: 0.0001  max mem: 24440
[01:25:52.829200] Test: Total time: 0:04:25 (0.1697 s / it)
[01:25:53.172884] * Acc@1 73.291 Acc@5 92.391 loss 1.070
[01:25:53.173064] Accuracy of the network on the 50000 test images: 73.3%
[01:25:53.173086] Max accuracy: 73.29%
[01:25:53.248501] log_dir: ./output_dir_qkformer
[01:25:55.835377] Epoch: [82]  [   0/5004]  eta: 3:35:35  lr: 0.000795  loss: 2.9688 (2.9688)  time: 2.5851  data: 2.0721  max mem: 24440
[01:41:16.331865] Epoch: [82]  [2000/5004]  eta: 0:23:05  lr: 0.000791  loss: 2.9012 (2.8686)  time: 0.4635  data: 0.0002  max mem: 24440
[01:56:36.580677] Epoch: [82]  [4000/5004]  eta: 0:07:42  lr: 0.000788  loss: 2.8439 (2.8768)  time: 0.4564  data: 0.0002  max mem: 24440
[02:04:17.835375] Epoch: [82]  [5003/5004]  eta: 0:00:00  lr: 0.000786  loss: 2.9007 (2.8798)  time: 0.4576  data: 0.0006  max mem: 24440
[02:04:18.253077] Epoch: [82] Total time: 0:38:25 (0.4606 s / it)
[02:04:18.301718] Averaged stats: lr: 0.000786  loss: 2.9007 (2.8812)
[02:04:20.113285] Test:  [   0/1563]  eta: 0:47:04  loss: 0.3578 (0.3578)  acc1: 93.7500 (93.7500)  acc5: 100.0000 (100.0000)  time: 1.8074  data: 1.6153  max mem: 24440
[02:05:44.149992] Test:  [ 500/1563]  eta: 0:03:02  loss: 1.1194 (0.8732)  acc1: 71.8750 (77.8194)  acc5: 96.8750 (95.3530)  time: 0.1679  data: 0.0002  max mem: 24440
[02:07:08.160364] Test:  [1000/1563]  eta: 0:01:35  loss: 1.6349 (1.0105)  acc1: 56.2500 (74.9376)  acc5: 90.6250 (93.5502)  time: 0.1679  data: 0.0002  max mem: 24440
[02:08:32.150013] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6659 (1.0944)  acc1: 84.3750 (73.1908)  acc5: 96.8750 (92.3947)  time: 0.1679  data: 0.0002  max mem: 24440
[02:08:42.472723] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4779 (1.0960)  acc1: 90.6250 (73.1420)  acc5: 96.8750 (92.3760)  time: 0.1636  data: 0.0001  max mem: 24440
[02:08:42.577185] Test: Total time: 0:04:24 (0.1691 s / it)
[02:08:42.961517] * Acc@1 73.140 Acc@5 92.379 loss 1.096
[02:08:42.961661] Accuracy of the network on the 50000 test images: 73.1%
[02:08:42.961686] Max accuracy: 73.29%
[02:08:43.027139] log_dir: ./output_dir_qkformer
[02:08:45.624606] Epoch: [83]  [   0/5004]  eta: 3:36:34  lr: 0.000786  loss: 2.6402 (2.6402)  time: 2.5968  data: 2.1350  max mem: 24440
[02:24:06.938794] Epoch: [83]  [2000/5004]  eta: 0:23:06  lr: 0.000782  loss: 2.6467 (2.8765)  time: 0.4579  data: 0.0002  max mem: 24440
[02:39:28.488990] Epoch: [83]  [4000/5004]  eta: 0:07:43  lr: 0.000778  loss: 2.9314 (2.8839)  time: 0.4637  data: 0.0002  max mem: 24440
[02:47:09.787301] Epoch: [83]  [5003/5004]  eta: 0:00:00  lr: 0.000777  loss: 2.8910 (2.8829)  time: 0.4543  data: 0.0009  max mem: 24440
[02:47:10.171580] Epoch: [83] Total time: 0:38:27 (0.4611 s / it)
[02:47:10.224668] Averaged stats: lr: 0.000777  loss: 2.8910 (2.8753)
[02:47:12.416184] Test:  [   0/1563]  eta: 0:56:58  loss: 0.3368 (0.3368)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.1869  data: 1.5619  max mem: 24440
[02:48:36.386846] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.7999 (0.8452)  acc1: 81.2500 (78.1500)  acc5: 96.8750 (95.5215)  time: 0.1678  data: 0.0002  max mem: 24440
[02:50:00.366248] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9556 (0.9914)  acc1: 68.7500 (75.0187)  acc5: 93.7500 (93.6189)  time: 0.1678  data: 0.0002  max mem: 24440
[02:51:24.350986] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5896 (1.0707)  acc1: 84.3750 (73.2761)  acc5: 96.8750 (92.4717)  time: 0.1683  data: 0.0002  max mem: 24440
[02:51:34.667407] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4274 (1.0680)  acc1: 87.5000 (73.3380)  acc5: 100.0000 (92.5280)  time: 0.1635  data: 0.0001  max mem: 24440
[02:51:34.786213] Test: Total time: 0:04:24 (0.1693 s / it)
[02:51:35.389246] * Acc@1 73.341 Acc@5 92.529 loss 1.068
[02:51:35.389396] Accuracy of the network on the 50000 test images: 73.3%
[02:51:35.389417] Max accuracy: 73.34%
[02:51:35.478252] log_dir: ./output_dir_qkformer
[02:51:38.328093] Epoch: [84]  [   0/5004]  eta: 3:57:31  lr: 0.000777  loss: 3.0308 (3.0308)  time: 2.8480  data: 2.3867  max mem: 24440
[03:06:58.974121] Epoch: [84]  [2000/5004]  eta: 0:23:06  lr: 0.000773  loss: 2.7185 (2.8497)  time: 0.4657  data: 0.0002  max mem: 24440
[03:22:19.239929] Epoch: [84]  [4000/5004]  eta: 0:07:42  lr: 0.000769  loss: 2.7573 (2.8661)  time: 0.4618  data: 0.0002  max mem: 24440
[03:30:01.170347] Epoch: [84]  [5003/5004]  eta: 0:00:00  lr: 0.000767  loss: 2.6647 (2.8672)  time: 0.4576  data: 0.0006  max mem: 24440
[03:30:01.602067] Epoch: [84] Total time: 0:38:26 (0.4609 s / it)
[03:30:01.631854] Averaged stats: lr: 0.000767  loss: 2.6647 (2.8715)
[03:30:03.272942] Test:  [   0/1563]  eta: 0:42:35  loss: 0.3329 (0.3329)  acc1: 96.8750 (96.8750)  acc5: 100.0000 (100.0000)  time: 1.6352  data: 1.4591  max mem: 24440
[03:31:27.261567] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.7800 (0.8523)  acc1: 78.1250 (77.5886)  acc5: 96.8750 (95.6088)  time: 0.1680  data: 0.0002  max mem: 24440
[03:32:51.400634] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1747 (0.9858)  acc1: 62.5000 (74.6660)  acc5: 90.6250 (93.5877)  time: 0.1677  data: 0.0002  max mem: 24440
[03:34:15.353211] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5603 (1.0709)  acc1: 90.6250 (72.8223)  acc5: 100.0000 (92.4696)  time: 0.1679  data: 0.0004  max mem: 24440
[03:34:25.671728] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6101 (1.0704)  acc1: 87.5000 (72.8480)  acc5: 96.8750 (92.4880)  time: 0.1634  data: 0.0001  max mem: 24440
[03:34:25.788052] Test: Total time: 0:04:24 (0.1690 s / it)
[03:34:26.431863] * Acc@1 72.851 Acc@5 92.489 loss 1.070
[03:34:26.432014] Accuracy of the network on the 50000 test images: 72.9%
[03:34:26.432037] Max accuracy: 73.34%
[03:34:26.481672] log_dir: ./output_dir_qkformer
[03:34:29.206586] Epoch: [85]  [   0/5004]  eta: 3:47:06  lr: 0.000767  loss: 2.7791 (2.7791)  time: 2.7231  data: 2.1251  max mem: 24440
[03:49:50.042097] Epoch: [85]  [2000/5004]  eta: 0:23:06  lr: 0.000764  loss: 2.7855 (2.8465)  time: 0.4572  data: 0.0002  max mem: 24440
[04:05:08.662619] Epoch: [85]  [4000/5004]  eta: 0:07:42  lr: 0.000760  loss: 2.8327 (2.8566)  time: 0.4563  data: 0.0002  max mem: 24440
[04:12:49.096997] Epoch: [85]  [5003/5004]  eta: 0:00:00  lr: 0.000758  loss: 2.6939 (2.8584)  time: 0.4532  data: 0.0010  max mem: 24440
[04:12:49.605275] Epoch: [85] Total time: 0:38:23 (0.4603 s / it)
[04:12:49.606262] Averaged stats: lr: 0.000758  loss: 2.6939 (2.8666)
[04:12:51.107828] Test:  [   0/1563]  eta: 0:39:00  loss: 0.3561 (0.3561)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 1.4975  data: 1.3111  max mem: 24440
[04:14:15.536729] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.9172 (0.8738)  acc1: 75.0000 (77.8630)  acc5: 96.8750 (95.2844)  time: 0.1685  data: 0.0002  max mem: 24440
[04:15:39.505213] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1773 (1.0197)  acc1: 75.0000 (74.6129)  acc5: 90.6250 (93.1069)  time: 0.1679  data: 0.0004  max mem: 24440
[04:17:03.462931] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6409 (1.0851)  acc1: 87.5000 (73.1200)  acc5: 96.8750 (92.2489)  time: 0.1679  data: 0.0002  max mem: 24440
[04:17:13.781426] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5668 (1.0864)  acc1: 90.6250 (73.1100)  acc5: 96.8750 (92.2220)  time: 0.1634  data: 0.0001  max mem: 24440
[04:17:13.906909] Test: Total time: 0:04:24 (0.1691 s / it)
[04:17:14.128659] * Acc@1 73.110 Acc@5 92.220 loss 1.086
[04:17:14.128818] Accuracy of the network on the 50000 test images: 73.1%
[04:17:14.128843] Max accuracy: 73.34%
[04:17:14.204918] log_dir: ./output_dir_qkformer
[04:17:17.035332] Epoch: [86]  [   0/5004]  eta: 3:56:00  lr: 0.000758  loss: 2.7603 (2.7603)  time: 2.8298  data: 2.2218  max mem: 24440
[04:32:37.188761] Epoch: [86]  [2000/5004]  eta: 0:23:05  lr: 0.000754  loss: 2.7963 (2.8643)  time: 0.4563  data: 0.0002  max mem: 24440
[04:47:57.609874] Epoch: [86]  [4000/5004]  eta: 0:07:42  lr: 0.000751  loss: 2.8615 (2.8654)  time: 0.4608  data: 0.0003  max mem: 24440
[04:55:39.134806] Epoch: [86]  [5003/5004]  eta: 0:00:00  lr: 0.000749  loss: 2.7991 (2.8688)  time: 0.4532  data: 0.0006  max mem: 24440
[04:55:39.611181] Epoch: [86] Total time: 0:38:25 (0.4607 s / it)
[04:55:39.615664] Averaged stats: lr: 0.000749  loss: 2.7991 (2.8614)
[04:55:41.612551] Test:  [   0/1563]  eta: 0:51:51  loss: 0.1122 (0.1122)  acc1: 100.0000 (100.0000)  acc5: 100.0000 (100.0000)  time: 1.9907  data: 1.8155  max mem: 24440
[04:57:05.578542] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.7809 (0.8867)  acc1: 78.1250 (77.3578)  acc5: 93.7500 (95.1035)  time: 0.1680  data: 0.0002  max mem: 24440
[04:58:29.519619] Test:  [1000/1563]  eta: 0:01:35  loss: 1.3076 (1.0226)  acc1: 65.6250 (74.8033)  acc5: 93.7500 (92.9914)  time: 0.1677  data: 0.0002  max mem: 24440
[04:59:53.481501] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5026 (1.1070)  acc1: 87.5000 (72.8723)  acc5: 96.8750 (91.9512)  time: 0.1677  data: 0.0002  max mem: 24440
[05:00:03.815689] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6006 (1.1089)  acc1: 87.5000 (72.8220)  acc5: 96.8750 (91.9560)  time: 0.1635  data: 0.0001  max mem: 24440
[05:00:03.902488] Test: Total time: 0:04:24 (0.1691 s / it)
[05:00:04.514455] * Acc@1 72.819 Acc@5 91.953 loss 1.109
[05:00:04.514605] Accuracy of the network on the 50000 test images: 72.8%
[05:00:04.514626] Max accuracy: 73.34%
[05:00:04.555595] log_dir: ./output_dir_qkformer
[05:00:07.263828] Epoch: [87]  [   0/5004]  eta: 3:45:44  lr: 0.000749  loss: 2.4929 (2.4929)  time: 2.7067  data: 2.2023  max mem: 24440
[05:15:28.159461] Epoch: [87]  [2000/5004]  eta: 0:23:06  lr: 0.000745  loss: 2.7942 (2.8457)  time: 0.4625  data: 0.0002  max mem: 24440
[05:30:48.735487] Epoch: [87]  [4000/5004]  eta: 0:07:42  lr: 0.000741  loss: 2.6483 (2.8391)  time: 0.4591  data: 0.0002  max mem: 24440
[05:38:30.920766] Epoch: [87]  [5003/5004]  eta: 0:00:00  lr: 0.000739  loss: 2.7743 (2.8404)  time: 0.4538  data: 0.0009  max mem: 24440
[05:38:31.365769] Epoch: [87] Total time: 0:38:26 (0.4610 s / it)
[05:38:31.371224] Averaged stats: lr: 0.000739  loss: 2.7743 (2.8537)
[05:38:33.449913] Test:  [   0/1563]  eta: 0:54:03  loss: 0.3336 (0.3336)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.0753  data: 1.9004  max mem: 24440
[05:39:57.497000] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.8706 (0.8611)  acc1: 78.1250 (78.6178)  acc5: 96.8750 (95.5464)  time: 0.1679  data: 0.0002  max mem: 24440
[05:41:21.517141] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1206 (1.0032)  acc1: 68.7500 (75.2935)  acc5: 93.7500 (93.5252)  time: 0.1680  data: 0.0002  max mem: 24440
[05:42:45.826797] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6275 (1.0627)  acc1: 87.5000 (73.9111)  acc5: 96.8750 (92.7715)  time: 0.1679  data: 0.0002  max mem: 24440
[05:42:56.156203] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6083 (1.0664)  acc1: 84.3750 (73.8460)  acc5: 96.8750 (92.7260)  time: 0.1636  data: 0.0001  max mem: 24440
[05:42:56.288549] Test: Total time: 0:04:24 (0.1695 s / it)
[05:42:56.754783] * Acc@1 73.842 Acc@5 92.728 loss 1.066
[05:42:56.754973] Accuracy of the network on the 50000 test images: 73.8%
[05:42:56.754997] Max accuracy: 73.84%
[05:42:56.834573] log_dir: ./output_dir_qkformer
[05:42:59.500421] Epoch: [88]  [   0/5004]  eta: 3:41:44  lr: 0.000739  loss: 2.6515 (2.6515)  time: 2.6588  data: 2.0664  max mem: 24440
[05:58:19.610148] Epoch: [88]  [2000/5004]  eta: 0:23:05  lr: 0.000736  loss: 2.8569 (2.8460)  time: 0.4586  data: 0.0002  max mem: 24440
[06:13:39.398292] Epoch: [88]  [4000/5004]  eta: 0:07:42  lr: 0.000732  loss: 2.7264 (2.8510)  time: 0.4613  data: 0.0003  max mem: 24440
[06:21:20.563895] Epoch: [88]  [5003/5004]  eta: 0:00:00  lr: 0.000730  loss: 2.6536 (2.8512)  time: 0.4534  data: 0.0009  max mem: 24440
[06:21:20.972464] Epoch: [88] Total time: 0:38:24 (0.4605 s / it)
[06:21:20.974490] Averaged stats: lr: 0.000730  loss: 2.6536 (2.8483)
[06:21:22.672565] Test:  [   0/1563]  eta: 0:44:06  loss: 0.4456 (0.4456)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 1.6934  data: 1.5178  max mem: 24440
[06:22:46.641314] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.0091 (0.8438)  acc1: 71.8750 (78.0751)  acc5: 93.7500 (95.4716)  time: 0.1678  data: 0.0002  max mem: 24440
[06:24:10.640235] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9229 (0.9725)  acc1: 75.0000 (75.5963)  acc5: 93.7500 (93.7313)  time: 0.1680  data: 0.0002  max mem: 24440
[06:25:34.663517] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5585 (1.0495)  acc1: 84.3750 (73.8945)  acc5: 96.8750 (92.6487)  time: 0.1679  data: 0.0002  max mem: 24440
[06:25:44.993142] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4724 (1.0504)  acc1: 87.5000 (73.8640)  acc5: 96.8750 (92.6360)  time: 0.1635  data: 0.0001  max mem: 24440
[06:25:45.122617] Test: Total time: 0:04:24 (0.1690 s / it)
[06:25:45.419440] * Acc@1 73.865 Acc@5 92.635 loss 1.050
[06:25:45.419600] Accuracy of the network on the 50000 test images: 73.9%
[06:25:45.419624] Max accuracy: 73.87%
[06:25:45.449920] log_dir: ./output_dir_qkformer
[06:25:51.532681] Epoch: [89]  [   0/5004]  eta: 8:27:12  lr: 0.000730  loss: 2.9180 (2.9180)  time: 6.0817  data: 1.9402  max mem: 24440
[06:41:13.049010] Epoch: [89]  [2000/5004]  eta: 0:23:12  lr: 0.000726  loss: 2.7962 (2.8396)  time: 0.4581  data: 0.0003  max mem: 24440
[06:56:33.209638] Epoch: [89]  [4000/5004]  eta: 0:07:43  lr: 0.000722  loss: 2.7492 (2.8466)  time: 0.4613  data: 0.0003  max mem: 24440
[07:04:14.351624] Epoch: [89]  [5003/5004]  eta: 0:00:00  lr: 0.000720  loss: 2.7299 (2.8471)  time: 0.4535  data: 0.0009  max mem: 24440
[07:04:14.804234] Epoch: [89] Total time: 0:38:29 (0.4615 s / it)
[07:04:14.805622] Averaged stats: lr: 0.000720  loss: 2.7299 (2.8424)
[07:04:16.293669] Test:  [   0/1563]  eta: 0:38:33  loss: 0.2774 (0.2774)  acc1: 93.7500 (93.7500)  acc5: 100.0000 (100.0000)  time: 1.4801  data: 1.3024  max mem: 24440
[07:05:40.234867] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.8525 (0.8535)  acc1: 78.1250 (78.0501)  acc5: 96.8750 (95.5464)  time: 0.1678  data: 0.0002  max mem: 24440
[07:07:04.216096] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9610 (0.9835)  acc1: 71.8750 (75.2841)  acc5: 96.8750 (93.5502)  time: 0.1679  data: 0.0002  max mem: 24440
[07:08:28.204752] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4348 (1.0653)  acc1: 87.5000 (73.4385)  acc5: 96.8750 (92.4009)  time: 0.1677  data: 0.0002  max mem: 24440
[07:08:38.635512] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3862 (1.0615)  acc1: 87.5000 (73.5380)  acc5: 100.0000 (92.4620)  time: 0.1690  data: 0.0001  max mem: 24440
[07:08:38.803588] Test: Total time: 0:04:23 (0.1689 s / it)
[07:08:39.287838] * Acc@1 73.537 Acc@5 92.462 loss 1.061
[07:08:39.288006] Accuracy of the network on the 50000 test images: 73.5%
[07:08:39.288034] Max accuracy: 73.87%
[07:08:39.352192] log_dir: ./output_dir_qkformer
[07:08:41.918924] Epoch: [90]  [   0/5004]  eta: 3:33:59  lr: 0.000720  loss: 2.9747 (2.9747)  time: 2.5659  data: 2.0535  max mem: 24440
[07:24:02.248319] Epoch: [90]  [2000/5004]  eta: 0:23:05  lr: 0.000717  loss: 2.8396 (2.8191)  time: 0.4591  data: 0.0002  max mem: 24440
[07:39:22.089395] Epoch: [90]  [4000/5004]  eta: 0:07:42  lr: 0.000713  loss: 2.8416 (2.8252)  time: 0.4613  data: 0.0003  max mem: 24440
[07:47:03.488273] Epoch: [90]  [5003/5004]  eta: 0:00:00  lr: 0.000711  loss: 2.8273 (2.8334)  time: 0.4528  data: 0.0006  max mem: 24440
[07:47:03.866328] Epoch: [90] Total time: 0:38:24 (0.4605 s / it)
[07:47:03.887968] Averaged stats: lr: 0.000711  loss: 2.8273 (2.8337)
[07:47:05.758600] Test:  [   0/1563]  eta: 0:48:34  loss: 0.1968 (0.1968)  acc1: 93.7500 (93.7500)  acc5: 100.0000 (100.0000)  time: 1.8649  data: 1.6820  max mem: 24440
[07:48:29.763757] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.8601 (0.8523)  acc1: 75.0000 (78.2435)  acc5: 96.8750 (95.9581)  time: 0.1680  data: 0.0002  max mem: 24440
[07:49:53.735868] Test:  [1000/1563]  eta: 0:01:35  loss: 1.2657 (0.9826)  acc1: 65.6250 (75.5807)  acc5: 93.7500 (94.0060)  time: 0.1681  data: 0.0002  max mem: 24440
[07:51:17.751573] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4474 (1.0689)  acc1: 87.5000 (73.7342)  acc5: 96.8750 (92.6799)  time: 0.1678  data: 0.0002  max mem: 24440
[07:51:28.079940] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6093 (1.0701)  acc1: 87.5000 (73.7060)  acc5: 96.8750 (92.6840)  time: 0.1635  data: 0.0001  max mem: 24440
[07:51:28.216324] Test: Total time: 0:04:24 (0.1691 s / it)
[07:51:28.659657] * Acc@1 73.704 Acc@5 92.683 loss 1.070
[07:51:28.659809] Accuracy of the network on the 50000 test images: 73.7%
[07:51:28.659831] Max accuracy: 73.87%
[07:51:28.719676] log_dir: ./output_dir_qkformer
[07:51:31.531801] Epoch: [91]  [   0/5004]  eta: 3:54:19  lr: 0.000711  loss: 3.3117 (3.3117)  time: 2.8096  data: 1.8370  max mem: 24440
[08:06:52.277295] Epoch: [91]  [2000/5004]  eta: 0:23:06  lr: 0.000707  loss: 2.7876 (2.8190)  time: 0.4633  data: 0.0002  max mem: 24440
[08:22:11.291557] Epoch: [91]  [4000/5004]  eta: 0:07:42  lr: 0.000703  loss: 2.8352 (2.8301)  time: 0.4560  data: 0.0002  max mem: 24440
[08:29:52.249837] Epoch: [91]  [5003/5004]  eta: 0:00:00  lr: 0.000701  loss: 2.9345 (2.8338)  time: 0.4525  data: 0.0009  max mem: 24440
[08:29:52.635946] Epoch: [91] Total time: 0:38:23 (0.4604 s / it)
[08:29:52.638184] Averaged stats: lr: 0.000701  loss: 2.9345 (2.8294)
[08:29:54.117210] Test:  [   0/1563]  eta: 0:38:21  loss: 0.3146 (0.3146)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 1.4726  data: 1.2979  max mem: 24440
[08:31:18.073299] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.9302 (0.8378)  acc1: 75.0000 (77.6634)  acc5: 96.8750 (95.6025)  time: 0.1678  data: 0.0002  max mem: 24440
[08:32:42.050792] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1860 (0.9673)  acc1: 68.7500 (75.1186)  acc5: 93.7500 (93.8374)  time: 0.1679  data: 0.0002  max mem: 24440
[08:34:06.028448] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5361 (1.0459)  acc1: 84.3750 (73.6447)  acc5: 96.8750 (92.6757)  time: 0.1680  data: 0.0002  max mem: 24440
[08:34:16.350054] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4474 (1.0454)  acc1: 90.6250 (73.6980)  acc5: 100.0000 (92.7020)  time: 0.1635  data: 0.0001  max mem: 24440
[08:34:16.460960] Test: Total time: 0:04:23 (0.1688 s / it)
[08:34:17.245676] * Acc@1 73.705 Acc@5 92.703 loss 1.045
[08:34:17.245824] Accuracy of the network on the 50000 test images: 73.7%
[08:34:17.245845] Max accuracy: 73.87%
[08:34:17.333279] log_dir: ./output_dir_qkformer
[08:34:20.162749] Epoch: [92]  [   0/5004]  eta: 3:55:54  lr: 0.000701  loss: 2.5412 (2.5412)  time: 2.8286  data: 2.3780  max mem: 24440
[08:49:41.611398] Epoch: [92]  [2000/5004]  eta: 0:23:07  lr: 0.000698  loss: 2.8327 (2.8175)  time: 0.4624  data: 0.0003  max mem: 24440
[09:05:03.948297] Epoch: [92]  [4000/5004]  eta: 0:07:43  lr: 0.000694  loss: 2.9859 (2.8334)  time: 0.4614  data: 0.0002  max mem: 24440
[09:12:44.985017] Epoch: [92]  [5003/5004]  eta: 0:00:00  lr: 0.000692  loss: 3.0166 (2.8320)  time: 0.4527  data: 0.0009  max mem: 24440
[09:12:45.494367] Epoch: [92] Total time: 0:38:28 (0.4613 s / it)
[09:12:45.495432] Averaged stats: lr: 0.000692  loss: 3.0166 (2.8244)
[09:12:47.463053] Test:  [   0/1563]  eta: 0:51:01  loss: 0.2388 (0.2388)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.9588  data: 1.6899  max mem: 24440
[09:14:11.478384] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.8096 (0.8500)  acc1: 78.1250 (78.2685)  acc5: 96.8750 (95.3842)  time: 0.1680  data: 0.0002  max mem: 24440
[09:15:35.475831] Test:  [1000/1563]  eta: 0:01:35  loss: 1.2555 (0.9724)  acc1: 65.6250 (75.8866)  acc5: 93.7500 (93.7375)  time: 0.1682  data: 0.0002  max mem: 24440
[09:16:59.451393] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4045 (1.0368)  acc1: 87.5000 (74.3713)  acc5: 96.8750 (92.8506)  time: 0.1679  data: 0.0002  max mem: 24440
[09:17:09.776445] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4076 (1.0384)  acc1: 87.5000 (74.3400)  acc5: 96.8750 (92.8220)  time: 0.1635  data: 0.0001  max mem: 24440
[09:17:09.892804] Test: Total time: 0:04:24 (0.1692 s / it)
[09:17:10.421269] * Acc@1 74.343 Acc@5 92.818 loss 1.038
[09:17:10.421421] Accuracy of the network on the 50000 test images: 74.3%
[09:17:10.421442] Max accuracy: 74.34%
[09:17:10.534645] log_dir: ./output_dir_qkformer
[09:17:13.185053] Epoch: [93]  [   0/5004]  eta: 3:40:56  lr: 0.000692  loss: 2.6680 (2.6680)  time: 2.6492  data: 2.0221  max mem: 24440
[09:32:33.399884] Epoch: [93]  [2000/5004]  eta: 0:23:05  lr: 0.000688  loss: 2.7379 (2.8171)  time: 0.4615  data: 0.0003  max mem: 24440
[09:47:51.934055] Epoch: [93]  [4000/5004]  eta: 0:07:42  lr: 0.000684  loss: 2.8657 (2.8120)  time: 0.4590  data: 0.0003  max mem: 24440
[09:55:32.914903] Epoch: [93]  [5003/5004]  eta: 0:00:00  lr: 0.000682  loss: 2.6352 (2.8128)  time: 0.4538  data: 0.0009  max mem: 24440
[09:55:33.336251] Epoch: [93] Total time: 0:38:22 (0.4602 s / it)
[09:55:33.351419] Averaged stats: lr: 0.000682  loss: 2.6352 (2.8201)
[09:55:34.915667] Test:  [   0/1563]  eta: 0:40:36  loss: 0.1871 (0.1871)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 1.5587  data: 1.2745  max mem: 24440
[09:56:58.898629] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.6700 (0.8216)  acc1: 81.2500 (78.8423)  acc5: 96.8750 (95.7086)  time: 0.1678  data: 0.0002  max mem: 24440
[09:58:22.890737] Test:  [1000/1563]  eta: 0:01:35  loss: 1.3036 (0.9570)  acc1: 65.6250 (75.7493)  acc5: 90.6250 (93.7656)  time: 0.1681  data: 0.0002  max mem: 24440
[09:59:46.894722] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5654 (1.0313)  acc1: 84.3750 (74.1318)  acc5: 100.0000 (92.7569)  time: 0.1678  data: 0.0002  max mem: 24440
[09:59:57.221366] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5792 (1.0370)  acc1: 87.5000 (74.0220)  acc5: 96.8750 (92.7120)  time: 0.1635  data: 0.0001  max mem: 24440
[09:59:57.346189] Test: Total time: 0:04:23 (0.1689 s / it)
[09:59:57.687587] * Acc@1 74.029 Acc@5 92.712 loss 1.037
[09:59:57.687738] Accuracy of the network on the 50000 test images: 74.0%
[09:59:57.687759] Max accuracy: 74.34%
[09:59:57.767005] log_dir: ./output_dir_qkformer
[10:00:00.352026] Epoch: [94]  [   0/5004]  eta: 3:35:12  lr: 0.000682  loss: 2.8451 (2.8451)  time: 2.5804  data: 2.0121  max mem: 24440
[10:15:21.635261] Epoch: [94]  [2000/5004]  eta: 0:23:06  lr: 0.000679  loss: 2.6605 (2.7936)  time: 0.4615  data: 0.0002  max mem: 24440
[10:30:40.688157] Epoch: [94]  [4000/5004]  eta: 0:07:42  lr: 0.000675  loss: 2.6958 (2.8020)  time: 0.4605  data: 0.0003  max mem: 24440
[10:38:21.888728] Epoch: [94]  [5003/5004]  eta: 0:00:00  lr: 0.000673  loss: 2.8616 (2.8075)  time: 0.4541  data: 0.0009  max mem: 24440
[10:38:22.337632] Epoch: [94] Total time: 0:38:24 (0.4605 s / it)
[10:38:22.340646] Averaged stats: lr: 0.000673  loss: 2.8616 (2.8145)
[10:38:24.416314] Test:  [   0/1563]  eta: 0:53:54  loss: 0.2451 (0.2451)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.0696  data: 1.7386  max mem: 24440
[10:39:48.425188] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.8022 (0.8571)  acc1: 75.0000 (78.1999)  acc5: 96.8750 (95.4653)  time: 0.1678  data: 0.0002  max mem: 24440
[10:41:12.437065] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1432 (0.9855)  acc1: 68.7500 (75.5869)  acc5: 93.7500 (93.6158)  time: 0.1678  data: 0.0002  max mem: 24440
[10:42:36.451756] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5837 (1.0578)  acc1: 87.5000 (73.9174)  acc5: 96.8750 (92.5945)  time: 0.1679  data: 0.0002  max mem: 24440
[10:42:46.779583] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5577 (1.0608)  acc1: 90.6250 (73.8560)  acc5: 96.8750 (92.5620)  time: 0.1637  data: 0.0001  max mem: 24440
[10:42:46.903951] Test: Total time: 0:04:24 (0.1693 s / it)
[10:42:47.251199] * Acc@1 73.859 Acc@5 92.564 loss 1.061
[10:42:47.251364] Accuracy of the network on the 50000 test images: 73.9%
[10:42:47.251387] Max accuracy: 74.34%
[10:42:47.311491] log_dir: ./output_dir_qkformer
[10:42:49.924375] Epoch: [95]  [   0/5004]  eta: 3:37:43  lr: 0.000673  loss: 3.2747 (3.2747)  time: 2.6107  data: 2.1155  max mem: 24440
[10:58:11.167321] Epoch: [95]  [2000/5004]  eta: 0:23:06  lr: 0.000669  loss: 2.8499 (2.7917)  time: 0.4642  data: 0.0002  max mem: 24440
[11:13:32.479536] Epoch: [95]  [4000/5004]  eta: 0:07:42  lr: 0.000665  loss: 2.7949 (2.8029)  time: 0.4577  data: 0.0002  max mem: 24440
[11:21:13.746784] Epoch: [95]  [5003/5004]  eta: 0:00:00  lr: 0.000663  loss: 2.8655 (2.8055)  time: 0.4533  data: 0.0009  max mem: 24440
[11:21:14.199827] Epoch: [95] Total time: 0:38:26 (0.4610 s / it)
[11:21:14.208089] Averaged stats: lr: 0.000663  loss: 2.8655 (2.8077)
[11:21:15.853453] Test:  [   0/1563]  eta: 0:42:43  loss: 0.3775 (0.3775)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 1.6400  data: 1.3933  max mem: 24440
[11:22:39.826802] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.8184 (0.8448)  acc1: 81.2500 (78.1063)  acc5: 96.8750 (95.6150)  time: 0.1680  data: 0.0002  max mem: 24440
[11:24:03.806395] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9545 (0.9646)  acc1: 71.8750 (75.6306)  acc5: 96.8750 (93.9498)  time: 0.1678  data: 0.0002  max mem: 24440
[11:25:27.759794] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5393 (1.0371)  acc1: 87.5000 (74.1131)  acc5: 96.8750 (92.8818)  time: 0.1677  data: 0.0002  max mem: 24440
[11:25:38.089490] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4861 (1.0384)  acc1: 90.6250 (74.0880)  acc5: 96.8750 (92.9000)  time: 0.1634  data: 0.0001  max mem: 24440
[11:25:38.201775] Test: Total time: 0:04:23 (0.1689 s / it)
[11:25:38.846825] * Acc@1 74.084 Acc@5 92.899 loss 1.038
[11:25:38.846982] Accuracy of the network on the 50000 test images: 74.1%
[11:25:38.847003] Max accuracy: 74.34%
[11:25:38.913611] log_dir: ./output_dir_qkformer
[11:25:41.465803] Epoch: [96]  [   0/5004]  eta: 3:32:47  lr: 0.000663  loss: 3.2155 (3.2155)  time: 2.5514  data: 2.0074  max mem: 24440
[11:41:02.420378] Epoch: [96]  [2000/5004]  eta: 0:23:06  lr: 0.000659  loss: 2.8282 (2.7930)  time: 0.4602  data: 0.0003  max mem: 24440
[11:56:22.817286] Epoch: [96]  [4000/5004]  eta: 0:07:42  lr: 0.000655  loss: 2.8658 (2.7965)  time: 0.4592  data: 0.0003  max mem: 24440
[12:04:03.897069] Epoch: [96]  [5003/5004]  eta: 0:00:00  lr: 0.000654  loss: 2.7587 (2.8041)  time: 0.4579  data: 0.0006  max mem: 24440
[12:04:04.380743] Epoch: [96] Total time: 0:38:25 (0.4607 s / it)
[12:04:04.386468] Averaged stats: lr: 0.000654  loss: 2.7587 (2.8007)
[12:04:06.280012] Test:  [   0/1563]  eta: 0:49:14  loss: 0.4902 (0.4902)  acc1: 87.5000 (87.5000)  acc5: 96.8750 (96.8750)  time: 1.8900  data: 1.7019  max mem: 24440
[12:05:30.305864] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.6824 (0.8254)  acc1: 81.2500 (78.2186)  acc5: 96.8750 (95.5152)  time: 0.1680  data: 0.0002  max mem: 24440
[12:06:54.305946] Test:  [1000/1563]  eta: 0:01:35  loss: 1.0451 (0.9612)  acc1: 71.8750 (75.3434)  acc5: 90.6250 (93.7375)  time: 0.1680  data: 0.0002  max mem: 24440
[12:08:18.301778] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6124 (1.0206)  acc1: 84.3750 (74.1880)  acc5: 96.8750 (92.9110)  time: 0.1679  data: 0.0002  max mem: 24440
[12:08:28.628722] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4300 (1.0226)  acc1: 87.5000 (74.1240)  acc5: 96.8750 (92.8980)  time: 0.1635  data: 0.0001  max mem: 24440
[12:08:28.741651] Test: Total time: 0:04:24 (0.1691 s / it)
[12:08:29.188020] * Acc@1 74.129 Acc@5 92.900 loss 1.023
[12:08:29.188173] Accuracy of the network on the 50000 test images: 74.1%
[12:08:29.188194] Max accuracy: 74.34%
[12:08:29.261088] log_dir: ./output_dir_qkformer
[12:08:31.961968] Epoch: [97]  [   0/5004]  eta: 3:45:03  lr: 0.000654  loss: 2.8078 (2.8078)  time: 2.6986  data: 2.0223  max mem: 24440
[12:23:52.066414] Epoch: [97]  [2000/5004]  eta: 0:23:05  lr: 0.000650  loss: 2.6900 (2.7912)  time: 0.4624  data: 0.0003  max mem: 24440
[12:39:10.598553] Epoch: [97]  [4000/5004]  eta: 0:07:42  lr: 0.000646  loss: 2.7716 (2.7941)  time: 0.4641  data: 0.0002  max mem: 24440
[12:46:51.436196] Epoch: [97]  [5003/5004]  eta: 0:00:00  lr: 0.000644  loss: 2.8319 (2.7953)  time: 0.4541  data: 0.0009  max mem: 24440
[12:46:51.910944] Epoch: [97] Total time: 0:38:22 (0.4602 s / it)
[12:46:51.913093] Averaged stats: lr: 0.000644  loss: 2.8319 (2.7946)
[12:46:53.720743] Test:  [   0/1563]  eta: 0:46:51  loss: 0.2340 (0.2340)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.7988  data: 1.6216  max mem: 24440
[12:48:17.669557] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.8573 (0.8291)  acc1: 84.3750 (78.7675)  acc5: 96.8750 (95.6899)  time: 0.1683  data: 0.0002  max mem: 24440
[12:49:41.659832] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1527 (0.9493)  acc1: 65.6250 (76.1613)  acc5: 90.6250 (93.9123)  time: 0.1678  data: 0.0002  max mem: 24440
[12:51:05.621863] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4969 (1.0240)  acc1: 84.3750 (74.5378)  acc5: 96.8750 (92.8090)  time: 0.1678  data: 0.0002  max mem: 24440
[12:51:15.947312] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3777 (1.0226)  acc1: 90.6250 (74.5340)  acc5: 100.0000 (92.8380)  time: 0.1636  data: 0.0001  max mem: 24440
[12:51:16.074656] Test: Total time: 0:04:24 (0.1690 s / it)
[12:51:16.780625] * Acc@1 74.536 Acc@5 92.840 loss 1.023
[12:51:16.780793] Accuracy of the network on the 50000 test images: 74.5%
[12:51:16.780817] Max accuracy: 74.54%
[12:51:16.850720] log_dir: ./output_dir_qkformer
[12:51:19.816091] Epoch: [98]  [   0/5004]  eta: 4:07:10  lr: 0.000644  loss: 2.1753 (2.1753)  time: 2.9637  data: 2.0108  max mem: 24440
[13:06:39.349102] Epoch: [98]  [2000/5004]  eta: 0:23:04  lr: 0.000640  loss: 2.6735 (2.7804)  time: 0.4574  data: 0.0002  max mem: 24440
[13:21:57.987496] Epoch: [98]  [4000/5004]  eta: 0:07:41  lr: 0.000636  loss: 2.8456 (2.7875)  time: 0.4557  data: 0.0002  max mem: 24440
[13:29:38.541283] Epoch: [98]  [5003/5004]  eta: 0:00:00  lr: 0.000634  loss: 2.7853 (2.7932)  time: 0.4544  data: 0.0009  max mem: 24440
[13:29:38.974468] Epoch: [98] Total time: 0:38:22 (0.4601 s / it)
[13:29:38.975789] Averaged stats: lr: 0.000634  loss: 2.7853 (2.7900)
[13:29:40.873032] Test:  [   0/1563]  eta: 0:49:14  loss: 0.3565 (0.3565)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 1.8902  data: 1.6485  max mem: 24440
[13:31:04.902881] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.8851 (0.8323)  acc1: 75.0000 (78.9421)  acc5: 96.8750 (95.9269)  time: 0.1678  data: 0.0002  max mem: 24440
[13:32:28.914862] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9224 (0.9492)  acc1: 75.0000 (76.6764)  acc5: 93.7500 (94.2120)  time: 0.1678  data: 0.0002  max mem: 24440
[13:33:52.888376] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5852 (1.0389)  acc1: 87.5000 (74.4920)  acc5: 96.8750 (93.1025)  time: 0.1684  data: 0.0002  max mem: 24440
[13:34:03.212258] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5877 (1.0391)  acc1: 87.5000 (74.4680)  acc5: 96.8750 (93.1280)  time: 0.1636  data: 0.0001  max mem: 24440
[13:34:03.336708] Test: Total time: 0:04:24 (0.1691 s / it)
[13:34:03.896334] * Acc@1 74.461 Acc@5 93.128 loss 1.039
[13:34:03.896513] Accuracy of the network on the 50000 test images: 74.5%
[13:34:03.896537] Max accuracy: 74.54%
[13:34:04.013058] log_dir: ./output_dir_qkformer
[13:34:06.669553] Epoch: [99]  [   0/5004]  eta: 3:41:20  lr: 0.000634  loss: 2.3924 (2.3924)  time: 2.6541  data: 2.1388  max mem: 24440
[13:49:26.285957] Epoch: [99]  [2000/5004]  eta: 0:23:04  lr: 0.000630  loss: 2.8480 (2.7691)  time: 0.4562  data: 0.0002  max mem: 24440
[14:04:45.067526] Epoch: [99]  [4000/5004]  eta: 0:07:41  lr: 0.000627  loss: 2.7368 (2.7788)  time: 0.4572  data: 0.0002  max mem: 24440
[14:12:25.554035] Epoch: [99]  [5003/5004]  eta: 0:00:00  lr: 0.000625  loss: 2.8633 (2.7814)  time: 0.4529  data: 0.0009  max mem: 24440
[14:12:26.009383] Epoch: [99] Total time: 0:38:21 (0.4600 s / it)
[14:12:26.013891] Averaged stats: lr: 0.000625  loss: 2.8633 (2.7848)
[14:12:27.459278] Test:  [   0/1563]  eta: 0:37:28  loss: 0.2039 (0.2039)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.4387  data: 1.2660  max mem: 24440
[14:13:51.588257] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.9710 (0.7919)  acc1: 75.0000 (79.0232)  acc5: 96.8750 (96.0704)  time: 0.1680  data: 0.0002  max mem: 24440
[14:15:15.587536] Test:  [1000/1563]  eta: 0:01:35  loss: 1.0717 (0.9027)  acc1: 68.7500 (76.8357)  acc5: 93.7500 (94.4805)  time: 0.1678  data: 0.0002  max mem: 24440
[14:16:39.573536] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4062 (0.9940)  acc1: 90.6250 (74.8480)  acc5: 100.0000 (93.2795)  time: 0.1679  data: 0.0002  max mem: 24440
[14:16:49.905539] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5012 (0.9958)  acc1: 90.6250 (74.8080)  acc5: 96.8750 (93.2820)  time: 0.1635  data: 0.0001  max mem: 24440
[14:16:50.033992] Test: Total time: 0:04:24 (0.1689 s / it)
[14:16:50.389030] * Acc@1 74.819 Acc@5 93.281 loss 0.996
[14:16:50.389175] Accuracy of the network on the 50000 test images: 74.8%
[14:16:50.389195] Max accuracy: 74.82%
[14:16:50.462081] log_dir: ./output_dir_qkformer
[14:16:53.044636] Epoch: [100]  [   0/5004]  eta: 3:35:16  lr: 0.000625  loss: 2.6299 (2.6299)  time: 2.5812  data: 1.9720  max mem: 24440
[14:32:13.002798] Epoch: [100]  [2000/5004]  eta: 0:23:04  lr: 0.000621  loss: 2.6815 (2.7588)  time: 0.4610  data: 0.0002  max mem: 24440
[14:47:31.438178] Epoch: [100]  [4000/5004]  eta: 0:07:41  lr: 0.000617  loss: 2.7103 (2.7719)  time: 0.4578  data: 0.0003  max mem: 24440
[14:55:12.045592] Epoch: [100]  [5003/5004]  eta: 0:00:00  lr: 0.000615  loss: 2.8286 (2.7758)  time: 0.4543  data: 0.0006  max mem: 24440
[14:55:12.442109] Epoch: [100] Total time: 0:38:21 (0.4600 s / it)
[14:55:12.445700] Averaged stats: lr: 0.000615  loss: 2.8286 (2.7780)
[14:55:14.640377] Test:  [   0/1563]  eta: 0:57:03  loss: 0.3603 (0.3603)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.1905  data: 2.0123  max mem: 24440
[14:56:38.625090] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.7433 (0.7866)  acc1: 75.0000 (79.7904)  acc5: 96.8750 (95.9082)  time: 0.1678  data: 0.0002  max mem: 24440
[14:58:02.619702] Test:  [1000/1563]  eta: 0:01:35  loss: 1.0517 (0.9176)  acc1: 75.0000 (76.9668)  acc5: 93.7500 (94.1964)  time: 0.1681  data: 0.0002  max mem: 24440
[14:59:26.611598] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4449 (0.9966)  acc1: 84.3750 (75.1686)  acc5: 96.8750 (93.2087)  time: 0.1680  data: 0.0002  max mem: 24440
[14:59:36.944727] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4585 (0.9967)  acc1: 90.6250 (75.1520)  acc5: 100.0000 (93.2540)  time: 0.1635  data: 0.0001  max mem: 24440
[14:59:37.073276] Test: Total time: 0:04:24 (0.1693 s / it)
[14:59:37.471611] * Acc@1 75.154 Acc@5 93.256 loss 0.997
[14:59:37.471756] Accuracy of the network on the 50000 test images: 75.2%
[14:59:37.471777] Max accuracy: 75.15%
[14:59:37.565750] log_dir: ./output_dir_qkformer
[14:59:40.078153] Epoch: [101]  [   0/5004]  eta: 3:29:09  lr: 0.000615  loss: 2.6308 (2.6308)  time: 2.5078  data: 2.0382  max mem: 24440
[15:15:00.289306] Epoch: [101]  [2000/5004]  eta: 0:23:05  lr: 0.000611  loss: 2.9334 (2.7472)  time: 0.4582  data: 0.0002  max mem: 24440
[15:30:20.317473] Epoch: [101]  [4000/5004]  eta: 0:07:42  lr: 0.000607  loss: 2.7698 (2.7610)  time: 0.4625  data: 0.0003  max mem: 24440
[15:38:01.248923] Epoch: [101]  [5003/5004]  eta: 0:00:00  lr: 0.000605  loss: 2.6572 (2.7673)  time: 0.4540  data: 0.0009  max mem: 24440
[15:38:01.627614] Epoch: [101] Total time: 0:38:24 (0.4604 s / it)
[15:38:01.827980] Averaged stats: lr: 0.000605  loss: 2.6572 (2.7687)
[15:38:03.624388] Test:  [   0/1563]  eta: 0:46:41  loss: 0.4651 (0.4651)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 1.7926  data: 1.6079  max mem: 24440
[15:39:27.675446] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.6375 (0.8170)  acc1: 84.3750 (79.0170)  acc5: 96.8750 (95.8021)  time: 0.1683  data: 0.0002  max mem: 24440
[15:40:51.718254] Test:  [1000/1563]  eta: 0:01:35  loss: 1.2986 (0.9450)  acc1: 62.5000 (76.4486)  acc5: 90.6250 (94.1340)  time: 0.1682  data: 0.0002  max mem: 24440
[15:42:15.728116] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5089 (1.0155)  acc1: 84.3750 (74.9792)  acc5: 96.8750 (93.2149)  time: 0.1679  data: 0.0002  max mem: 24440
[15:42:26.052908] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3848 (1.0166)  acc1: 90.6250 (74.9460)  acc5: 100.0000 (93.2300)  time: 0.1636  data: 0.0001  max mem: 24440
[15:42:26.189579] Test: Total time: 0:04:24 (0.1691 s / it)
[15:42:26.615942] * Acc@1 74.951 Acc@5 93.231 loss 1.017
[15:42:26.616088] Accuracy of the network on the 50000 test images: 75.0%
[15:42:26.616110] Max accuracy: 75.15%
[15:42:26.695177] log_dir: ./output_dir_qkformer
[15:42:29.247432] Epoch: [102]  [   0/5004]  eta: 3:32:42  lr: 0.000605  loss: 2.7775 (2.7775)  time: 2.5504  data: 2.0363  max mem: 24440
[15:57:49.551463] Epoch: [102]  [2000/5004]  eta: 0:23:05  lr: 0.000601  loss: 2.6862 (2.7417)  time: 0.4629  data: 0.0002  max mem: 24440
[16:13:08.817118] Epoch: [102]  [4000/5004]  eta: 0:07:42  lr: 0.000598  loss: 2.6766 (2.7598)  time: 0.4569  data: 0.0003  max mem: 24440
[16:20:49.601050] Epoch: [102]  [5003/5004]  eta: 0:00:00  lr: 0.000596  loss: 2.7013 (2.7608)  time: 0.4537  data: 0.0009  max mem: 24440
[16:20:50.079339] Epoch: [102] Total time: 0:38:23 (0.4603 s / it)
[16:20:50.081353] Averaged stats: lr: 0.000596  loss: 2.7013 (2.7655)
[16:20:52.233746] Test:  [   0/1563]  eta: 0:55:58  loss: 0.5403 (0.5403)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 2.1486  data: 1.9736  max mem: 24440
[16:22:16.252282] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.8539 (0.8058)  acc1: 81.2500 (79.0482)  acc5: 96.8750 (96.0516)  time: 0.1679  data: 0.0002  max mem: 24440
[16:23:40.265197] Test:  [1000/1563]  eta: 0:01:35  loss: 1.2067 (0.9411)  acc1: 65.6250 (76.2737)  acc5: 93.7500 (94.2901)  time: 0.1681  data: 0.0002  max mem: 24440
[16:25:04.259266] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4763 (1.0174)  acc1: 87.5000 (74.6606)  acc5: 96.8750 (93.2295)  time: 0.1679  data: 0.0002  max mem: 24440
[16:25:14.587446] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5027 (1.0171)  acc1: 90.6250 (74.6700)  acc5: 96.8750 (93.2420)  time: 0.1635  data: 0.0001  max mem: 24440
[16:25:14.681125] Test: Total time: 0:04:24 (0.1693 s / it)
[16:25:15.101524] * Acc@1 74.666 Acc@5 93.246 loss 1.017
[16:25:15.101660] Accuracy of the network on the 50000 test images: 74.7%
[16:25:15.101681] Max accuracy: 75.15%
[16:25:15.167794] log_dir: ./output_dir_qkformer
[16:25:17.771563] Epoch: [103]  [   0/5004]  eta: 3:37:00  lr: 0.000596  loss: 2.1076 (2.1076)  time: 2.6021  data: 2.1166  max mem: 24440
[16:40:38.050304] Epoch: [103]  [2000/5004]  eta: 0:23:05  lr: 0.000592  loss: 2.7741 (2.7444)  time: 0.4573  data: 0.0002  max mem: 24440
[16:55:57.248843] Epoch: [103]  [4000/5004]  eta: 0:07:42  lr: 0.000588  loss: 2.7113 (2.7520)  time: 0.4582  data: 0.0003  max mem: 24440
[17:03:37.963389] Epoch: [103]  [5003/5004]  eta: 0:00:00  lr: 0.000586  loss: 2.7308 (2.7541)  time: 0.4536  data: 0.0009  max mem: 24440
[17:03:38.377590] Epoch: [103] Total time: 0:38:23 (0.4603 s / it)
[17:03:38.380174] Averaged stats: lr: 0.000586  loss: 2.7308 (2.7582)
[17:03:40.644571] Test:  [   0/1563]  eta: 0:58:51  loss: 0.2555 (0.2555)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.2596  data: 1.9229  max mem: 24440
[17:05:04.686022] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.9189 (0.8491)  acc1: 81.2500 (79.4099)  acc5: 96.8750 (95.5589)  time: 0.1682  data: 0.0002  max mem: 24440
[17:06:28.739078] Test:  [1000/1563]  eta: 0:01:35  loss: 1.0936 (0.9550)  acc1: 71.8750 (76.6265)  acc5: 96.8750 (94.2027)  time: 0.1682  data: 0.0002  max mem: 24440
[17:07:52.792143] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5731 (1.0262)  acc1: 87.5000 (74.9105)  acc5: 96.8750 (93.2545)  time: 0.1681  data: 0.0002  max mem: 24440
[17:08:03.118137] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5484 (1.0299)  acc1: 87.5000 (74.8520)  acc5: 100.0000 (93.2060)  time: 0.1636  data: 0.0001  max mem: 24440
[17:08:03.237255] Test: Total time: 0:04:24 (0.1695 s / it)
[17:08:03.614706] * Acc@1 74.847 Acc@5 93.209 loss 1.030
[17:08:03.614863] Accuracy of the network on the 50000 test images: 74.8%
[17:08:03.614883] Max accuracy: 75.15%
[17:08:03.689167] log_dir: ./output_dir_qkformer
[17:08:06.305347] Epoch: [104]  [   0/5004]  eta: 3:37:57  lr: 0.000586  loss: 2.6269 (2.6269)  time: 2.6133  data: 1.8652  max mem: 24440
[17:23:28.034366] Epoch: [104]  [2000/5004]  eta: 0:23:07  lr: 0.000582  loss: 2.6036 (2.7407)  time: 0.4636  data: 0.0002  max mem: 24440
[17:38:48.838900] Epoch: [104]  [4000/5004]  eta: 0:07:42  lr: 0.000578  loss: 2.6940 (2.7544)  time: 0.4630  data: 0.0002  max mem: 24440
[17:46:30.535410] Epoch: [104]  [5003/5004]  eta: 0:00:00  lr: 0.000576  loss: 2.6777 (2.7547)  time: 0.4531  data: 0.0010  max mem: 24440
[17:46:30.991839] Epoch: [104] Total time: 0:38:27 (0.4611 s / it)
[17:46:31.016959] Averaged stats: lr: 0.000576  loss: 2.6777 (2.7532)
[17:46:32.948246] Test:  [   0/1563]  eta: 0:50:03  loss: 0.1437 (0.1437)  acc1: 96.8750 (96.8750)  acc5: 100.0000 (100.0000)  time: 1.9218  data: 1.5568  max mem: 24440
[17:47:56.923327] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.8361 (0.8071)  acc1: 78.1250 (79.5035)  acc5: 96.8750 (95.9706)  time: 0.1678  data: 0.0002  max mem: 24440
[17:49:20.922004] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1805 (0.9276)  acc1: 59.3750 (76.7639)  acc5: 93.7500 (94.3338)  time: 0.1682  data: 0.0002  max mem: 24440
[17:50:44.912498] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6863 (1.0016)  acc1: 87.5000 (75.1603)  acc5: 96.8750 (93.3294)  time: 0.1678  data: 0.0002  max mem: 24440
[17:50:55.246087] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4448 (1.0020)  acc1: 90.6250 (75.1960)  acc5: 100.0000 (93.3480)  time: 0.1635  data: 0.0001  max mem: 24440
[17:50:55.360515] Test: Total time: 0:04:24 (0.1691 s / it)
[17:50:56.091927] * Acc@1 75.193 Acc@5 93.348 loss 1.002
[17:50:56.092076] Accuracy of the network on the 50000 test images: 75.2%
[17:50:56.092098] Max accuracy: 75.19%
[17:50:56.187117] log_dir: ./output_dir_qkformer
[17:50:59.013895] Epoch: [105]  [   0/5004]  eta: 3:55:24  lr: 0.000576  loss: 2.7036 (2.7036)  time: 2.8226  data: 2.1790  max mem: 24440
[18:06:20.131662] Epoch: [105]  [2000/5004]  eta: 0:23:06  lr: 0.000573  loss: 2.7831 (2.7334)  time: 0.4593  data: 0.0003  max mem: 24440
[18:21:41.464079] Epoch: [105]  [4000/5004]  eta: 0:07:43  lr: 0.000569  loss: 2.8615 (2.7393)  time: 0.4621  data: 0.0002  max mem: 24440
[18:29:23.350519] Epoch: [105]  [5003/5004]  eta: 0:00:00  lr: 0.000567  loss: 2.6109 (2.7444)  time: 0.4544  data: 0.0009  max mem: 24440
[18:29:23.778792] Epoch: [105] Total time: 0:38:27 (0.4611 s / it)
[18:29:23.798900] Averaged stats: lr: 0.000567  loss: 2.6109 (2.7463)
[18:29:26.114034] Test:  [   0/1563]  eta: 1:00:12  loss: 0.3791 (0.3791)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 2.3113  data: 2.1355  max mem: 24440
[18:30:50.131255] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.6749 (0.7889)  acc1: 78.1250 (79.2665)  acc5: 96.8750 (96.0392)  time: 0.1681  data: 0.0002  max mem: 24440
[18:32:14.119675] Test:  [1000/1563]  eta: 0:01:35  loss: 1.0244 (0.9182)  acc1: 71.8750 (76.6109)  acc5: 93.7500 (94.2901)  time: 0.1678  data: 0.0002  max mem: 24440
[18:33:38.098024] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5135 (1.0080)  acc1: 87.5000 (74.6232)  acc5: 96.8750 (93.0692)  time: 0.1678  data: 0.0002  max mem: 24440
[18:33:48.418990] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5709 (1.0091)  acc1: 87.5000 (74.6160)  acc5: 96.8750 (93.0820)  time: 0.1635  data: 0.0001  max mem: 24440
[18:33:48.541700] Test: Total time: 0:04:24 (0.1694 s / it)
[18:33:49.190660] * Acc@1 74.619 Acc@5 93.079 loss 1.009
[18:33:49.190860] Accuracy of the network on the 50000 test images: 74.6%
[18:33:49.190881] Max accuracy: 75.19%
[18:33:49.236706] log_dir: ./output_dir_qkformer
[18:33:51.808592] Epoch: [106]  [   0/5004]  eta: 3:34:22  lr: 0.000567  loss: 2.9444 (2.9444)  time: 2.5704  data: 2.0220  max mem: 24440
[18:49:11.688977] Epoch: [106]  [2000/5004]  eta: 0:23:04  lr: 0.000563  loss: 2.7869 (2.7448)  time: 0.4581  data: 0.0002  max mem: 24440
[19:04:30.659623] Epoch: [106]  [4000/5004]  eta: 0:07:42  lr: 0.000559  loss: 2.9122 (2.7395)  time: 0.4617  data: 0.0003  max mem: 24440
[19:12:11.671991] Epoch: [106]  [5003/5004]  eta: 0:00:00  lr: 0.000557  loss: 2.7792 (2.7397)  time: 0.4575  data: 0.0006  max mem: 24440
[19:12:12.120422] Epoch: [106] Total time: 0:38:22 (0.4602 s / it)
[19:12:12.121651] Averaged stats: lr: 0.000557  loss: 2.7792 (2.7397)
[19:12:14.366884] Test:  [   0/1563]  eta: 0:58:19  loss: 0.3109 (0.3109)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 2.2393  data: 1.9388  max mem: 24440
[19:13:38.339503] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.7501 (0.8008)  acc1: 78.1250 (79.0482)  acc5: 93.7500 (95.8583)  time: 0.1678  data: 0.0002  max mem: 24440
[19:15:02.349510] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1259 (0.9172)  acc1: 68.7500 (76.7576)  acc5: 96.8750 (94.2183)  time: 0.1682  data: 0.0002  max mem: 24440
[19:16:26.326393] Test:  [1500/1563]  eta: 0:00:10  loss: 0.6241 (0.9993)  acc1: 87.5000 (75.0271)  acc5: 96.8750 (93.1733)  time: 0.1678  data: 0.0002  max mem: 24440
[19:16:36.646932] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4806 (1.0023)  acc1: 87.5000 (74.9400)  acc5: 96.8750 (93.1460)  time: 0.1635  data: 0.0001  max mem: 24440
[19:16:36.769598] Test: Total time: 0:04:24 (0.1693 s / it)
[19:16:37.568629] * Acc@1 74.940 Acc@5 93.144 loss 1.002
[19:16:37.568841] Accuracy of the network on the 50000 test images: 74.9%
[19:16:37.568870] Max accuracy: 75.19%
[19:16:37.633361] log_dir: ./output_dir_qkformer
[19:16:40.192674] Epoch: [107]  [   0/5004]  eta: 3:33:11  lr: 0.000557  loss: 3.0783 (3.0783)  time: 2.5563  data: 1.8504  max mem: 24440
[19:32:02.500808] Epoch: [107]  [2000/5004]  eta: 0:23:08  lr: 0.000553  loss: 2.8025 (2.7310)  time: 0.4649  data: 0.0003  max mem: 24440
[19:47:24.996520] Epoch: [107]  [4000/5004]  eta: 0:07:43  lr: 0.000549  loss: 2.8155 (2.7407)  time: 0.4603  data: 0.0003  max mem: 24440
[19:55:06.989475] Epoch: [107]  [5003/5004]  eta: 0:00:00  lr: 0.000547  loss: 2.8105 (2.7378)  time: 0.4540  data: 0.0010  max mem: 24440
[19:55:07.504721] Epoch: [107] Total time: 0:38:29 (0.4616 s / it)
[19:55:07.630206] Averaged stats: lr: 0.000547  loss: 2.8105 (2.7311)
[19:55:09.928355] Test:  [   0/1563]  eta: 0:59:41  loss: 0.2752 (0.2752)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.2913  data: 2.0700  max mem: 24440
[19:56:33.999089] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.7264 (0.8040)  acc1: 81.2500 (79.3226)  acc5: 96.8750 (95.9331)  time: 0.1683  data: 0.0005  max mem: 24440
[19:57:58.034889] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1786 (0.9209)  acc1: 65.6250 (76.5765)  acc5: 93.7500 (94.2620)  time: 0.1680  data: 0.0002  max mem: 24440
[19:59:22.028890] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5263 (0.9968)  acc1: 84.3750 (74.9833)  acc5: 96.8750 (93.2503)  time: 0.1683  data: 0.0002  max mem: 24440
[19:59:32.364375] Test:  [1562/1563]  eta: 0:00:00  loss: 0.5028 (0.9974)  acc1: 87.5000 (74.9480)  acc5: 96.8750 (93.2720)  time: 0.1636  data: 0.0001  max mem: 24440
[19:59:32.476705] Test: Total time: 0:04:24 (0.1694 s / it)
[19:59:32.939193] * Acc@1 74.957 Acc@5 93.274 loss 0.997
[19:59:32.939340] Accuracy of the network on the 50000 test images: 75.0%
[19:59:32.939362] Max accuracy: 75.19%
[19:59:33.024621] log_dir: ./output_dir_qkformer
[19:59:35.608263] Epoch: [108]  [   0/5004]  eta: 3:35:15  lr: 0.000547  loss: 2.8644 (2.8644)  time: 2.5810  data: 2.1079  max mem: 24440
[20:14:55.623523] Epoch: [108]  [2000/5004]  eta: 0:23:04  lr: 0.000544  loss: 2.8634 (2.7299)  time: 0.4581  data: 0.0003  max mem: 24440
[20:30:15.214896] Epoch: [108]  [4000/5004]  eta: 0:07:42  lr: 0.000540  loss: 2.5651 (2.7284)  time: 0.4583  data: 0.0003  max mem: 24440
[20:37:57.450682] Epoch: [108]  [5003/5004]  eta: 0:00:00  lr: 0.000538  loss: 2.6762 (2.7325)  time: 0.4571  data: 0.0010  max mem: 24440
[20:37:57.909650] Epoch: [108] Total time: 0:38:24 (0.4606 s / it)
[20:37:57.911105] Averaged stats: lr: 0.000538  loss: 2.6762 (2.7264)
[20:37:59.527123] Test:  [   0/1563]  eta: 0:42:00  loss: 0.4607 (0.4607)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 1.6127  data: 1.4380  max mem: 24440
[20:39:23.522339] Test:  [ 500/1563]  eta: 0:03:01  loss: 1.0766 (0.7983)  acc1: 75.0000 (79.7592)  acc5: 93.7500 (96.2949)  time: 0.1678  data: 0.0002  max mem: 24440
[20:40:47.512183] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1296 (0.9367)  acc1: 65.6250 (76.8981)  acc5: 93.7500 (94.4150)  time: 0.1678  data: 0.0002  max mem: 24440
[20:42:11.506868] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3983 (1.0081)  acc1: 87.5000 (75.1249)  acc5: 96.8750 (93.5022)  time: 0.1680  data: 0.0002  max mem: 24440
[20:42:21.840609] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4898 (1.0066)  acc1: 90.6250 (75.1200)  acc5: 96.8750 (93.5200)  time: 0.1635  data: 0.0001  max mem: 24440
[20:42:21.970861] Test: Total time: 0:04:24 (0.1689 s / it)
[20:42:22.566507] * Acc@1 75.129 Acc@5 93.514 loss 1.007
[20:42:22.566656] Accuracy of the network on the 50000 test images: 75.1%
[20:42:22.566679] Max accuracy: 75.19%
[20:42:22.651278] log_dir: ./output_dir_qkformer
[20:42:25.277765] Epoch: [109]  [   0/5004]  eta: 3:38:56  lr: 0.000538  loss: 2.5066 (2.5066)  time: 2.6252  data: 2.1294  max mem: 24440
[20:57:47.904699] Epoch: [109]  [2000/5004]  eta: 0:23:08  lr: 0.000534  loss: 2.6848 (2.7198)  time: 0.4622  data: 0.0002  max mem: 24440
[21:13:15.694789] Epoch: [109]  [4000/5004]  eta: 0:07:44  lr: 0.000530  loss: 2.6993 (2.7266)  time: 0.4593  data: 0.0002  max mem: 24440
[21:20:57.814715] Epoch: [109]  [5003/5004]  eta: 0:00:00  lr: 0.000528  loss: 2.7237 (2.7262)  time: 0.4538  data: 0.0006  max mem: 24440
[21:20:58.299740] Epoch: [109] Total time: 0:38:35 (0.4628 s / it)
[21:20:58.355878] Averaged stats: lr: 0.000528  loss: 2.7237 (2.7206)
[21:21:00.157924] Test:  [   0/1563]  eta: 0:46:47  loss: 0.3253 (0.3253)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 1.7960  data: 1.5796  max mem: 24440
[21:22:24.615500] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.8326 (0.7900)  acc1: 78.1250 (80.1335)  acc5: 96.8750 (96.1015)  time: 0.1677  data: 0.0002  max mem: 24440
[21:23:48.621102] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9932 (0.9162)  acc1: 71.8750 (77.4507)  acc5: 90.6250 (94.2589)  time: 0.1680  data: 0.0002  max mem: 24440
[21:25:12.591246] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5530 (1.0003)  acc1: 87.5000 (75.4518)  acc5: 96.8750 (93.2545)  time: 0.1678  data: 0.0002  max mem: 24440
[21:25:22.911226] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3810 (1.0021)  acc1: 90.6250 (75.4080)  acc5: 96.8750 (93.2360)  time: 0.1634  data: 0.0001  max mem: 24440
[21:25:23.035442] Test: Total time: 0:04:24 (0.1693 s / it)
[21:25:23.294589] * Acc@1 75.407 Acc@5 93.237 loss 1.002
[21:25:23.294730] Accuracy of the network on the 50000 test images: 75.4%
[21:25:23.294750] Max accuracy: 75.41%
[21:25:23.382572] log_dir: ./output_dir_qkformer
[21:25:26.233663] Epoch: [110]  [   0/5004]  eta: 3:57:39  lr: 0.000528  loss: 2.9205 (2.9205)  time: 2.8496  data: 2.3783  max mem: 24440
[21:40:47.313075] Epoch: [110]  [2000/5004]  eta: 0:23:06  lr: 0.000524  loss: 2.6285 (2.7148)  time: 0.4597  data: 0.0002  max mem: 24440
[21:56:09.013727] Epoch: [110]  [4000/5004]  eta: 0:07:43  lr: 0.000521  loss: 2.6583 (2.7119)  time: 0.4609  data: 0.0002  max mem: 24440
[22:03:51.558110] Epoch: [110]  [5003/5004]  eta: 0:00:00  lr: 0.000519  loss: 2.7353 (2.7174)  time: 0.4575  data: 0.0008  max mem: 24440
[22:03:52.037391] Epoch: [110] Total time: 0:38:28 (0.4614 s / it)
[22:03:52.049475] Averaged stats: lr: 0.000519  loss: 2.7353 (2.7135)
[22:03:54.142088] Test:  [   0/1563]  eta: 0:54:21  loss: 0.2431 (0.2431)  acc1: 96.8750 (96.8750)  acc5: 100.0000 (100.0000)  time: 2.0865  data: 1.8575  max mem: 24440
[22:05:18.118023] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.8698 (0.7972)  acc1: 78.1250 (79.8777)  acc5: 96.8750 (96.1452)  time: 0.1679  data: 0.0002  max mem: 24440
[22:06:42.083844] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9707 (0.9107)  acc1: 68.7500 (76.9730)  acc5: 93.7500 (94.5242)  time: 0.1677  data: 0.0002  max mem: 24440
[22:08:06.057827] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5472 (0.9700)  acc1: 87.5000 (75.6870)  acc5: 96.8750 (93.6188)  time: 0.1679  data: 0.0002  max mem: 24440
[22:08:16.376395] Test:  [1562/1563]  eta: 0:00:00  loss: 0.2949 (0.9718)  acc1: 90.6250 (75.6560)  acc5: 96.8750 (93.6020)  time: 0.1635  data: 0.0001  max mem: 24440
[22:08:16.486491] Test: Total time: 0:04:24 (0.1692 s / it)
[22:08:17.106872] * Acc@1 75.654 Acc@5 93.600 loss 0.972
[22:08:17.107026] Accuracy of the network on the 50000 test images: 75.7%
[22:08:17.107048] Max accuracy: 75.65%
[22:08:17.186050] log_dir: ./output_dir_qkformer
[22:08:20.015785] Epoch: [111]  [   0/5004]  eta: 3:55:47  lr: 0.000519  loss: 2.8455 (2.8455)  time: 2.8273  data: 1.8746  max mem: 24440
[22:23:40.117445] Epoch: [111]  [2000/5004]  eta: 0:23:05  lr: 0.000515  loss: 2.7064 (2.6989)  time: 0.4568  data: 0.0002  max mem: 24440
[22:38:59.615548] Epoch: [111]  [4000/5004]  eta: 0:07:42  lr: 0.000511  loss: 2.8756 (2.7074)  time: 0.4590  data: 0.0002  max mem: 24440
[22:46:41.106052] Epoch: [111]  [5003/5004]  eta: 0:00:00  lr: 0.000509  loss: 2.6493 (2.7067)  time: 0.4530  data: 0.0006  max mem: 24440
[22:46:41.562964] Epoch: [111] Total time: 0:38:24 (0.4605 s / it)
[22:46:41.566046] Averaged stats: lr: 0.000509  loss: 2.6493 (2.7083)
[22:46:43.651858] Test:  [   0/1563]  eta: 0:54:11  loss: 0.3411 (0.3411)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 2.0802  data: 1.8813  max mem: 24440
[22:48:07.623831] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.8762 (0.7830)  acc1: 78.1250 (79.9089)  acc5: 96.8750 (96.1452)  time: 0.1683  data: 0.0002  max mem: 24440
[22:49:31.614023] Test:  [1000/1563]  eta: 0:01:35  loss: 1.0253 (0.8980)  acc1: 68.7500 (77.3757)  acc5: 96.8750 (94.4462)  time: 0.1682  data: 0.0002  max mem: 24440
[22:50:55.564401] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5560 (0.9645)  acc1: 87.5000 (75.7433)  acc5: 96.8750 (93.5314)  time: 0.1680  data: 0.0002  max mem: 24440
[22:51:05.881435] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3935 (0.9654)  acc1: 90.6250 (75.7020)  acc5: 96.8750 (93.5140)  time: 0.1635  data: 0.0001  max mem: 24440
[22:51:06.002533] Test: Total time: 0:04:24 (0.1692 s / it)
[22:51:06.537400] * Acc@1 75.700 Acc@5 93.517 loss 0.965
[22:51:06.537554] Accuracy of the network on the 50000 test images: 75.7%
[22:51:06.537578] Max accuracy: 75.70%
[22:51:06.628584] log_dir: ./output_dir_qkformer
[22:51:09.535155] Epoch: [112]  [   0/5004]  eta: 4:02:15  lr: 0.000509  loss: 2.9716 (2.9716)  time: 2.9047  data: 2.3554  max mem: 24440
[23:06:30.832028] Epoch: [112]  [2000/5004]  eta: 0:23:07  lr: 0.000505  loss: 2.6214 (2.6963)  time: 0.4604  data: 0.0003  max mem: 24440
[23:21:51.114497] Epoch: [112]  [4000/5004]  eta: 0:07:42  lr: 0.000501  loss: 2.4880 (2.6922)  time: 0.4566  data: 0.0003  max mem: 24440
[23:29:32.522359] Epoch: [112]  [5003/5004]  eta: 0:00:00  lr: 0.000500  loss: 2.6280 (2.6962)  time: 0.4544  data: 0.0008  max mem: 24440
[23:29:33.004209] Epoch: [112] Total time: 0:38:26 (0.4609 s / it)
[23:29:33.009822] Averaged stats: lr: 0.000500  loss: 2.6280 (2.6977)
[23:29:35.104406] Test:  [   0/1563]  eta: 0:54:28  loss: 0.2928 (0.2928)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 2.0910  data: 1.9159  max mem: 24440
[23:30:59.077514] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.7585 (0.7749)  acc1: 81.2500 (79.7655)  acc5: 96.8750 (96.1951)  time: 0.1677  data: 0.0002  max mem: 24440
[23:32:23.025760] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8685 (0.8923)  acc1: 75.0000 (77.3570)  acc5: 93.7500 (94.5929)  time: 0.1679  data: 0.0002  max mem: 24440
[23:33:46.986919] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5491 (0.9502)  acc1: 87.5000 (76.0410)  acc5: 96.8750 (93.8354)  time: 0.1679  data: 0.0002  max mem: 24440
[23:33:57.307356] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6165 (0.9524)  acc1: 87.5000 (75.9980)  acc5: 96.8750 (93.8460)  time: 0.1635  data: 0.0001  max mem: 24440
[23:33:57.421843] Test: Total time: 0:04:24 (0.1692 s / it)
[23:33:57.892205] * Acc@1 75.997 Acc@5 93.849 loss 0.952
[23:33:57.892362] Accuracy of the network on the 50000 test images: 76.0%
[23:33:57.892385] Max accuracy: 76.00%
[23:33:57.990987] log_dir: ./output_dir_qkformer
[23:34:00.552282] Epoch: [113]  [   0/5004]  eta: 3:33:29  lr: 0.000500  loss: 3.4999 (3.4999)  time: 2.5598  data: 2.0832  max mem: 24440
[23:49:20.797308] Epoch: [113]  [2000/5004]  eta: 0:23:05  lr: 0.000496  loss: 2.5186 (2.6866)  time: 0.4567  data: 0.0002  max mem: 24440
[00:04:41.025834] Epoch: [113]  [4000/5004]  eta: 0:07:42  lr: 0.000492  loss: 2.5628 (2.6838)  time: 0.4619  data: 0.0002  max mem: 24440
[00:12:22.721377] Epoch: [113]  [5003/5004]  eta: 0:00:00  lr: 0.000490  loss: 2.7006 (2.6865)  time: 0.4530  data: 0.0006  max mem: 24440
[00:12:23.094278] Epoch: [113] Total time: 0:38:25 (0.4607 s / it)
[00:12:23.095324] Averaged stats: lr: 0.000490  loss: 2.7006 (2.6914)
[00:12:24.767056] Test:  [   0/1563]  eta: 0:43:26  loss: 0.3133 (0.3133)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.6677  data: 1.4927  max mem: 24440
[00:13:48.725820] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.7896 (0.7487)  acc1: 78.1250 (80.5202)  acc5: 96.8750 (96.3199)  time: 0.1682  data: 0.0002  max mem: 24440
[00:15:12.715558] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1676 (0.8793)  acc1: 71.8750 (77.5412)  acc5: 93.7500 (94.6366)  time: 0.1680  data: 0.0002  max mem: 24440
[00:16:36.706749] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4365 (0.9573)  acc1: 87.5000 (75.8869)  acc5: 96.8750 (93.6480)  time: 0.1680  data: 0.0002  max mem: 24440
[00:16:47.035948] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3932 (0.9572)  acc1: 90.6250 (75.9040)  acc5: 100.0000 (93.6560)  time: 0.1635  data: 0.0001  max mem: 24440
[00:16:47.159426] Test: Total time: 0:04:24 (0.1689 s / it)
[00:16:47.654387] * Acc@1 75.903 Acc@5 93.660 loss 0.957
[00:16:47.654560] Accuracy of the network on the 50000 test images: 75.9%
[00:16:47.654588] Max accuracy: 76.00%
[00:16:47.761279] log_dir: ./output_dir_qkformer
[00:16:50.643561] Epoch: [114]  [   0/5004]  eta: 4:00:15  lr: 0.000490  loss: 2.6105 (2.6105)  time: 2.8808  data: 2.3889  max mem: 24440
[00:32:10.660237] Epoch: [114]  [2000/5004]  eta: 0:23:05  lr: 0.000486  loss: 2.5088 (2.6784)  time: 0.4683  data: 0.0002  max mem: 24440
[00:47:32.116281] Epoch: [114]  [4000/5004]  eta: 0:07:42  lr: 0.000482  loss: 2.5772 (2.6839)  time: 0.4614  data: 0.0002  max mem: 24440
[00:55:13.690782] Epoch: [114]  [5003/5004]  eta: 0:00:00  lr: 0.000481  loss: 2.6794 (2.6858)  time: 0.4529  data: 0.0005  max mem: 24440
[00:55:14.151302] Epoch: [114] Total time: 0:38:26 (0.4609 s / it)
[00:55:14.155289] Averaged stats: lr: 0.000481  loss: 2.6794 (2.6848)
[00:55:15.903100] Test:  [   0/1563]  eta: 0:45:24  loss: 0.3853 (0.3853)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 1.7431  data: 1.5449  max mem: 24440
[00:56:39.935586] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.6714 (0.7324)  acc1: 81.2500 (80.6512)  acc5: 100.0000 (96.4696)  time: 0.1680  data: 0.0002  max mem: 24440
[00:58:03.926301] Test:  [1000/1563]  eta: 0:01:35  loss: 1.0935 (0.8544)  acc1: 71.8750 (78.0969)  acc5: 90.6250 (94.8739)  time: 0.1678  data: 0.0002  max mem: 24440
[00:59:27.927174] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5840 (0.9276)  acc1: 87.5000 (76.5510)  acc5: 96.8750 (93.9499)  time: 0.1681  data: 0.0002  max mem: 24440
[00:59:38.250695] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4919 (0.9286)  acc1: 87.5000 (76.5080)  acc5: 96.8750 (93.9680)  time: 0.1635  data: 0.0001  max mem: 24440
[00:59:38.368594] Test: Total time: 0:04:24 (0.1690 s / it)
[00:59:38.733200] * Acc@1 76.509 Acc@5 93.969 loss 0.928
[00:59:38.733397] Accuracy of the network on the 50000 test images: 76.5%
[00:59:38.733420] Max accuracy: 76.51%
[00:59:38.842623] log_dir: ./output_dir_qkformer
[00:59:41.465661] Epoch: [115]  [   0/5004]  eta: 3:38:42  lr: 0.000481  loss: 2.7959 (2.7959)  time: 2.6224  data: 2.0717  max mem: 24440
[01:15:05.346229] Epoch: [115]  [2000/5004]  eta: 0:23:10  lr: 0.000477  loss: 2.7180 (2.6742)  time: 0.4589  data: 0.0003  max mem: 24440
[01:30:26.600013] Epoch: [115]  [4000/5004]  eta: 0:07:43  lr: 0.000473  loss: 2.6508 (2.6773)  time: 0.4590  data: 0.0003  max mem: 24440
[01:38:08.673718] Epoch: [115]  [5003/5004]  eta: 0:00:00  lr: 0.000471  loss: 2.7511 (2.6783)  time: 0.4540  data: 0.0009  max mem: 24440
[01:38:09.190789] Epoch: [115] Total time: 0:38:30 (0.4617 s / it)
[01:38:09.197775] Averaged stats: lr: 0.000471  loss: 2.7511 (2.6792)
[01:38:10.978152] Test:  [   0/1563]  eta: 0:46:13  loss: 0.2867 (0.2867)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.7747  data: 1.5554  max mem: 24440
[01:39:35.007866] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.6838 (0.7535)  acc1: 78.1250 (80.6949)  acc5: 96.8750 (96.3635)  time: 0.1685  data: 0.0002  max mem: 24440
[01:40:58.996093] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1670 (0.8858)  acc1: 65.6250 (77.6848)  acc5: 93.7500 (94.5742)  time: 0.1677  data: 0.0002  max mem: 24440
[01:42:22.964116] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4348 (0.9609)  acc1: 87.5000 (76.0389)  acc5: 96.8750 (93.5959)  time: 0.1678  data: 0.0002  max mem: 24440
[01:42:33.290708] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3723 (0.9588)  acc1: 90.6250 (76.0940)  acc5: 100.0000 (93.6180)  time: 0.1634  data: 0.0001  max mem: 24440
[01:42:33.415064] Test: Total time: 0:04:24 (0.1690 s / it)
[01:42:34.023368] * Acc@1 76.093 Acc@5 93.618 loss 0.959
[01:42:34.023552] Accuracy of the network on the 50000 test images: 76.1%
[01:42:34.023579] Max accuracy: 76.51%
[01:42:34.073752] log_dir: ./output_dir_qkformer
[01:42:36.804328] Epoch: [116]  [   0/5004]  eta: 3:47:36  lr: 0.000471  loss: 2.5038 (2.5038)  time: 2.7291  data: 2.0422  max mem: 24440
[01:57:59.820644] Epoch: [116]  [2000/5004]  eta: 0:23:09  lr: 0.000467  loss: 2.6220 (2.6702)  time: 0.4566  data: 0.0002  max mem: 24440
[02:13:22.711717] Epoch: [116]  [4000/5004]  eta: 0:07:43  lr: 0.000464  loss: 2.4829 (2.6730)  time: 0.4612  data: 0.0003  max mem: 24440
[02:21:05.224058] Epoch: [116]  [5003/5004]  eta: 0:00:00  lr: 0.000462  loss: 2.6020 (2.6754)  time: 0.4530  data: 0.0006  max mem: 24440
[02:21:05.723775] Epoch: [116] Total time: 0:38:31 (0.4620 s / it)
[02:21:05.727010] Averaged stats: lr: 0.000462  loss: 2.6020 (2.6707)
[02:21:07.412641] Test:  [   0/1563]  eta: 0:43:47  loss: 0.3507 (0.3507)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 1.6809  data: 1.5064  max mem: 24440
[02:22:31.370974] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.9261 (0.7516)  acc1: 78.1250 (80.9069)  acc5: 96.8750 (96.4197)  time: 0.1677  data: 0.0002  max mem: 24440
[02:23:55.317943] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9263 (0.8768)  acc1: 75.0000 (77.9377)  acc5: 93.7500 (94.6741)  time: 0.1679  data: 0.0002  max mem: 24440
[02:25:19.339956] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5169 (0.9464)  acc1: 87.5000 (76.4116)  acc5: 93.7500 (93.7667)  time: 0.1678  data: 0.0002  max mem: 24440
[02:25:29.665258] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4046 (0.9470)  acc1: 87.5000 (76.3620)  acc5: 96.8750 (93.7720)  time: 0.1635  data: 0.0001  max mem: 24440
[02:25:29.790601] Test: Total time: 0:04:24 (0.1689 s / it)
[02:25:30.413277] * Acc@1 76.355 Acc@5 93.773 loss 0.947
[02:25:30.413437] Accuracy of the network on the 50000 test images: 76.4%
[02:25:30.413460] Max accuracy: 76.51%
[02:25:30.481734] log_dir: ./output_dir_qkformer
[02:25:33.578810] Epoch: [117]  [   0/5004]  eta: 4:18:10  lr: 0.000462  loss: 2.4231 (2.4231)  time: 3.0956  data: 2.0613  max mem: 24440
[02:40:53.759612] Epoch: [117]  [2000/5004]  eta: 0:23:05  lr: 0.000458  loss: 2.6933 (2.6598)  time: 0.4597  data: 0.0002  max mem: 24440
[02:56:13.395031] Epoch: [117]  [4000/5004]  eta: 0:07:42  lr: 0.000454  loss: 2.7167 (2.6682)  time: 0.4595  data: 0.0002  max mem: 24440
[03:03:55.261210] Epoch: [117]  [5003/5004]  eta: 0:00:00  lr: 0.000452  loss: 2.6107 (2.6722)  time: 0.4535  data: 0.0009  max mem: 24440
[03:03:55.736742] Epoch: [117] Total time: 0:38:25 (0.4607 s / it)
[03:03:55.738750] Averaged stats: lr: 0.000452  loss: 2.6107 (2.6656)
[03:03:57.818571] Test:  [   0/1563]  eta: 0:54:01  loss: 0.2066 (0.2066)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.0737  data: 1.8941  max mem: 24440
[03:05:21.921229] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.6719 (0.7175)  acc1: 84.3750 (81.4184)  acc5: 96.8750 (96.5007)  time: 0.1679  data: 0.0002  max mem: 24440
[03:06:45.980525] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8649 (0.8459)  acc1: 78.1250 (78.3529)  acc5: 93.7500 (94.8427)  time: 0.1678  data: 0.0002  max mem: 24440
[03:08:10.000996] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4497 (0.9143)  acc1: 90.6250 (76.8842)  acc5: 96.8750 (94.0769)  time: 0.1679  data: 0.0002  max mem: 24440
[03:08:20.329561] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4334 (0.9158)  acc1: 90.6250 (76.8520)  acc5: 100.0000 (94.0860)  time: 0.1636  data: 0.0001  max mem: 24440
[03:08:20.456649] Test: Total time: 0:04:24 (0.1694 s / it)
[03:08:20.758547] * Acc@1 76.848 Acc@5 94.085 loss 0.916
[03:08:20.758710] Accuracy of the network on the 50000 test images: 76.8%
[03:08:20.758733] Max accuracy: 76.85%
[03:08:20.837244] log_dir: ./output_dir_qkformer
[03:08:23.609883] Epoch: [118]  [   0/5004]  eta: 3:51:03  lr: 0.000452  loss: 2.6253 (2.6253)  time: 2.7704  data: 2.2861  max mem: 24440
[03:23:46.595895] Epoch: [118]  [2000/5004]  eta: 0:23:09  lr: 0.000449  loss: 2.6451 (2.6415)  time: 0.4568  data: 0.0003  max mem: 24440
[03:39:08.735345] Epoch: [118]  [4000/5004]  eta: 0:07:43  lr: 0.000445  loss: 2.5877 (2.6534)  time: 0.4630  data: 0.0002  max mem: 24440
[03:46:50.445543] Epoch: [118]  [5003/5004]  eta: 0:00:00  lr: 0.000443  loss: 2.5171 (2.6588)  time: 0.4528  data: 0.0005  max mem: 24440
[03:46:50.914145] Epoch: [118] Total time: 0:38:30 (0.4616 s / it)
[03:46:50.925757] Averaged stats: lr: 0.000443  loss: 2.5171 (2.6585)
[03:46:52.548392] Test:  [   0/1563]  eta: 0:42:07  loss: 0.3418 (0.3418)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 1.6170  data: 1.4392  max mem: 24440
[03:48:16.515685] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.6162 (0.7387)  acc1: 78.1250 (80.9132)  acc5: 96.8750 (96.3510)  time: 0.1677  data: 0.0002  max mem: 24440
[03:49:40.467890] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1150 (0.8652)  acc1: 71.8750 (77.9252)  acc5: 93.7500 (94.8333)  time: 0.1678  data: 0.0002  max mem: 24440
[03:51:04.421851] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5826 (0.9362)  acc1: 87.5000 (76.3429)  acc5: 96.8750 (93.9624)  time: 0.1679  data: 0.0002  max mem: 24440
[03:51:14.752989] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4229 (0.9373)  acc1: 90.6250 (76.3220)  acc5: 100.0000 (93.9560)  time: 0.1635  data: 0.0001  max mem: 24440
[03:51:14.868623] Test: Total time: 0:04:23 (0.1689 s / it)
[03:51:15.496178] * Acc@1 76.322 Acc@5 93.958 loss 0.937
[03:51:15.496336] Accuracy of the network on the 50000 test images: 76.3%
[03:51:15.496358] Max accuracy: 76.85%
[03:51:15.526228] log_dir: ./output_dir_qkformer
[03:51:18.206551] Epoch: [119]  [   0/5004]  eta: 3:43:28  lr: 0.000443  loss: 2.9604 (2.9604)  time: 2.6796  data: 1.9725  max mem: 24440
[04:06:38.381732] Epoch: [119]  [2000/5004]  eta: 0:23:05  lr: 0.000439  loss: 2.4449 (2.6417)  time: 0.4646  data: 0.0003  max mem: 24440
[04:21:58.394108] Epoch: [119]  [4000/5004]  eta: 0:07:42  lr: 0.000436  loss: 2.6927 (2.6473)  time: 0.4589  data: 0.0003  max mem: 24440
[04:29:40.062217] Epoch: [119]  [5003/5004]  eta: 0:00:00  lr: 0.000434  loss: 2.7505 (2.6493)  time: 0.4536  data: 0.0010  max mem: 24440
[04:29:40.482165] Epoch: [119] Total time: 0:38:24 (0.4606 s / it)
[04:29:40.490211] Averaged stats: lr: 0.000434  loss: 2.7505 (2.6496)
[04:29:42.822049] Test:  [   0/1563]  eta: 1:00:36  loss: 0.3229 (0.3229)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 2.3265  data: 2.0318  max mem: 24440
[04:31:06.871875] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.7009 (0.7151)  acc1: 78.1250 (81.3124)  acc5: 96.8750 (96.6193)  time: 0.1683  data: 0.0002  max mem: 24440
[04:32:30.914009] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9772 (0.8481)  acc1: 65.6250 (78.2062)  acc5: 93.7500 (94.8270)  time: 0.1678  data: 0.0002  max mem: 24440
[04:33:54.914671] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5390 (0.9208)  acc1: 87.5000 (76.5844)  acc5: 96.8750 (93.9353)  time: 0.1680  data: 0.0002  max mem: 24440
[04:34:05.241708] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4510 (0.9221)  acc1: 90.6250 (76.5920)  acc5: 100.0000 (93.9380)  time: 0.1635  data: 0.0001  max mem: 24440
[04:34:05.368337] Test: Total time: 0:04:24 (0.1695 s / it)
[04:34:05.702848] * Acc@1 76.596 Acc@5 93.940 loss 0.922
[04:34:05.702990] Accuracy of the network on the 50000 test images: 76.6%
[04:34:05.703013] Max accuracy: 76.85%
[04:34:05.754270] log_dir: ./output_dir_qkformer
[04:34:08.511152] Epoch: [120]  [   0/5004]  eta: 3:49:47  lr: 0.000434  loss: 2.7582 (2.7582)  time: 2.7553  data: 2.2998  max mem: 24440
[04:49:28.805180] Epoch: [120]  [2000/5004]  eta: 0:23:05  lr: 0.000430  loss: 2.6485 (2.6369)  time: 0.4584  data: 0.0003  max mem: 24440
[05:04:48.275747] Epoch: [120]  [4000/5004]  eta: 0:07:42  lr: 0.000426  loss: 2.5586 (2.6319)  time: 0.4610  data: 0.0002  max mem: 24440
[05:12:29.622602] Epoch: [120]  [5003/5004]  eta: 0:00:00  lr: 0.000424  loss: 2.6525 (2.6351)  time: 0.4528  data: 0.0009  max mem: 24440
[05:12:30.044643] Epoch: [120] Total time: 0:38:24 (0.4605 s / it)
[05:12:30.046401] Averaged stats: lr: 0.000424  loss: 2.6525 (2.6398)
[05:12:31.431093] Test:  [   0/1563]  eta: 0:35:54  loss: 0.1701 (0.1701)  acc1: 96.8750 (96.8750)  acc5: 100.0000 (100.0000)  time: 1.3787  data: 1.1915  max mem: 24440
[05:13:55.439707] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.7009 (0.7301)  acc1: 81.2500 (81.4247)  acc5: 96.8750 (96.6317)  time: 0.1678  data: 0.0002  max mem: 24440
[05:15:19.419201] Test:  [1000/1563]  eta: 0:01:35  loss: 1.0661 (0.8551)  acc1: 71.8750 (78.6120)  acc5: 96.8750 (94.9457)  time: 0.1682  data: 0.0002  max mem: 24440
[05:16:43.393622] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4686 (0.9319)  acc1: 90.6250 (76.6093)  acc5: 96.8750 (94.0102)  time: 0.1678  data: 0.0002  max mem: 24440
[05:16:53.714885] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3781 (0.9329)  acc1: 90.6250 (76.5660)  acc5: 96.8750 (94.0240)  time: 0.1636  data: 0.0001  max mem: 24440
[05:16:53.836162] Test: Total time: 0:04:23 (0.1688 s / it)
[05:16:54.434373] * Acc@1 76.563 Acc@5 94.026 loss 0.933
[05:16:54.434539] Accuracy of the network on the 50000 test images: 76.6%
[05:16:54.434561] Max accuracy: 76.85%
[05:16:54.521597] log_dir: ./output_dir_qkformer
[05:16:57.070116] Epoch: [121]  [   0/5004]  eta: 3:32:26  lr: 0.000424  loss: 2.3620 (2.3620)  time: 2.5472  data: 2.0918  max mem: 24440
[05:32:17.625584] Epoch: [121]  [2000/5004]  eta: 0:23:05  lr: 0.000421  loss: 2.5083 (2.6304)  time: 0.4734  data: 0.0002  max mem: 24440
[05:47:37.074304] Epoch: [121]  [4000/5004]  eta: 0:07:42  lr: 0.000417  loss: 2.5724 (2.6350)  time: 0.4603  data: 0.0002  max mem: 24440
[05:55:17.751726] Epoch: [121]  [5003/5004]  eta: 0:00:00  lr: 0.000415  loss: 2.5660 (2.6341)  time: 0.4535  data: 0.0009  max mem: 24440
[05:55:18.184283] Epoch: [121] Total time: 0:38:23 (0.4604 s / it)
[05:55:18.185414] Averaged stats: lr: 0.000415  loss: 2.5660 (2.6367)
[05:55:20.420160] Test:  [   0/1563]  eta: 0:58:04  loss: 0.3920 (0.3920)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 2.2296  data: 2.0490  max mem: 24440
[05:56:44.427516] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.5837 (0.7276)  acc1: 84.3750 (81.1252)  acc5: 96.8750 (96.4259)  time: 0.1678  data: 0.0002  max mem: 24440
[05:58:08.488359] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8832 (0.8617)  acc1: 71.8750 (78.1437)  acc5: 90.6250 (94.7521)  time: 0.1680  data: 0.0002  max mem: 24440
[05:59:32.504455] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5315 (0.9247)  acc1: 87.5000 (76.7572)  acc5: 100.0000 (94.0914)  time: 0.1680  data: 0.0002  max mem: 24440
[05:59:42.826946] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4911 (0.9277)  acc1: 90.6250 (76.6800)  acc5: 100.0000 (94.0760)  time: 0.1635  data: 0.0001  max mem: 24440
[05:59:42.933791] Test: Total time: 0:04:24 (0.1694 s / it)
[05:59:43.363402] * Acc@1 76.679 Acc@5 94.075 loss 0.928
[05:59:43.363558] Accuracy of the network on the 50000 test images: 76.7%
[05:59:43.363578] Max accuracy: 76.85%
[05:59:43.403644] log_dir: ./output_dir_qkformer
[05:59:46.027463] Epoch: [122]  [   0/5004]  eta: 3:38:30  lr: 0.000415  loss: 2.3651 (2.3651)  time: 2.6199  data: 1.9281  max mem: 24440
[06:15:06.789365] Epoch: [122]  [2000/5004]  eta: 0:23:06  lr: 0.000412  loss: 2.7053 (2.6153)  time: 0.4595  data: 0.0003  max mem: 24440
[06:30:26.457249] Epoch: [122]  [4000/5004]  eta: 0:07:42  lr: 0.000408  loss: 2.6422 (2.6200)  time: 0.4596  data: 0.0002  max mem: 24440
[06:38:07.909892] Epoch: [122]  [5003/5004]  eta: 0:00:00  lr: 0.000406  loss: 2.5078 (2.6239)  time: 0.4547  data: 0.0009  max mem: 24440
[06:38:08.302595] Epoch: [122] Total time: 0:38:24 (0.4606 s / it)
[06:38:08.361031] Averaged stats: lr: 0.000406  loss: 2.5078 (2.6273)
[06:38:10.273809] Test:  [   0/1563]  eta: 0:49:39  loss: 0.2664 (0.2664)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.9064  data: 1.7314  max mem: 24440
[06:39:34.234180] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.6147 (0.7362)  acc1: 87.5000 (81.1939)  acc5: 96.8750 (96.4508)  time: 0.1679  data: 0.0002  max mem: 24440
[06:40:58.214820] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9927 (0.8652)  acc1: 68.7500 (78.1406)  acc5: 93.7500 (94.9145)  time: 0.1680  data: 0.0002  max mem: 24440
[06:42:22.192177] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4483 (0.9380)  acc1: 90.6250 (76.6156)  acc5: 96.8750 (93.9707)  time: 0.1679  data: 0.0002  max mem: 24440
[06:42:32.517747] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3751 (0.9381)  acc1: 90.6250 (76.6120)  acc5: 100.0000 (93.9700)  time: 0.1635  data: 0.0001  max mem: 24440
[06:42:32.646633] Test: Total time: 0:04:24 (0.1691 s / it)
[06:42:33.073951] * Acc@1 76.619 Acc@5 93.975 loss 0.938
[06:42:33.074100] Accuracy of the network on the 50000 test images: 76.6%
[06:42:33.074121] Max accuracy: 76.85%
[06:42:33.180135] log_dir: ./output_dir_qkformer
[06:42:35.893934] Epoch: [123]  [   0/5004]  eta: 3:46:06  lr: 0.000406  loss: 2.4835 (2.4835)  time: 2.7112  data: 2.2510  max mem: 24440
[06:57:55.783199] Epoch: [123]  [2000/5004]  eta: 0:23:04  lr: 0.000402  loss: 2.5105 (2.6126)  time: 0.4602  data: 0.0003  max mem: 24440
[07:13:15.081273] Epoch: [123]  [4000/5004]  eta: 0:07:42  lr: 0.000399  loss: 2.6488 (2.6216)  time: 0.4609  data: 0.0002  max mem: 24440
[07:20:55.871838] Epoch: [123]  [5003/5004]  eta: 0:00:00  lr: 0.000397  loss: 2.5390 (2.6260)  time: 0.4531  data: 0.0009  max mem: 24440
[07:20:56.316883] Epoch: [123] Total time: 0:38:23 (0.4603 s / it)
[07:20:56.330379] Averaged stats: lr: 0.000397  loss: 2.5390 (2.6216)
[07:20:57.766083] Test:  [   0/1563]  eta: 0:37:15  loss: 0.2829 (0.2829)  acc1: 93.7500 (93.7500)  acc5: 100.0000 (100.0000)  time: 1.4303  data: 1.2457  max mem: 24440
[07:22:21.754030] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.7466 (0.7197)  acc1: 81.2500 (81.0878)  acc5: 96.8750 (96.7502)  time: 0.1680  data: 0.0002  max mem: 24440
[07:23:45.733002] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9403 (0.8362)  acc1: 75.0000 (78.6869)  acc5: 96.8750 (95.2204)  time: 0.1682  data: 0.0005  max mem: 24440
[07:25:09.678874] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4235 (0.9165)  acc1: 87.5000 (76.9258)  acc5: 96.8750 (94.2080)  time: 0.1678  data: 0.0002  max mem: 24440
[07:25:20.002618] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3762 (0.9194)  acc1: 90.6250 (76.8760)  acc5: 100.0000 (94.1740)  time: 0.1634  data: 0.0001  max mem: 24440
[07:25:20.103883] Test: Total time: 0:04:23 (0.1688 s / it)
[07:25:21.054051] * Acc@1 76.879 Acc@5 94.182 loss 0.919
[07:25:21.054201] Accuracy of the network on the 50000 test images: 76.9%
[07:25:21.054223] Max accuracy: 76.88%
[07:25:21.167530] log_dir: ./output_dir_qkformer
[07:25:23.713428] Epoch: [124]  [   0/5004]  eta: 3:32:09  lr: 0.000397  loss: 2.4341 (2.4341)  time: 2.5439  data: 1.9495  max mem: 24440
[07:40:44.510620] Epoch: [124]  [2000/5004]  eta: 0:23:06  lr: 0.000393  loss: 2.8376 (2.6065)  time: 0.4682  data: 0.0002  max mem: 24440
[07:56:06.099243] Epoch: [124]  [4000/5004]  eta: 0:07:42  lr: 0.000390  loss: 2.6040 (2.6134)  time: 0.4569  data: 0.0002  max mem: 24440
[08:03:47.750100] Epoch: [124]  [5003/5004]  eta: 0:00:00  lr: 0.000388  loss: 2.3854 (2.6140)  time: 0.4529  data: 0.0009  max mem: 24440
[08:03:48.204893] Epoch: [124] Total time: 0:38:27 (0.4610 s / it)
[08:03:48.216019] Averaged stats: lr: 0.000388  loss: 2.3854 (2.6144)
[08:03:49.700611] Test:  [   0/1563]  eta: 0:38:30  loss: 0.2871 (0.2871)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.4784  data: 1.2759  max mem: 24440
[08:05:13.718957] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.6422 (0.7059)  acc1: 78.1250 (81.0629)  acc5: 96.8750 (96.6130)  time: 0.1679  data: 0.0002  max mem: 24440
[08:06:37.714360] Test:  [1000/1563]  eta: 0:01:35  loss: 0.7717 (0.8277)  acc1: 78.1250 (78.7463)  acc5: 96.8750 (95.1267)  time: 0.1681  data: 0.0002  max mem: 24440
[08:08:01.717867] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4332 (0.9049)  acc1: 90.6250 (77.2069)  acc5: 96.8750 (94.2809)  time: 0.1680  data: 0.0002  max mem: 24440
[08:08:12.047431] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3236 (0.9064)  acc1: 93.7500 (77.1940)  acc5: 100.0000 (94.2580)  time: 0.1636  data: 0.0001  max mem: 24440
[08:08:12.170461] Test: Total time: 0:04:23 (0.1689 s / it)
[08:08:12.705894] * Acc@1 77.198 Acc@5 94.256 loss 0.906
[08:08:12.706071] Accuracy of the network on the 50000 test images: 77.2%
[08:08:12.706102] Max accuracy: 77.20%
[08:08:12.775820] log_dir: ./output_dir_qkformer
[08:08:15.336468] Epoch: [125]  [   0/5004]  eta: 3:33:25  lr: 0.000388  loss: 2.8793 (2.8793)  time: 2.5590  data: 1.8724  max mem: 24440
[08:23:35.626910] Epoch: [125]  [2000/5004]  eta: 0:23:05  lr: 0.000384  loss: 2.6450 (2.6030)  time: 0.4587  data: 0.0002  max mem: 24440
[08:38:55.052661] Epoch: [125]  [4000/5004]  eta: 0:07:42  lr: 0.000381  loss: 2.6573 (2.6120)  time: 0.4580  data: 0.0002  max mem: 24440
[08:46:35.915607] Epoch: [125]  [5003/5004]  eta: 0:00:00  lr: 0.000379  loss: 2.5456 (2.6116)  time: 0.4530  data: 0.0009  max mem: 24440
[08:46:36.544005] Epoch: [125] Total time: 0:38:23 (0.4604 s / it)
[08:46:36.551470] Averaged stats: lr: 0.000379  loss: 2.5456 (2.6051)
[08:46:39.005392] Test:  [   0/1563]  eta: 1:03:44  loss: 0.3480 (0.3480)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 2.4467  data: 1.9850  max mem: 24440
[08:48:03.201315] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.7243 (0.7279)  acc1: 81.2500 (80.8321)  acc5: 96.8750 (96.7565)  time: 0.1680  data: 0.0002  max mem: 24440
[08:49:27.137842] Test:  [1000/1563]  eta: 0:01:35  loss: 1.1249 (0.8532)  acc1: 68.7500 (78.3872)  acc5: 93.7500 (94.8895)  time: 0.1678  data: 0.0002  max mem: 24440
[08:50:51.086100] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5653 (0.9251)  acc1: 87.5000 (76.8134)  acc5: 96.8750 (93.9103)  time: 0.1684  data: 0.0005  max mem: 24440
[08:51:01.404764] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4440 (0.9277)  acc1: 90.6250 (76.7180)  acc5: 100.0000 (93.9100)  time: 0.1635  data: 0.0001  max mem: 24440
[08:51:01.506829] Test: Total time: 0:04:24 (0.1695 s / it)
[08:51:02.098912] * Acc@1 76.715 Acc@5 93.910 loss 0.928
[08:51:02.099073] Accuracy of the network on the 50000 test images: 76.7%
[08:51:02.099094] Max accuracy: 77.20%
[08:51:02.177462] log_dir: ./output_dir_qkformer
[08:51:04.810370] Epoch: [126]  [   0/5004]  eta: 3:39:22  lr: 0.000379  loss: 2.4618 (2.4618)  time: 2.6303  data: 1.9331  max mem: 24440
[09:06:25.600069] Epoch: [126]  [2000/5004]  eta: 0:23:06  lr: 0.000375  loss: 2.5895 (2.5910)  time: 0.4613  data: 0.0002  max mem: 24440
[09:21:45.903090] Epoch: [126]  [4000/5004]  eta: 0:07:42  lr: 0.000372  loss: 2.6169 (2.5926)  time: 0.4642  data: 0.0002  max mem: 24440
[09:29:26.995827] Epoch: [126]  [5003/5004]  eta: 0:00:00  lr: 0.000370  loss: 2.3614 (2.5905)  time: 0.4531  data: 0.0006  max mem: 24440
[09:29:27.426878] Epoch: [126] Total time: 0:38:25 (0.4607 s / it)
[09:29:27.430576] Averaged stats: lr: 0.000370  loss: 2.3614 (2.5972)
[09:29:28.994480] Test:  [   0/1563]  eta: 0:40:38  loss: 0.2792 (0.2792)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.5599  data: 1.3854  max mem: 24440
[09:30:52.978323] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.6138 (0.7086)  acc1: 84.3750 (81.8987)  acc5: 96.8750 (96.4446)  time: 0.1678  data: 0.0002  max mem: 24440
[09:32:16.959386] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9160 (0.8283)  acc1: 71.8750 (78.9367)  acc5: 93.7500 (95.0456)  time: 0.1680  data: 0.0002  max mem: 24440
[09:33:40.963129] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4485 (0.9017)  acc1: 87.5000 (77.1715)  acc5: 96.8750 (94.1955)  time: 0.1680  data: 0.0002  max mem: 24440
[09:33:51.302963] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3005 (0.9020)  acc1: 93.7500 (77.1820)  acc5: 100.0000 (94.2200)  time: 0.1638  data: 0.0001  max mem: 24440
[09:33:51.429489] Test: Total time: 0:04:23 (0.1689 s / it)
[09:33:51.926671] * Acc@1 77.180 Acc@5 94.224 loss 0.902
[09:33:51.926826] Accuracy of the network on the 50000 test images: 77.2%
[09:33:51.926849] Max accuracy: 77.20%
[09:33:52.050017] log_dir: ./output_dir_qkformer
[09:33:54.706912] Epoch: [127]  [   0/5004]  eta: 3:41:01  lr: 0.000370  loss: 2.8270 (2.8270)  time: 2.6503  data: 2.1323  max mem: 24440
[09:49:16.248572] Epoch: [127]  [2000/5004]  eta: 0:23:07  lr: 0.000366  loss: 2.7045 (2.5884)  time: 0.4623  data: 0.0003  max mem: 24440
[10:04:38.340759] Epoch: [127]  [4000/5004]  eta: 0:07:43  lr: 0.000363  loss: 2.5824 (2.5953)  time: 0.4575  data: 0.0002  max mem: 24440
[10:12:19.617587] Epoch: [127]  [5003/5004]  eta: 0:00:00  lr: 0.000361  loss: 2.6176 (2.5956)  time: 0.4553  data: 0.0010  max mem: 24440
[10:12:20.056793] Epoch: [127] Total time: 0:38:28 (0.4612 s / it)
[10:12:20.070922] Averaged stats: lr: 0.000361  loss: 2.6176 (2.5920)
[10:12:21.890157] Test:  [   0/1563]  eta: 0:47:17  loss: 0.2582 (0.2582)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 1.8153  data: 1.6285  max mem: 24440
[10:13:45.903177] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.7730 (0.7225)  acc1: 81.2500 (81.2937)  acc5: 96.8750 (96.5070)  time: 0.1678  data: 0.0002  max mem: 24440
[10:15:09.865797] Test:  [1000/1563]  eta: 0:01:35  loss: 1.0109 (0.8354)  acc1: 71.8750 (78.9835)  acc5: 93.7500 (95.0830)  time: 0.1678  data: 0.0002  max mem: 24440
[10:16:33.842516] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5171 (0.9033)  acc1: 90.6250 (77.3526)  acc5: 96.8750 (94.2955)  time: 0.1679  data: 0.0002  max mem: 24440
[10:16:44.172897] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3705 (0.9060)  acc1: 87.5000 (77.2880)  acc5: 100.0000 (94.2680)  time: 0.1635  data: 0.0001  max mem: 24440
[10:16:44.291053] Test: Total time: 0:04:24 (0.1690 s / it)
[10:16:44.978071] * Acc@1 77.290 Acc@5 94.266 loss 0.906
[10:16:44.978228] Accuracy of the network on the 50000 test images: 77.3%
[10:16:44.978249] Max accuracy: 77.29%
[10:16:45.037383] log_dir: ./output_dir_qkformer
[10:16:47.655389] Epoch: [128]  [   0/5004]  eta: 3:38:17  lr: 0.000361  loss: 2.2110 (2.2110)  time: 2.6173  data: 2.0252  max mem: 24440
[10:32:08.142375] Epoch: [128]  [2000/5004]  eta: 0:23:05  lr: 0.000358  loss: 2.5165 (2.5762)  time: 0.4582  data: 0.0003  max mem: 24440
[10:47:28.524755] Epoch: [128]  [4000/5004]  eta: 0:07:42  lr: 0.000354  loss: 2.4914 (2.5792)  time: 0.4613  data: 0.0003  max mem: 24440
[10:55:09.919656] Epoch: [128]  [5003/5004]  eta: 0:00:00  lr: 0.000352  loss: 2.6464 (2.5888)  time: 0.4535  data: 0.0014  max mem: 24440
[10:55:10.461797] Epoch: [128] Total time: 0:38:25 (0.4607 s / it)
[10:55:10.463522] Averaged stats: lr: 0.000352  loss: 2.6464 (2.5842)
[10:55:12.257003] Test:  [   0/1563]  eta: 0:46:34  loss: 0.2249 (0.2249)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 1.7881  data: 1.3391  max mem: 24440
[10:56:36.357943] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.6671 (0.7059)  acc1: 81.2500 (81.6929)  acc5: 96.8750 (96.7814)  time: 0.1687  data: 0.0006  max mem: 24440
[10:58:00.595261] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8701 (0.8111)  acc1: 75.0000 (79.3176)  acc5: 93.7500 (95.4514)  time: 0.1679  data: 0.0002  max mem: 24440
[10:59:24.653793] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5923 (0.8899)  acc1: 87.5000 (77.4359)  acc5: 96.8750 (94.4891)  time: 0.1679  data: 0.0002  max mem: 24440
[10:59:34.985628] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3079 (0.8908)  acc1: 90.6250 (77.4280)  acc5: 100.0000 (94.4880)  time: 0.1636  data: 0.0001  max mem: 24440
[10:59:35.105544] Test: Total time: 0:04:24 (0.1693 s / it)
[10:59:35.411744] * Acc@1 77.431 Acc@5 94.490 loss 0.891
[10:59:35.411923] Accuracy of the network on the 50000 test images: 77.4%
[10:59:35.411947] Max accuracy: 77.43%
[10:59:35.443478] log_dir: ./output_dir_qkformer
[10:59:38.293379] Epoch: [129]  [   0/5004]  eta: 3:57:31  lr: 0.000352  loss: 2.5941 (2.5941)  time: 2.8480  data: 2.0233  max mem: 24440
[11:14:57.641009] Epoch: [129]  [2000/5004]  eta: 0:23:04  lr: 0.000349  loss: 2.4901 (2.5687)  time: 0.4624  data: 0.0002  max mem: 24440
[11:30:16.566253] Epoch: [129]  [4000/5004]  eta: 0:07:41  lr: 0.000345  loss: 2.5496 (2.5708)  time: 0.4578  data: 0.0003  max mem: 24440
[11:37:57.413294] Epoch: [129]  [5003/5004]  eta: 0:00:00  lr: 0.000344  loss: 2.5121 (2.5751)  time: 0.4532  data: 0.0009  max mem: 24440
[11:37:57.803400] Epoch: [129] Total time: 0:38:22 (0.4601 s / it)
[11:37:57.806706] Averaged stats: lr: 0.000344  loss: 2.5121 (2.5742)
[11:37:59.552547] Test:  [   0/1563]  eta: 0:45:22  loss: 0.4367 (0.4367)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.7419  data: 1.4235  max mem: 24440
[11:39:23.541989] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.6818 (0.7069)  acc1: 81.2500 (82.0983)  acc5: 96.8750 (96.7253)  time: 0.1677  data: 0.0002  max mem: 24440
[11:40:47.528653] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9908 (0.8279)  acc1: 65.6250 (79.5330)  acc5: 93.7500 (95.2797)  time: 0.1678  data: 0.0002  max mem: 24440
[11:42:11.475228] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5728 (0.8942)  acc1: 84.3750 (78.0563)  acc5: 96.8750 (94.4683)  time: 0.1680  data: 0.0002  max mem: 24440
[11:42:21.801167] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3761 (0.8981)  acc1: 93.7500 (77.9580)  acc5: 100.0000 (94.4420)  time: 0.1634  data: 0.0001  max mem: 24440
[11:42:21.984455] Test: Total time: 0:04:24 (0.1690 s / it)
[11:42:22.258545] * Acc@1 77.964 Acc@5 94.446 loss 0.898
[11:42:22.258695] Accuracy of the network on the 50000 test images: 78.0%
[11:42:22.258716] Max accuracy: 77.96%
[11:42:22.320784] log_dir: ./output_dir_qkformer
[11:42:25.527427] Epoch: [130]  [   0/5004]  eta: 4:27:22  lr: 0.000343  loss: 2.7805 (2.7805)  time: 3.2059  data: 2.1639  max mem: 24440
[11:57:47.195299] Epoch: [130]  [2000/5004]  eta: 0:23:08  lr: 0.000340  loss: 2.5048 (2.5578)  time: 0.4588  data: 0.0004  max mem: 24440
[12:13:08.202639] Epoch: [130]  [4000/5004]  eta: 0:07:43  lr: 0.000337  loss: 2.5487 (2.5611)  time: 0.4620  data: 0.0002  max mem: 24440
[12:20:49.555483] Epoch: [130]  [5003/5004]  eta: 0:00:00  lr: 0.000335  loss: 2.5723 (2.5653)  time: 0.4531  data: 0.0009  max mem: 24440
[12:20:50.050955] Epoch: [130] Total time: 0:38:27 (0.4612 s / it)
[12:20:50.053947] Averaged stats: lr: 0.000335  loss: 2.5723 (2.5693)
[12:20:51.997671] Test:  [   0/1563]  eta: 0:50:29  loss: 0.3230 (0.3230)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 1.9386  data: 1.5575  max mem: 24440
[12:22:16.043437] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.5841 (0.6979)  acc1: 84.3750 (82.0235)  acc5: 96.8750 (96.7752)  time: 0.1683  data: 0.0002  max mem: 24440
[12:23:40.071341] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8804 (0.8110)  acc1: 78.1250 (79.5579)  acc5: 93.7500 (95.3359)  time: 0.1679  data: 0.0002  max mem: 24440
[12:25:04.070567] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4371 (0.8838)  acc1: 87.5000 (77.6441)  acc5: 100.0000 (94.4662)  time: 0.1678  data: 0.0002  max mem: 24440
[12:25:14.400243] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3964 (0.8844)  acc1: 90.6250 (77.6260)  acc5: 100.0000 (94.4740)  time: 0.1636  data: 0.0001  max mem: 24440
[12:25:14.534526] Test: Total time: 0:04:24 (0.1692 s / it)
[12:25:15.047073] * Acc@1 77.630 Acc@5 94.475 loss 0.884
[12:25:15.047231] Accuracy of the network on the 50000 test images: 77.6%
[12:25:15.047251] Max accuracy: 77.96%
[12:25:15.128852] log_dir: ./output_dir_qkformer
[12:25:17.760541] Epoch: [131]  [   0/5004]  eta: 3:39:16  lr: 0.000335  loss: 3.4999 (3.4999)  time: 2.6292  data: 2.0991  max mem: 24440
[12:40:36.978754] Epoch: [131]  [2000/5004]  eta: 0:23:03  lr: 0.000331  loss: 2.4027 (2.5412)  time: 0.4609  data: 0.0003  max mem: 24440
[12:55:55.636317] Epoch: [131]  [4000/5004]  eta: 0:07:41  lr: 0.000328  loss: 2.3596 (2.5550)  time: 0.4612  data: 0.0002  max mem: 24440
[13:03:36.223511] Epoch: [131]  [5003/5004]  eta: 0:00:00  lr: 0.000326  loss: 2.5423 (2.5610)  time: 0.4537  data: 0.0009  max mem: 24440
[13:03:36.615923] Epoch: [131] Total time: 0:38:21 (0.4599 s / it)
[13:03:36.629837] Averaged stats: lr: 0.000326  loss: 2.5423 (2.5616)
[13:03:38.129340] Test:  [   0/1563]  eta: 0:38:55  loss: 0.2865 (0.2865)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.4941  data: 1.3068  max mem: 24440
[13:05:02.214720] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.7181 (0.7013)  acc1: 81.2500 (81.5245)  acc5: 96.8750 (96.9187)  time: 0.1685  data: 0.0002  max mem: 24440
[13:06:26.238244] Test:  [1000/1563]  eta: 0:01:35  loss: 1.0693 (0.8223)  acc1: 71.8750 (78.8930)  acc5: 93.7500 (95.3359)  time: 0.1681  data: 0.0002  max mem: 24440
[13:07:50.522837] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5222 (0.8870)  acc1: 84.3750 (77.6337)  acc5: 96.8750 (94.5141)  time: 0.1679  data: 0.0002  max mem: 24440
[13:08:00.850886] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4460 (0.8906)  acc1: 93.7500 (77.5820)  acc5: 96.8750 (94.4760)  time: 0.1636  data: 0.0001  max mem: 24440
[13:08:00.985552] Test: Total time: 0:04:24 (0.1691 s / it)
[13:08:01.432747] * Acc@1 77.582 Acc@5 94.475 loss 0.891
[13:08:01.432903] Accuracy of the network on the 50000 test images: 77.6%
[13:08:01.432922] Max accuracy: 77.96%
[13:08:01.534471] log_dir: ./output_dir_qkformer
[13:08:04.110651] Epoch: [132]  [   0/5004]  eta: 3:34:42  lr: 0.000326  loss: 2.6502 (2.6502)  time: 2.5745  data: 1.9581  max mem: 24440
[13:23:24.631837] Epoch: [132]  [2000/5004]  eta: 0:23:05  lr: 0.000323  loss: 2.5421 (2.5349)  time: 0.4638  data: 0.0003  max mem: 24440
[13:38:44.307346] Epoch: [132]  [4000/5004]  eta: 0:07:42  lr: 0.000319  loss: 2.5923 (2.5502)  time: 0.4592  data: 0.0002  max mem: 24440
[13:46:25.553191] Epoch: [132]  [5003/5004]  eta: 0:00:00  lr: 0.000318  loss: 2.5722 (2.5543)  time: 0.4556  data: 0.0005  max mem: 24440
[13:46:26.050187] Epoch: [132] Total time: 0:38:24 (0.4605 s / it)
[13:46:26.053915] Averaged stats: lr: 0.000318  loss: 2.5722 (2.5535)
[13:46:28.561510] Test:  [   0/1563]  eta: 1:05:09  loss: 0.3257 (0.3257)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.5013  data: 2.2013  max mem: 24440
[13:47:52.637325] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.7900 (0.6806)  acc1: 78.1250 (82.1544)  acc5: 96.8750 (96.8064)  time: 0.1683  data: 0.0002  max mem: 24440
[13:49:16.673289] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9590 (0.8142)  acc1: 78.1250 (79.3769)  acc5: 93.7500 (95.2641)  time: 0.1680  data: 0.0002  max mem: 24440
[13:50:40.684412] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4075 (0.8839)  acc1: 90.6250 (77.7461)  acc5: 96.8750 (94.4870)  time: 0.1681  data: 0.0002  max mem: 24440
[13:50:51.014583] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4553 (0.8855)  acc1: 90.6250 (77.6800)  acc5: 100.0000 (94.4740)  time: 0.1636  data: 0.0001  max mem: 24440
[13:50:51.110527] Test: Total time: 0:04:25 (0.1696 s / it)
[13:50:51.596576] * Acc@1 77.683 Acc@5 94.475 loss 0.885
[13:50:51.596733] Accuracy of the network on the 50000 test images: 77.7%
[13:50:51.596757] Max accuracy: 77.96%
[13:50:51.681734] log_dir: ./output_dir_qkformer
[13:50:54.271092] Epoch: [133]  [   0/5004]  eta: 3:35:49  lr: 0.000318  loss: 1.9686 (1.9686)  time: 2.5879  data: 1.9303  max mem: 24440
[14:06:15.442822] Epoch: [133]  [2000/5004]  eta: 0:23:06  lr: 0.000314  loss: 2.3512 (2.5368)  time: 0.4586  data: 0.0003  max mem: 24440
[14:21:34.720550] Epoch: [133]  [4000/5004]  eta: 0:07:42  lr: 0.000311  loss: 2.5672 (2.5418)  time: 0.4587  data: 0.0002  max mem: 24440
[14:29:15.559348] Epoch: [133]  [5003/5004]  eta: 0:00:00  lr: 0.000309  loss: 2.4487 (2.5385)  time: 0.4572  data: 0.0009  max mem: 24440
[14:29:16.005398] Epoch: [133] Total time: 0:38:24 (0.4605 s / it)
[14:29:16.018849] Averaged stats: lr: 0.000309  loss: 2.4487 (2.5428)
[14:29:18.072782] Test:  [   0/1563]  eta: 0:53:23  loss: 0.3062 (0.3062)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.0494  data: 1.7892  max mem: 24440
[14:30:42.069502] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.6994 (0.7169)  acc1: 81.2500 (82.0546)  acc5: 96.8750 (96.6317)  time: 0.1678  data: 0.0002  max mem: 24440
[14:32:06.058601] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8521 (0.8210)  acc1: 68.7500 (79.7172)  acc5: 93.7500 (95.4015)  time: 0.1679  data: 0.0002  max mem: 24440
[14:33:30.049844] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5120 (0.8908)  acc1: 87.5000 (78.0792)  acc5: 96.8750 (94.6515)  time: 0.1678  data: 0.0002  max mem: 24440
[14:33:40.380958] Test:  [1562/1563]  eta: 0:00:00  loss: 0.6389 (0.8937)  acc1: 90.6250 (77.9960)  acc5: 96.8750 (94.6280)  time: 0.1635  data: 0.0001  max mem: 24440
[14:33:40.516312] Test: Total time: 0:04:24 (0.1692 s / it)
[14:33:40.969365] * Acc@1 78.000 Acc@5 94.626 loss 0.894
[14:33:40.969516] Accuracy of the network on the 50000 test images: 78.0%
[14:33:40.969537] Max accuracy: 78.00%
[14:33:41.039568] log_dir: ./output_dir_qkformer
[14:33:43.653288] Epoch: [134]  [   0/5004]  eta: 3:37:42  lr: 0.000309  loss: 3.1053 (3.1053)  time: 2.6103  data: 1.9605  max mem: 24440
[14:49:03.066228] Epoch: [134]  [2000/5004]  eta: 0:23:04  lr: 0.000306  loss: 2.5634 (2.5235)  time: 0.4694  data: 0.0002  max mem: 24440
[15:04:20.725454] Epoch: [134]  [4000/5004]  eta: 0:07:41  lr: 0.000302  loss: 2.7392 (2.5292)  time: 0.4566  data: 0.0003  max mem: 24440
[15:12:01.234861] Epoch: [134]  [5003/5004]  eta: 0:00:00  lr: 0.000301  loss: 2.4676 (2.5316)  time: 0.4526  data: 0.0006  max mem: 24440
[15:12:01.610948] Epoch: [134] Total time: 0:38:20 (0.4597 s / it)
[15:12:01.612318] Averaged stats: lr: 0.000301  loss: 2.4676 (2.5369)
[15:12:03.133129] Test:  [   0/1563]  eta: 0:39:26  loss: 0.3280 (0.3280)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.5141  data: 1.3387  max mem: 24440
[15:13:27.146690] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.6137 (0.6683)  acc1: 84.3750 (82.3977)  acc5: 96.8750 (96.9499)  time: 0.1678  data: 0.0002  max mem: 24440
[15:14:51.155830] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8684 (0.7922)  acc1: 71.8750 (79.8639)  acc5: 93.7500 (95.4577)  time: 0.1679  data: 0.0002  max mem: 24440
[15:16:15.127042] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5681 (0.8686)  acc1: 87.5000 (78.0730)  acc5: 96.8750 (94.6015)  time: 0.1680  data: 0.0002  max mem: 24440
[15:16:25.463023] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3899 (0.8691)  acc1: 93.7500 (78.0660)  acc5: 100.0000 (94.6160)  time: 0.1635  data: 0.0001  max mem: 24440
[15:16:25.586412] Test: Total time: 0:04:23 (0.1689 s / it)
[15:16:25.857834] * Acc@1 78.066 Acc@5 94.618 loss 0.869
[15:16:25.857981] Accuracy of the network on the 50000 test images: 78.1%
[15:16:25.858003] Max accuracy: 78.07%
[15:16:25.972054] log_dir: ./output_dir_qkformer
[15:16:28.416109] Epoch: [135]  [   0/5004]  eta: 3:23:43  lr: 0.000301  loss: 1.7208 (1.7208)  time: 2.4428  data: 1.9284  max mem: 24440
[15:31:48.607314] Epoch: [135]  [2000/5004]  eta: 0:23:05  lr: 0.000297  loss: 2.5824 (2.5258)  time: 0.4580  data: 0.0002  max mem: 24440
[15:47:08.694902] Epoch: [135]  [4000/5004]  eta: 0:07:42  lr: 0.000294  loss: 2.4954 (2.5358)  time: 0.4584  data: 0.0002  max mem: 24440
[15:54:49.722237] Epoch: [135]  [5003/5004]  eta: 0:00:00  lr: 0.000292  loss: 2.4870 (2.5415)  time: 0.4579  data: 0.0009  max mem: 24440
[15:54:50.122184] Epoch: [135] Total time: 0:38:24 (0.4605 s / it)
[15:54:50.123157] Averaged stats: lr: 0.000292  loss: 2.4870 (2.5317)
[15:54:51.624896] Test:  [   0/1563]  eta: 0:38:59  loss: 0.3080 (0.3080)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.4970  data: 1.3051  max mem: 24440
[15:56:15.722466] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.7629 (0.6976)  acc1: 78.1250 (82.4538)  acc5: 96.8750 (96.6941)  time: 0.1679  data: 0.0002  max mem: 24440
[15:57:39.776627] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9101 (0.8060)  acc1: 78.1250 (79.7827)  acc5: 93.7500 (95.3515)  time: 0.1681  data: 0.0002  max mem: 24440
[15:59:03.843677] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4981 (0.8713)  acc1: 84.3750 (78.2166)  acc5: 96.8750 (94.6598)  time: 0.1681  data: 0.0002  max mem: 24440
[15:59:14.171865] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3725 (0.8721)  acc1: 90.6250 (78.1840)  acc5: 96.8750 (94.6820)  time: 0.1637  data: 0.0001  max mem: 24440
[15:59:14.306760] Test: Total time: 0:04:24 (0.1690 s / it)
[15:59:14.777212] * Acc@1 78.187 Acc@5 94.683 loss 0.872
[15:59:14.777410] Accuracy of the network on the 50000 test images: 78.2%
[15:59:14.777435] Max accuracy: 78.19%
[15:59:14.898434] log_dir: ./output_dir_qkformer
[15:59:17.750276] Epoch: [136]  [   0/5004]  eta: 3:57:09  lr: 0.000292  loss: 2.5526 (2.5526)  time: 2.8436  data: 2.2762  max mem: 24440
[16:14:38.238098] Epoch: [136]  [2000/5004]  eta: 0:23:06  lr: 0.000289  loss: 2.5676 (2.5165)  time: 0.4600  data: 0.0003  max mem: 24440
[16:29:58.909403] Epoch: [136]  [4000/5004]  eta: 0:07:42  lr: 0.000286  loss: 2.5125 (2.5239)  time: 0.4578  data: 0.0003  max mem: 24440
[16:37:40.243199] Epoch: [136]  [5003/5004]  eta: 0:00:00  lr: 0.000284  loss: 2.4028 (2.5245)  time: 0.4539  data: 0.0007  max mem: 24440
[16:37:40.872265] Epoch: [136] Total time: 0:38:25 (0.4608 s / it)
[16:37:40.876591] Averaged stats: lr: 0.000284  loss: 2.4028 (2.5206)
[16:37:43.115249] Test:  [   0/1563]  eta: 0:58:12  loss: 0.3367 (0.3367)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 2.2347  data: 1.8600  max mem: 24440
[16:39:07.110818] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.7058 (0.7045)  acc1: 81.2500 (82.0921)  acc5: 96.8750 (96.8313)  time: 0.1680  data: 0.0002  max mem: 24440
[16:40:31.108997] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8651 (0.8157)  acc1: 78.1250 (79.4424)  acc5: 93.7500 (95.3484)  time: 0.1678  data: 0.0002  max mem: 24440
[16:41:55.117142] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4315 (0.8909)  acc1: 87.5000 (77.7044)  acc5: 96.8750 (94.5016)  time: 0.1679  data: 0.0002  max mem: 24440
[16:42:05.442692] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3132 (0.8906)  acc1: 93.7500 (77.7400)  acc5: 96.8750 (94.5120)  time: 0.1635  data: 0.0001  max mem: 24440
[16:42:05.567911] Test: Total time: 0:04:24 (0.1693 s / it)
[16:42:05.909170] * Acc@1 77.741 Acc@5 94.507 loss 0.891
[16:42:05.909320] Accuracy of the network on the 50000 test images: 77.7%
[16:42:05.909341] Max accuracy: 78.19%
[16:42:06.021909] log_dir: ./output_dir_qkformer
[16:42:08.594426] Epoch: [137]  [   0/5004]  eta: 3:34:20  lr: 0.000284  loss: 2.5297 (2.5297)  time: 2.5700  data: 2.0791  max mem: 24440
[16:57:29.741644] Epoch: [137]  [2000/5004]  eta: 0:23:06  lr: 0.000281  loss: 2.3981 (2.5065)  time: 0.4608  data: 0.0002  max mem: 24440
[17:13:02.050755] Epoch: [137]  [4000/5004]  eta: 0:07:45  lr: 0.000278  loss: 2.3933 (2.5133)  time: 0.4572  data: 0.0002  max mem: 24440
[17:20:43.859236] Epoch: [137]  [5003/5004]  eta: 0:00:00  lr: 0.000276  loss: 2.5170 (2.5155)  time: 0.4538  data: 0.0008  max mem: 24440
[17:20:44.487951] Epoch: [137] Total time: 0:38:38 (0.4633 s / it)
[17:20:44.492657] Averaged stats: lr: 0.000276  loss: 2.5170 (2.5148)
[17:20:47.090529] Test:  [   0/1563]  eta: 1:07:30  loss: 0.3045 (0.3045)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.5915  data: 2.4030  max mem: 24440
[17:22:11.334044] Test:  [ 500/1563]  eta: 0:03:04  loss: 0.6382 (0.6687)  acc1: 81.2500 (83.0090)  acc5: 96.8750 (97.0247)  time: 0.1680  data: 0.0002  max mem: 24440
[17:23:35.342227] Test:  [1000/1563]  eta: 0:01:36  loss: 1.1785 (0.8008)  acc1: 68.7500 (80.0574)  acc5: 90.6250 (95.4421)  time: 0.1678  data: 0.0002  max mem: 24440
[17:24:59.315571] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3999 (0.8675)  acc1: 90.6250 (78.4394)  acc5: 100.0000 (94.7452)  time: 0.1679  data: 0.0002  max mem: 24440
[17:25:09.660457] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4611 (0.8712)  acc1: 87.5000 (78.3340)  acc5: 96.8750 (94.7160)  time: 0.1637  data: 0.0002  max mem: 24440
[17:25:09.788047] Test: Total time: 0:04:25 (0.1697 s / it)
[17:25:10.352726] * Acc@1 78.337 Acc@5 94.715 loss 0.871
[17:25:10.352873] Accuracy of the network on the 50000 test images: 78.3%
[17:25:10.352895] Max accuracy: 78.34%
[17:25:10.464751] log_dir: ./output_dir_qkformer
[17:25:13.362173] Epoch: [138]  [   0/5004]  eta: 4:01:31  lr: 0.000276  loss: 2.8389 (2.8389)  time: 2.8960  data: 2.3389  max mem: 24440
[17:40:34.256058] Epoch: [138]  [2000/5004]  eta: 0:23:06  lr: 0.000273  loss: 2.5558 (2.4927)  time: 0.4595  data: 0.0003  max mem: 24440
[17:55:53.604461] Epoch: [138]  [4000/5004]  eta: 0:07:42  lr: 0.000270  loss: 2.3771 (2.5057)  time: 0.4592  data: 0.0003  max mem: 24440
[18:03:34.671573] Epoch: [138]  [5003/5004]  eta: 0:00:00  lr: 0.000268  loss: 2.3622 (2.5051)  time: 0.4526  data: 0.0006  max mem: 24440
[18:03:35.132528] Epoch: [138] Total time: 0:38:24 (0.4606 s / it)
[18:03:35.134785] Averaged stats: lr: 0.000268  loss: 2.3622 (2.5031)
[18:03:36.850119] Test:  [   0/1563]  eta: 0:44:31  loss: 0.4337 (0.4337)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.7090  data: 1.5351  max mem: 24440
[18:05:00.892596] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.6271 (0.6574)  acc1: 81.2500 (82.8406)  acc5: 100.0000 (97.0122)  time: 0.1682  data: 0.0002  max mem: 24440
[18:06:24.938873] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9337 (0.7883)  acc1: 78.1250 (80.0980)  acc5: 93.7500 (95.4858)  time: 0.1679  data: 0.0002  max mem: 24440
[18:07:48.916291] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3645 (0.8624)  acc1: 90.6250 (78.2978)  acc5: 96.8750 (94.6723)  time: 0.1678  data: 0.0002  max mem: 24440
[18:07:59.243683] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3605 (0.8634)  acc1: 93.7500 (78.2540)  acc5: 100.0000 (94.6740)  time: 0.1635  data: 0.0001  max mem: 24440
[18:07:59.379370] Test: Total time: 0:04:24 (0.1691 s / it)
[18:07:59.886294] * Acc@1 78.251 Acc@5 94.678 loss 0.863
[18:07:59.886445] Accuracy of the network on the 50000 test images: 78.3%
[18:07:59.886468] Max accuracy: 78.34%
[18:07:59.971286] log_dir: ./output_dir_qkformer
[18:08:02.531719] Epoch: [139]  [   0/5004]  eta: 3:33:19  lr: 0.000268  loss: 1.9586 (1.9586)  time: 2.5578  data: 1.8679  max mem: 24440
[18:23:24.054297] Epoch: [139]  [2000/5004]  eta: 0:23:07  lr: 0.000265  loss: 2.4650 (2.4831)  time: 0.4624  data: 0.0002  max mem: 24440
[18:38:45.396379] Epoch: [139]  [4000/5004]  eta: 0:07:43  lr: 0.000262  loss: 2.5357 (2.4909)  time: 0.4587  data: 0.0002  max mem: 24440
[18:46:27.146205] Epoch: [139]  [5003/5004]  eta: 0:00:00  lr: 0.000260  loss: 2.5048 (2.4920)  time: 0.4536  data: 0.0005  max mem: 24440
[18:46:27.661671] Epoch: [139] Total time: 0:38:27 (0.4612 s / it)
[18:46:27.674578] Averaged stats: lr: 0.000260  loss: 2.5048 (2.4941)
[18:46:29.547621] Test:  [   0/1563]  eta: 0:48:37  loss: 0.2405 (0.2405)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.8669  data: 1.6600  max mem: 24440
[18:47:53.555368] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.5701 (0.6546)  acc1: 87.5000 (83.0901)  acc5: 96.8750 (97.0996)  time: 0.1681  data: 0.0005  max mem: 24440
[18:49:17.554809] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9981 (0.7764)  acc1: 75.0000 (80.3009)  acc5: 96.8750 (95.6793)  time: 0.1679  data: 0.0002  max mem: 24440
[18:50:41.545294] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4289 (0.8522)  acc1: 87.5000 (78.5768)  acc5: 96.8750 (94.7951)  time: 0.1681  data: 0.0002  max mem: 24440
[18:50:51.888747] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4188 (0.8524)  acc1: 90.6250 (78.5620)  acc5: 100.0000 (94.8220)  time: 0.1636  data: 0.0001  max mem: 24440
[18:50:51.995397] Test: Total time: 0:04:24 (0.1691 s / it)
[18:50:52.445056] * Acc@1 78.559 Acc@5 94.819 loss 0.852
[18:50:52.445214] Accuracy of the network on the 50000 test images: 78.6%
[18:50:52.445235] Max accuracy: 78.56%
[18:50:52.506456] log_dir: ./output_dir_qkformer
[18:50:55.090169] Epoch: [140]  [   0/5004]  eta: 3:34:58  lr: 0.000260  loss: 2.2001 (2.2001)  time: 2.5777  data: 2.1107  max mem: 24440
[19:06:17.499564] Epoch: [140]  [2000/5004]  eta: 0:23:08  lr: 0.000257  loss: 2.4146 (2.4825)  time: 0.4573  data: 0.0003  max mem: 24440
[19:21:39.214429] Epoch: [140]  [4000/5004]  eta: 0:07:43  lr: 0.000254  loss: 2.5419 (2.4837)  time: 0.4610  data: 0.0002  max mem: 24440
[19:29:21.355233] Epoch: [140]  [5003/5004]  eta: 0:00:00  lr: 0.000252  loss: 2.5482 (2.4888)  time: 0.4576  data: 0.0009  max mem: 24440
[19:29:21.864585] Epoch: [140] Total time: 0:38:29 (0.4615 s / it)
[19:29:21.866495] Averaged stats: lr: 0.000252  loss: 2.5482 (2.4880)
[19:29:24.064226] Test:  [   0/1563]  eta: 0:57:07  loss: 0.2925 (0.2925)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.1932  data: 1.8188  max mem: 24440
[19:30:48.059750] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.6272 (0.6714)  acc1: 78.1250 (82.6971)  acc5: 100.0000 (96.9873)  time: 0.1678  data: 0.0002  max mem: 24440
[19:32:12.032615] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8795 (0.7768)  acc1: 75.0000 (80.3041)  acc5: 93.7500 (95.6512)  time: 0.1679  data: 0.0002  max mem: 24440
[19:33:36.000080] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3931 (0.8435)  acc1: 87.5000 (78.7766)  acc5: 100.0000 (94.9534)  time: 0.1678  data: 0.0002  max mem: 24440
[19:33:46.334258] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4270 (0.8448)  acc1: 90.6250 (78.7300)  acc5: 100.0000 (94.9400)  time: 0.1635  data: 0.0001  max mem: 24440
[19:33:46.450460] Test: Total time: 0:04:24 (0.1693 s / it)
[19:33:47.169802] * Acc@1 78.722 Acc@5 94.942 loss 0.845
[19:33:47.169956] Accuracy of the network on the 50000 test images: 78.7%
[19:33:47.169979] Max accuracy: 78.72%
[19:33:47.245276] log_dir: ./output_dir_qkformer
[19:33:50.000650] Epoch: [141]  [   0/5004]  eta: 3:49:40  lr: 0.000252  loss: 2.5118 (2.5118)  time: 2.7539  data: 2.1066  max mem: 24440
[19:49:12.381390] Epoch: [141]  [2000/5004]  eta: 0:23:08  lr: 0.000249  loss: 2.5714 (2.4648)  time: 0.4625  data: 0.0002  max mem: 24440
[20:04:33.816129] Epoch: [141]  [4000/5004]  eta: 0:07:43  lr: 0.000246  loss: 2.3996 (2.4729)  time: 0.4582  data: 0.0002  max mem: 24440
[20:12:15.583297] Epoch: [141]  [5003/5004]  eta: 0:00:00  lr: 0.000244  loss: 2.4941 (2.4776)  time: 0.4538  data: 0.0009  max mem: 24440
[20:12:16.072023] Epoch: [141] Total time: 0:38:28 (0.4614 s / it)
[20:12:16.085127] Averaged stats: lr: 0.000244  loss: 2.4941 (2.4771)
[20:12:17.897449] Test:  [   0/1563]  eta: 0:47:01  loss: 0.2890 (0.2890)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.8051  data: 1.5521  max mem: 24440
[20:13:41.940128] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.6320 (0.6657)  acc1: 84.3750 (83.0277)  acc5: 96.8750 (97.2617)  time: 0.1680  data: 0.0002  max mem: 24440
[20:15:06.005582] Test:  [1000/1563]  eta: 0:01:35  loss: 1.0641 (0.7847)  acc1: 71.8750 (80.1261)  acc5: 93.7500 (95.8136)  time: 0.1679  data: 0.0002  max mem: 24440
[20:16:30.053196] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4879 (0.8490)  acc1: 87.5000 (78.7225)  acc5: 96.8750 (94.9534)  time: 0.1684  data: 0.0002  max mem: 24440
[20:16:40.391725] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4642 (0.8503)  acc1: 90.6250 (78.6560)  acc5: 100.0000 (94.9620)  time: 0.1636  data: 0.0001  max mem: 24440
[20:16:40.528272] Test: Total time: 0:04:24 (0.1692 s / it)
[20:16:40.773203] * Acc@1 78.654 Acc@5 94.971 loss 0.850
[20:16:40.773461] Accuracy of the network on the 50000 test images: 78.7%
[20:16:40.773485] Max accuracy: 78.72%
[20:16:40.851696] log_dir: ./output_dir_qkformer
[20:16:43.960718] Epoch: [142]  [   0/5004]  eta: 4:19:14  lr: 0.000244  loss: 2.1885 (2.1885)  time: 3.1083  data: 2.2651  max mem: 24440
[20:32:05.115231] Epoch: [142]  [2000/5004]  eta: 0:23:07  lr: 0.000241  loss: 2.5557 (2.4604)  time: 0.4591  data: 0.0002  max mem: 24440
[20:47:27.520913] Epoch: [142]  [4000/5004]  eta: 0:07:43  lr: 0.000238  loss: 2.4029 (2.4701)  time: 0.4623  data: 0.0002  max mem: 24440
[20:55:09.544832] Epoch: [142]  [5003/5004]  eta: 0:00:00  lr: 0.000237  loss: 2.5400 (2.4705)  time: 0.4537  data: 0.0005  max mem: 24440
[20:55:10.040346] Epoch: [142] Total time: 0:38:29 (0.4615 s / it)
[20:55:10.113570] Averaged stats: lr: 0.000237  loss: 2.5400 (2.4692)
[20:55:12.260585] Test:  [   0/1563]  eta: 0:55:45  loss: 0.2399 (0.2399)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.1407  data: 1.8419  max mem: 24440
[20:56:36.272992] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.5903 (0.6602)  acc1: 84.3750 (82.7969)  acc5: 96.8750 (97.0684)  time: 0.1689  data: 0.0002  max mem: 24440
[20:58:00.251880] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8790 (0.7682)  acc1: 71.8750 (80.4383)  acc5: 96.8750 (95.7886)  time: 0.1678  data: 0.0002  max mem: 24440
[20:59:24.265464] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4348 (0.8413)  acc1: 87.5000 (78.7600)  acc5: 100.0000 (95.0575)  time: 0.1678  data: 0.0002  max mem: 24440
[20:59:34.592755] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3752 (0.8436)  acc1: 90.6250 (78.6920)  acc5: 96.8750 (95.0520)  time: 0.1634  data: 0.0001  max mem: 24440
[20:59:34.716866] Test: Total time: 0:04:24 (0.1693 s / it)
[20:59:35.095874] * Acc@1 78.691 Acc@5 95.048 loss 0.844
[20:59:35.096031] Accuracy of the network on the 50000 test images: 78.7%
[20:59:35.096050] Max accuracy: 78.72%
[20:59:35.157875] log_dir: ./output_dir_qkformer
[20:59:38.188237] Epoch: [143]  [   0/5004]  eta: 4:12:40  lr: 0.000237  loss: 2.2596 (2.2596)  time: 3.0296  data: 2.5605  max mem: 24440
[21:14:58.387266] Epoch: [143]  [2000/5004]  eta: 0:23:05  lr: 0.000233  loss: 2.2925 (2.4591)  time: 0.4619  data: 0.0003  max mem: 24440
[21:30:19.309490] Epoch: [143]  [4000/5004]  eta: 0:07:42  lr: 0.000230  loss: 2.5369 (2.4626)  time: 0.4581  data: 0.0003  max mem: 24440
[21:38:00.358534] Epoch: [143]  [5003/5004]  eta: 0:00:00  lr: 0.000229  loss: 2.5232 (2.4641)  time: 0.4573  data: 0.0006  max mem: 24440
[21:38:00.771046] Epoch: [143] Total time: 0:38:25 (0.4608 s / it)
[21:38:00.772347] Averaged stats: lr: 0.000229  loss: 2.5232 (2.4621)
[21:38:02.223471] Test:  [   0/1563]  eta: 0:37:38  loss: 0.3249 (0.3249)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.4449  data: 1.2684  max mem: 24440
[21:39:26.251766] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.5986 (0.6561)  acc1: 81.2500 (83.3895)  acc5: 96.8750 (97.0122)  time: 0.1679  data: 0.0002  max mem: 24440
[21:40:50.265154] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9263 (0.7704)  acc1: 65.6250 (80.4071)  acc5: 96.8750 (95.7605)  time: 0.1679  data: 0.0002  max mem: 24440
[21:42:14.300873] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3158 (0.8317)  acc1: 90.6250 (78.9744)  acc5: 96.8750 (95.0887)  time: 0.1679  data: 0.0002  max mem: 24440
[21:42:24.627969] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3592 (0.8336)  acc1: 90.6250 (78.9140)  acc5: 100.0000 (95.0960)  time: 0.1636  data: 0.0001  max mem: 24440
[21:42:24.718748] Test: Total time: 0:04:23 (0.1689 s / it)
[21:42:25.041889] * Acc@1 78.912 Acc@5 95.093 loss 0.834
[21:42:25.042033] Accuracy of the network on the 50000 test images: 78.9%
[21:42:25.042054] Max accuracy: 78.91%
[21:42:25.167869] log_dir: ./output_dir_qkformer
[21:42:27.788760] Epoch: [144]  [   0/5004]  eta: 3:38:26  lr: 0.000229  loss: 1.9066 (1.9066)  time: 2.6192  data: 2.1007  max mem: 24440
[21:57:48.158273] Epoch: [144]  [2000/5004]  eta: 0:23:05  lr: 0.000226  loss: 2.4785 (2.4486)  time: 0.4616  data: 0.0003  max mem: 24440
[22:13:19.388109] Epoch: [144]  [4000/5004]  eta: 0:07:45  lr: 0.000223  loss: 2.4642 (2.4516)  time: 0.4567  data: 0.0002  max mem: 24440
[22:21:01.054301] Epoch: [144]  [5003/5004]  eta: 0:00:00  lr: 0.000221  loss: 2.4144 (2.4539)  time: 0.4527  data: 0.0005  max mem: 24440
[22:21:01.556623] Epoch: [144] Total time: 0:38:36 (0.4629 s / it)
[22:21:01.603059] Averaged stats: lr: 0.000221  loss: 2.4144 (2.4529)
[22:21:03.892290] Test:  [   0/1563]  eta: 0:59:28  loss: 0.4655 (0.4655)  acc1: 93.7500 (93.7500)  acc5: 96.8750 (96.8750)  time: 2.2830  data: 2.0662  max mem: 24440
[22:22:27.889827] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.6254 (0.6602)  acc1: 84.3750 (83.2959)  acc5: 96.8750 (97.0247)  time: 0.1682  data: 0.0002  max mem: 24440
[22:23:51.886487] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9480 (0.7806)  acc1: 71.8750 (80.5694)  acc5: 96.8750 (95.6824)  time: 0.1678  data: 0.0002  max mem: 24440
[22:25:15.904758] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4696 (0.8407)  acc1: 87.5000 (79.1910)  acc5: 100.0000 (95.0200)  time: 0.1684  data: 0.0002  max mem: 24440
[22:25:26.233039] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3134 (0.8441)  acc1: 90.6250 (79.0880)  acc5: 100.0000 (95.0100)  time: 0.1635  data: 0.0001  max mem: 24440
[22:25:26.350465] Test: Total time: 0:04:24 (0.1694 s / it)
[22:25:26.778483] * Acc@1 79.094 Acc@5 95.012 loss 0.844
[22:25:26.778663] Accuracy of the network on the 50000 test images: 79.1%
[22:25:26.778687] Max accuracy: 79.09%
[22:25:26.879026] log_dir: ./output_dir_qkformer
[22:25:29.421803] Epoch: [145]  [   0/5004]  eta: 3:31:55  lr: 0.000221  loss: 2.0127 (2.0127)  time: 2.5411  data: 2.0378  max mem: 24440
[22:40:49.848111] Epoch: [145]  [2000/5004]  eta: 0:23:05  lr: 0.000218  loss: 2.2965 (2.4331)  time: 0.4581  data: 0.0002  max mem: 24440
[22:56:10.067778] Epoch: [145]  [4000/5004]  eta: 0:07:42  lr: 0.000215  loss: 2.3828 (2.4337)  time: 0.4569  data: 0.0002  max mem: 24440
[23:03:51.148237] Epoch: [145]  [5003/5004]  eta: 0:00:00  lr: 0.000214  loss: 2.5128 (2.4373)  time: 0.4572  data: 0.0009  max mem: 24440
[23:03:51.573152] Epoch: [145] Total time: 0:38:24 (0.4606 s / it)
[23:03:51.578566] Averaged stats: lr: 0.000214  loss: 2.5128 (2.4450)
[23:03:53.237770] Test:  [   0/1563]  eta: 0:43:05  loss: 0.2491 (0.2491)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.6540  data: 1.4623  max mem: 24440
[23:05:17.238408] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.6421 (0.6496)  acc1: 78.1250 (83.1213)  acc5: 96.8750 (97.1245)  time: 0.1699  data: 0.0002  max mem: 24440
[23:06:41.267067] Test:  [1000/1563]  eta: 0:01:35  loss: 1.0046 (0.7656)  acc1: 75.0000 (80.3946)  acc5: 93.7500 (95.7137)  time: 0.1681  data: 0.0002  max mem: 24440
[23:08:05.297531] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4074 (0.8319)  acc1: 90.6250 (78.9640)  acc5: 100.0000 (94.9492)  time: 0.1679  data: 0.0002  max mem: 24440
[23:08:15.625784] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3815 (0.8343)  acc1: 90.6250 (78.9020)  acc5: 100.0000 (94.9700)  time: 0.1636  data: 0.0001  max mem: 24440
[23:08:15.717537] Test: Total time: 0:04:24 (0.1690 s / it)
[23:08:16.220986] * Acc@1 78.909 Acc@5 94.973 loss 0.834
[23:08:16.221129] Accuracy of the network on the 50000 test images: 78.9%
[23:08:16.221149] Max accuracy: 79.09%
[23:08:16.333445] log_dir: ./output_dir_qkformer
[23:08:19.058619] Epoch: [146]  [   0/5004]  eta: 3:47:06  lr: 0.000214  loss: 2.6438 (2.6438)  time: 2.7231  data: 2.2609  max mem: 24440
[23:23:40.044209] Epoch: [146]  [2000/5004]  eta: 0:23:06  lr: 0.000211  loss: 2.4264 (2.4343)  time: 0.4627  data: 0.0002  max mem: 24440
[23:38:59.900710] Epoch: [146]  [4000/5004]  eta: 0:07:42  lr: 0.000208  loss: 2.3582 (2.4313)  time: 0.4579  data: 0.0002  max mem: 24440
[23:46:40.965353] Epoch: [146]  [5003/5004]  eta: 0:00:00  lr: 0.000207  loss: 2.3086 (2.4330)  time: 0.4545  data: 0.0006  max mem: 24440
[23:46:41.423757] Epoch: [146] Total time: 0:38:25 (0.4606 s / it)
[23:46:41.455043] Averaged stats: lr: 0.000207  loss: 2.3086 (2.4360)
[23:46:44.155576] Test:  [   0/1563]  eta: 1:10:11  loss: 0.3043 (0.3043)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.6947  data: 2.3831  max mem: 24440
[23:48:08.160810] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.7252 (0.6371)  acc1: 81.2500 (82.9965)  acc5: 96.8750 (97.1432)  time: 0.1678  data: 0.0002  max mem: 24440
[23:49:32.175667] Test:  [1000/1563]  eta: 0:01:36  loss: 0.8746 (0.7579)  acc1: 78.1250 (80.4040)  acc5: 93.7500 (95.7636)  time: 0.1680  data: 0.0002  max mem: 24440
[23:50:56.174992] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3991 (0.8245)  acc1: 90.6250 (78.9307)  acc5: 100.0000 (95.0741)  time: 0.1682  data: 0.0002  max mem: 24440
[23:51:06.503838] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3629 (0.8248)  acc1: 90.6250 (78.9240)  acc5: 100.0000 (95.0960)  time: 0.1636  data: 0.0001  max mem: 24440
[23:51:06.629162] Test: Total time: 0:04:25 (0.1697 s / it)
[23:51:07.173356] * Acc@1 78.922 Acc@5 95.096 loss 0.825
[23:51:07.173502] Accuracy of the network on the 50000 test images: 78.9%
[23:51:07.173525] Max accuracy: 79.09%
[23:51:07.253527] log_dir: ./output_dir_qkformer
[23:51:10.064063] Epoch: [147]  [   0/5004]  eta: 3:54:11  lr: 0.000207  loss: 2.7759 (2.7759)  time: 2.8081  data: 2.3275  max mem: 24440
[00:06:30.742028] Epoch: [147]  [2000/5004]  eta: 0:23:06  lr: 0.000204  loss: 2.3891 (2.4220)  time: 0.4569  data: 0.0003  max mem: 24440
[00:21:49.343221] Epoch: [147]  [4000/5004]  eta: 0:07:42  lr: 0.000201  loss: 2.3553 (2.4219)  time: 0.4606  data: 0.0002  max mem: 24440
[00:29:30.290801] Epoch: [147]  [5003/5004]  eta: 0:00:00  lr: 0.000199  loss: 2.2879 (2.4250)  time: 0.4532  data: 0.0005  max mem: 24440
[00:29:30.723891] Epoch: [147] Total time: 0:38:23 (0.4603 s / it)
[00:29:30.726192] Averaged stats: lr: 0.000199  loss: 2.2879 (2.4258)
[00:29:32.475030] Test:  [   0/1563]  eta: 0:45:23  loss: 0.2392 (0.2392)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.7426  data: 1.5643  max mem: 24440
[00:30:56.478255] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.6149 (0.6654)  acc1: 84.3750 (82.8406)  acc5: 96.8750 (97.1682)  time: 0.1678  data: 0.0002  max mem: 24440
[00:32:20.477146] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9199 (0.7735)  acc1: 68.7500 (80.5226)  acc5: 93.7500 (95.9540)  time: 0.1684  data: 0.0002  max mem: 24440
[00:33:44.472506] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4055 (0.8407)  acc1: 90.6250 (79.0556)  acc5: 96.8750 (95.1428)  time: 0.1682  data: 0.0002  max mem: 24440
[00:33:54.804302] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3701 (0.8447)  acc1: 93.7500 (78.9820)  acc5: 100.0000 (95.1240)  time: 0.1635  data: 0.0001  max mem: 24440
[00:33:54.916675] Test: Total time: 0:04:24 (0.1690 s / it)
[00:33:55.410572] * Acc@1 78.982 Acc@5 95.123 loss 0.845
[00:33:55.410719] Accuracy of the network on the 50000 test images: 79.0%
[00:33:55.410741] Max accuracy: 79.09%
[00:33:55.488215] log_dir: ./output_dir_qkformer
[00:33:58.013645] Epoch: [148]  [   0/5004]  eta: 3:30:25  lr: 0.000199  loss: 2.4722 (2.4722)  time: 2.5231  data: 1.9109  max mem: 24440
[00:49:20.488851] Epoch: [148]  [2000/5004]  eta: 0:23:08  lr: 0.000196  loss: 2.2285 (2.4073)  time: 0.4661  data: 0.0003  max mem: 24440
[01:04:44.042715] Epoch: [148]  [4000/5004]  eta: 0:07:43  lr: 0.000194  loss: 2.3766 (2.4129)  time: 0.4585  data: 0.0002  max mem: 24440
[01:12:26.464492] Epoch: [148]  [5003/5004]  eta: 0:00:00  lr: 0.000192  loss: 2.3892 (2.4147)  time: 0.4537  data: 0.0009  max mem: 24440
[01:12:26.904836] Epoch: [148] Total time: 0:38:31 (0.4619 s / it)
[01:12:26.953509] Averaged stats: lr: 0.000192  loss: 2.3892 (2.4182)
[01:12:29.050231] Test:  [   0/1563]  eta: 0:54:30  loss: 0.3605 (0.3605)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.0927  data: 1.9168  max mem: 24440
[01:13:53.092566] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.6147 (0.6323)  acc1: 84.3750 (83.4144)  acc5: 96.8750 (97.4239)  time: 0.1682  data: 0.0004  max mem: 24440
[01:15:17.151614] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8701 (0.7530)  acc1: 78.1250 (80.7349)  acc5: 96.8750 (95.9759)  time: 0.1679  data: 0.0002  max mem: 24440
[01:16:41.190479] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3763 (0.8156)  acc1: 90.6250 (79.1764)  acc5: 96.8750 (95.3219)  time: 0.1679  data: 0.0002  max mem: 24440
[01:16:51.517258] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3737 (0.8182)  acc1: 90.6250 (79.1120)  acc5: 100.0000 (95.3200)  time: 0.1636  data: 0.0001  max mem: 24440
[01:16:51.598937] Test: Total time: 0:04:24 (0.1693 s / it)
[01:16:51.986525] * Acc@1 79.112 Acc@5 95.320 loss 0.818
[01:16:51.986674] Accuracy of the network on the 50000 test images: 79.1%
[01:16:51.986696] Max accuracy: 79.11%
[01:16:52.055218] log_dir: ./output_dir_qkformer
[01:16:54.822836] Epoch: [149]  [   0/5004]  eta: 3:50:38  lr: 0.000192  loss: 2.5317 (2.5317)  time: 2.7655  data: 2.2935  max mem: 24440
[01:32:14.943716] Epoch: [149]  [2000/5004]  eta: 0:23:05  lr: 0.000189  loss: 2.3916 (2.4021)  time: 0.4599  data: 0.0002  max mem: 24440
[01:47:33.169836] Epoch: [149]  [4000/5004]  eta: 0:07:41  lr: 0.000187  loss: 2.3203 (2.4038)  time: 0.4612  data: 0.0002  max mem: 24440
[01:55:13.955567] Epoch: [149]  [5003/5004]  eta: 0:00:00  lr: 0.000185  loss: 2.3406 (2.4040)  time: 0.4530  data: 0.0009  max mem: 24440
[01:55:14.355961] Epoch: [149] Total time: 0:38:22 (0.4601 s / it)
[01:55:14.357463] Averaged stats: lr: 0.000185  loss: 2.3406 (2.4074)
[01:55:15.963108] Test:  [   0/1563]  eta: 0:41:40  loss: 0.4161 (0.4161)  acc1: 90.6250 (90.6250)  acc5: 96.8750 (96.8750)  time: 1.5996  data: 1.4230  max mem: 24440
[01:56:39.982908] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.4971 (0.6195)  acc1: 84.3750 (83.6951)  acc5: 100.0000 (97.4426)  time: 0.1679  data: 0.0002  max mem: 24440
[01:58:04.057815] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9418 (0.7298)  acc1: 71.8750 (81.2906)  acc5: 93.7500 (96.1226)  time: 0.1680  data: 0.0002  max mem: 24440
[01:59:28.127187] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4959 (0.7949)  acc1: 84.3750 (79.6844)  acc5: 96.8750 (95.4010)  time: 0.1687  data: 0.0002  max mem: 24440
[01:59:38.457082] Test:  [1562/1563]  eta: 0:00:00  loss: 0.2892 (0.7967)  acc1: 93.7500 (79.6260)  acc5: 100.0000 (95.4360)  time: 0.1636  data: 0.0001  max mem: 24440
[01:59:38.571882] Test: Total time: 0:04:24 (0.1690 s / it)
[01:59:38.735099] * Acc@1 79.624 Acc@5 95.433 loss 0.797
[01:59:38.735247] Accuracy of the network on the 50000 test images: 79.6%
[01:59:38.735268] Max accuracy: 79.62%
[01:59:38.815263] log_dir: ./output_dir_qkformer
[01:59:41.376720] Epoch: [150]  [   0/5004]  eta: 3:33:33  lr: 0.000185  loss: 2.3058 (2.3058)  time: 2.5606  data: 1.9274  max mem: 24440
[02:15:03.563409] Epoch: [150]  [2000/5004]  eta: 0:23:08  lr: 0.000182  loss: 2.2674 (2.3944)  time: 0.4604  data: 0.0002  max mem: 24440
[02:30:26.413421] Epoch: [150]  [4000/5004]  eta: 0:07:43  lr: 0.000180  loss: 2.3863 (2.3958)  time: 0.4620  data: 0.0002  max mem: 24440
[02:38:08.801333] Epoch: [150]  [5003/5004]  eta: 0:00:00  lr: 0.000178  loss: 2.2750 (2.3973)  time: 0.4532  data: 0.0008  max mem: 24440
[02:38:09.304233] Epoch: [150] Total time: 0:38:30 (0.4617 s / it)
[02:38:09.306056] Averaged stats: lr: 0.000178  loss: 2.2750 (2.3979)
[02:38:10.927118] Test:  [   0/1563]  eta: 0:42:03  loss: 0.3227 (0.3227)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.6147  data: 1.4391  max mem: 24440
[02:39:34.959053] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.4921 (0.6302)  acc1: 81.2500 (83.5891)  acc5: 96.8750 (97.4551)  time: 0.1679  data: 0.0002  max mem: 24440
[02:40:59.003802] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8207 (0.7449)  acc1: 78.1250 (80.9409)  acc5: 93.7500 (96.0758)  time: 0.1681  data: 0.0002  max mem: 24440
[02:42:23.062969] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4844 (0.8055)  acc1: 87.5000 (79.5428)  acc5: 100.0000 (95.3718)  time: 0.1684  data: 0.0002  max mem: 24440
[02:42:33.394187] Test:  [1562/1563]  eta: 0:00:00  loss: 0.2780 (0.8070)  acc1: 93.7500 (79.4740)  acc5: 100.0000 (95.3780)  time: 0.1636  data: 0.0001  max mem: 24440
[02:42:33.510748] Test: Total time: 0:04:24 (0.1690 s / it)
[02:42:33.899421] * Acc@1 79.478 Acc@5 95.378 loss 0.807
[02:42:33.899569] Accuracy of the network on the 50000 test images: 79.5%
[02:42:33.899591] Max accuracy: 79.62%
[02:42:33.980871] log_dir: ./output_dir_qkformer
[02:42:36.558829] Epoch: [151]  [   0/5004]  eta: 3:34:51  lr: 0.000178  loss: 2.0725 (2.0725)  time: 2.5762  data: 2.0519  max mem: 24440
[02:57:57.018998] Epoch: [151]  [2000/5004]  eta: 0:23:05  lr: 0.000176  loss: 2.2113 (2.3813)  time: 0.4627  data: 0.0002  max mem: 24440
[03:13:17.696178] Epoch: [151]  [4000/5004]  eta: 0:07:42  lr: 0.000173  loss: 2.3614 (2.3878)  time: 0.4579  data: 0.0003  max mem: 24440
[03:20:59.165391] Epoch: [151]  [5003/5004]  eta: 0:00:00  lr: 0.000172  loss: 2.3786 (2.3868)  time: 0.4546  data: 0.0009  max mem: 24440
[03:20:59.588206] Epoch: [151] Total time: 0:38:25 (0.4608 s / it)
[03:20:59.597665] Averaged stats: lr: 0.000172  loss: 2.3786 (2.3915)
[03:21:01.721570] Test:  [   0/1563]  eta: 0:55:11  loss: 0.2622 (0.2622)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.1186  data: 1.9131  max mem: 24440
[03:22:25.732145] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.6489 (0.6408)  acc1: 81.2500 (83.3895)  acc5: 96.8750 (97.3179)  time: 0.1682  data: 0.0002  max mem: 24440
[03:23:49.720147] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9229 (0.7575)  acc1: 68.7500 (80.8754)  acc5: 93.7500 (95.8698)  time: 0.1679  data: 0.0002  max mem: 24440
[03:25:13.702099] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4596 (0.8119)  acc1: 87.5000 (79.5990)  acc5: 100.0000 (95.2657)  time: 0.1681  data: 0.0002  max mem: 24440
[03:25:24.025496] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4269 (0.8156)  acc1: 90.6250 (79.5060)  acc5: 100.0000 (95.2320)  time: 0.1635  data: 0.0001  max mem: 24440
[03:25:24.144688] Test: Total time: 0:04:24 (0.1693 s / it)
[03:25:25.021495] * Acc@1 79.514 Acc@5 95.233 loss 0.816
[03:25:25.021643] Accuracy of the network on the 50000 test images: 79.5%
[03:25:25.021665] Max accuracy: 79.62%
[03:25:25.146150] log_dir: ./output_dir_qkformer
[03:25:27.742673] Epoch: [152]  [   0/5004]  eta: 3:36:22  lr: 0.000171  loss: 2.5800 (2.5800)  time: 2.5943  data: 2.0550  max mem: 24440
[03:40:49.608645] Epoch: [152]  [2000/5004]  eta: 0:23:07  lr: 0.000169  loss: 2.4048 (2.3910)  time: 0.4572  data: 0.0002  max mem: 24440
[03:56:10.059225] Epoch: [152]  [4000/5004]  eta: 0:07:42  lr: 0.000166  loss: 2.3915 (2.3893)  time: 0.4623  data: 0.0003  max mem: 24440
[04:03:51.874211] Epoch: [152]  [5003/5004]  eta: 0:00:00  lr: 0.000165  loss: 2.3058 (2.3882)  time: 0.4530  data: 0.0005  max mem: 24440
[04:03:52.358576] Epoch: [152] Total time: 0:38:27 (0.4611 s / it)
[04:03:52.359829] Averaged stats: lr: 0.000165  loss: 2.3058 (2.3850)
[04:03:54.171071] Test:  [   0/1563]  eta: 0:47:04  loss: 0.2048 (0.2048)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.8072  data: 1.3773  max mem: 24440
[04:05:18.230209] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.5478 (0.6352)  acc1: 84.3750 (83.8261)  acc5: 96.8750 (97.3615)  time: 0.1685  data: 0.0002  max mem: 24440
[04:06:42.261396] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9030 (0.7475)  acc1: 75.0000 (81.1688)  acc5: 96.8750 (96.0633)  time: 0.1683  data: 0.0002  max mem: 24440
[04:08:06.306331] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4381 (0.8133)  acc1: 87.5000 (79.5574)  acc5: 96.8750 (95.3240)  time: 0.1683  data: 0.0002  max mem: 24440
[04:08:16.642253] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3119 (0.8151)  acc1: 90.6250 (79.5000)  acc5: 100.0000 (95.3220)  time: 0.1636  data: 0.0001  max mem: 24440
[04:08:16.774466] Test: Total time: 0:04:24 (0.1692 s / it)
[04:08:17.302903] * Acc@1 79.495 Acc@5 95.326 loss 0.815
[04:08:17.303059] Accuracy of the network on the 50000 test images: 79.5%
[04:08:17.303082] Max accuracy: 79.62%
[04:08:17.405595] log_dir: ./output_dir_qkformer
[04:08:20.005974] Epoch: [153]  [   0/5004]  eta: 3:36:47  lr: 0.000165  loss: 2.1336 (2.1336)  time: 2.5995  data: 2.0790  max mem: 24440
[04:23:41.656833] Epoch: [153]  [2000/5004]  eta: 0:23:07  lr: 0.000162  loss: 2.3193 (2.3796)  time: 0.4633  data: 0.0002  max mem: 24440
[04:39:03.545320] Epoch: [153]  [4000/5004]  eta: 0:07:43  lr: 0.000160  loss: 2.3987 (2.3874)  time: 0.4589  data: 0.0003  max mem: 24440
[04:46:44.983725] Epoch: [153]  [5003/5004]  eta: 0:00:00  lr: 0.000158  loss: 2.5705 (2.3891)  time: 0.4539  data: 0.0009  max mem: 24440
[04:46:45.429570] Epoch: [153] Total time: 0:38:28 (0.4612 s / it)
[04:46:45.431507] Averaged stats: lr: 0.000158  loss: 2.5705 (2.3747)
[04:46:47.892123] Test:  [   0/1563]  eta: 1:03:55  loss: 0.3173 (0.3173)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.4542  data: 2.0557  max mem: 24440
[04:48:11.964353] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.5851 (0.6097)  acc1: 84.3750 (84.1879)  acc5: 96.8750 (97.4988)  time: 0.1683  data: 0.0002  max mem: 24440
[04:49:35.998336] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9051 (0.7346)  acc1: 75.0000 (81.2063)  acc5: 93.7500 (96.0321)  time: 0.1682  data: 0.0002  max mem: 24440
[04:51:00.031028] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4256 (0.7996)  acc1: 87.5000 (79.8718)  acc5: 100.0000 (95.3052)  time: 0.1679  data: 0.0002  max mem: 24440
[04:51:10.364485] Test:  [1562/1563]  eta: 0:00:00  loss: 0.2769 (0.8015)  acc1: 93.7500 (79.7700)  acc5: 100.0000 (95.2960)  time: 0.1635  data: 0.0001  max mem: 24440
[04:51:10.499864] Test: Total time: 0:04:25 (0.1696 s / it)
[04:51:10.765235] * Acc@1 79.777 Acc@5 95.298 loss 0.801
[04:51:10.765388] Accuracy of the network on the 50000 test images: 79.8%
[04:51:10.765413] Max accuracy: 79.78%
[04:51:10.847809] log_dir: ./output_dir_qkformer
[04:51:13.511627] Epoch: [154]  [   0/5004]  eta: 3:41:55  lr: 0.000158  loss: 1.9571 (1.9571)  time: 2.6609  data: 1.9242  max mem: 24440
[05:06:35.163842] Epoch: [154]  [2000/5004]  eta: 0:23:07  lr: 0.000156  loss: 2.2396 (2.3576)  time: 0.4595  data: 0.0002  max mem: 24440
[05:21:56.381002] Epoch: [154]  [4000/5004]  eta: 0:07:43  lr: 0.000153  loss: 2.3105 (2.3641)  time: 0.4625  data: 0.0003  max mem: 24440
[05:29:38.889067] Epoch: [154]  [5003/5004]  eta: 0:00:00  lr: 0.000152  loss: 2.4863 (2.3675)  time: 0.4530  data: 0.0009  max mem: 24440
[05:29:39.335002] Epoch: [154] Total time: 0:38:28 (0.4613 s / it)
[05:29:39.336536] Averaged stats: lr: 0.000152  loss: 2.4863 (2.3663)
[05:29:41.022805] Test:  [   0/1563]  eta: 0:43:45  loss: 0.3050 (0.3050)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.6799  data: 1.5039  max mem: 24440
[05:31:05.089012] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.4936 (0.6150)  acc1: 84.3750 (84.0506)  acc5: 96.8750 (97.3802)  time: 0.1683  data: 0.0002  max mem: 24440
[05:32:29.114431] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9846 (0.7332)  acc1: 75.0000 (81.2843)  acc5: 93.7500 (96.1101)  time: 0.1679  data: 0.0002  max mem: 24440
[05:33:53.176555] Test:  [1500/1563]  eta: 0:00:10  loss: 0.5268 (0.7962)  acc1: 87.5000 (79.8718)  acc5: 100.0000 (95.4218)  time: 0.1680  data: 0.0002  max mem: 24440
[05:34:03.507317] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3310 (0.7991)  acc1: 90.6250 (79.7880)  acc5: 96.8750 (95.3780)  time: 0.1636  data: 0.0001  max mem: 24440
[05:34:03.635023] Test: Total time: 0:04:24 (0.1691 s / it)
[05:34:03.975505] * Acc@1 79.795 Acc@5 95.381 loss 0.799
[05:34:03.975646] Accuracy of the network on the 50000 test images: 79.8%
[05:34:03.975666] Max accuracy: 79.79%
[05:34:04.005665] log_dir: ./output_dir_qkformer
[05:34:06.754450] Epoch: [155]  [   0/5004]  eta: 3:49:05  lr: 0.000152  loss: 2.1814 (2.1814)  time: 2.7470  data: 2.2960  max mem: 24440
[05:49:27.954442] Epoch: [155]  [2000/5004]  eta: 0:23:06  lr: 0.000149  loss: 2.3882 (2.3488)  time: 0.4623  data: 0.0003  max mem: 24440
[06:04:50.719571] Epoch: [155]  [4000/5004]  eta: 0:07:43  lr: 0.000147  loss: 2.2509 (2.3581)  time: 0.4642  data: 0.0002  max mem: 24440
[06:12:32.418947] Epoch: [155]  [5003/5004]  eta: 0:00:00  lr: 0.000145  loss: 2.3254 (2.3600)  time: 0.4530  data: 0.0006  max mem: 24440
[06:12:32.856092] Epoch: [155] Total time: 0:38:28 (0.4614 s / it)
[06:12:32.866800] Averaged stats: lr: 0.000145  loss: 2.3254 (2.3550)
[06:12:34.747016] Test:  [   0/1563]  eta: 0:48:51  loss: 0.2710 (0.2710)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.8756  data: 1.7018  max mem: 24440
[06:13:58.774582] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.5005 (0.6120)  acc1: 84.3750 (84.1567)  acc5: 96.8750 (97.2056)  time: 0.1680  data: 0.0002  max mem: 24440
[06:15:22.774283] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9619 (0.7237)  acc1: 78.1250 (81.5216)  acc5: 93.7500 (95.9384)  time: 0.1678  data: 0.0002  max mem: 24440
[06:16:46.816603] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4148 (0.7872)  acc1: 90.6250 (80.0321)  acc5: 100.0000 (95.3385)  time: 0.1678  data: 0.0002  max mem: 24440
[06:16:57.145100] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3290 (0.7893)  acc1: 90.6250 (79.9440)  acc5: 100.0000 (95.3580)  time: 0.1635  data: 0.0001  max mem: 24440
[06:16:57.271229] Test: Total time: 0:04:24 (0.1692 s / it)
[06:16:57.861082] * Acc@1 79.942 Acc@5 95.363 loss 0.789
[06:16:57.861246] Accuracy of the network on the 50000 test images: 79.9%
[06:16:57.861271] Max accuracy: 79.94%
[06:16:57.963963] log_dir: ./output_dir_qkformer
[06:17:00.650207] Epoch: [156]  [   0/5004]  eta: 3:43:51  lr: 0.000145  loss: 2.5718 (2.5718)  time: 2.6842  data: 2.1619  max mem: 24440
[06:32:22.375281] Epoch: [156]  [2000/5004]  eta: 0:23:07  lr: 0.000143  loss: 2.4497 (2.3516)  time: 0.4628  data: 0.0002  max mem: 24440
[06:47:43.847310] Epoch: [156]  [4000/5004]  eta: 0:07:43  lr: 0.000140  loss: 2.2683 (2.3535)  time: 0.4595  data: 0.0003  max mem: 24440
[06:55:25.210060] Epoch: [156]  [5003/5004]  eta: 0:00:00  lr: 0.000139  loss: 2.2299 (2.3535)  time: 0.4533  data: 0.0009  max mem: 24440
[06:55:25.630645] Epoch: [156] Total time: 0:38:27 (0.4612 s / it)
[06:55:25.639748] Averaged stats: lr: 0.000139  loss: 2.2299 (2.3499)
[06:55:27.087373] Test:  [   0/1563]  eta: 0:37:34  loss: 0.2539 (0.2539)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.4421  data: 1.2255  max mem: 24440
[06:56:51.256352] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.5448 (0.6121)  acc1: 84.3750 (84.2814)  acc5: 96.8750 (97.5050)  time: 0.1679  data: 0.0002  max mem: 24440
[06:58:15.266273] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8871 (0.7216)  acc1: 75.0000 (81.5403)  acc5: 93.7500 (96.2319)  time: 0.1683  data: 0.0005  max mem: 24440
[06:59:39.236913] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4897 (0.7834)  acc1: 87.5000 (80.1362)  acc5: 100.0000 (95.5134)  time: 0.1678  data: 0.0002  max mem: 24440
[06:59:49.567103] Test:  [1562/1563]  eta: 0:00:00  loss: 0.2920 (0.7865)  acc1: 93.7500 (80.0840)  acc5: 100.0000 (95.4820)  time: 0.1635  data: 0.0001  max mem: 24440
[06:59:49.662272] Test: Total time: 0:04:24 (0.1689 s / it)
[06:59:50.408133] * Acc@1 80.093 Acc@5 95.483 loss 0.786
[06:59:50.408279] Accuracy of the network on the 50000 test images: 80.1%
[06:59:50.408300] Max accuracy: 80.09%
[06:59:50.490657] log_dir: ./output_dir_qkformer
[06:59:53.029156] Epoch: [157]  [   0/5004]  eta: 3:31:32  lr: 0.000139  loss: 2.6146 (2.6146)  time: 2.5364  data: 2.0528  max mem: 24440
[07:15:13.479975] Epoch: [157]  [2000/5004]  eta: 0:23:05  lr: 0.000137  loss: 2.3986 (2.3368)  time: 0.4601  data: 0.0002  max mem: 24440
[07:30:34.011580] Epoch: [157]  [4000/5004]  eta: 0:07:42  lr: 0.000134  loss: 2.3804 (2.3381)  time: 0.4624  data: 0.0002  max mem: 24440
[07:38:16.001354] Epoch: [157]  [5003/5004]  eta: 0:00:00  lr: 0.000133  loss: 2.4401 (2.3396)  time: 0.4530  data: 0.0009  max mem: 24440
[07:38:16.409411] Epoch: [157] Total time: 0:38:25 (0.4608 s / it)
[07:38:16.411008] Averaged stats: lr: 0.000133  loss: 2.4401 (2.3399)
[07:38:18.112374] Test:  [   0/1563]  eta: 0:44:09  loss: 0.2806 (0.2806)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.6950  data: 1.5121  max mem: 24440
[07:39:42.079063] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.7129 (0.6039)  acc1: 81.2500 (83.8947)  acc5: 96.8750 (97.4114)  time: 0.1680  data: 0.0002  max mem: 24440
[07:41:06.068135] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9549 (0.7179)  acc1: 71.8750 (81.4560)  acc5: 93.7500 (96.0727)  time: 0.1681  data: 0.0002  max mem: 24440
[07:42:30.046109] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4810 (0.7843)  acc1: 87.5000 (79.9529)  acc5: 96.8750 (95.4114)  time: 0.1682  data: 0.0002  max mem: 24440
[07:42:40.375680] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3783 (0.7876)  acc1: 90.6250 (79.8660)  acc5: 100.0000 (95.4000)  time: 0.1635  data: 0.0001  max mem: 24440
[07:42:40.498771] Test: Total time: 0:04:24 (0.1690 s / it)
[07:42:40.931355] * Acc@1 79.868 Acc@5 95.399 loss 0.788
[07:42:40.931513] Accuracy of the network on the 50000 test images: 79.9%
[07:42:40.931538] Max accuracy: 80.09%
[07:42:40.995763] log_dir: ./output_dir_qkformer
[07:42:44.320874] Epoch: [158]  [   0/5004]  eta: 4:37:09  lr: 0.000133  loss: 1.8795 (1.8795)  time: 3.3232  data: 2.8532  max mem: 24440
[07:58:04.138939] Epoch: [158]  [2000/5004]  eta: 0:23:05  lr: 0.000131  loss: 2.4498 (2.3299)  time: 0.4602  data: 0.0002  max mem: 24440
[08:13:23.958740] Epoch: [158]  [4000/5004]  eta: 0:07:42  lr: 0.000128  loss: 2.3019 (2.3348)  time: 0.4588  data: 0.0002  max mem: 24440
[08:21:05.110383] Epoch: [158]  [5003/5004]  eta: 0:00:00  lr: 0.000127  loss: 2.2697 (2.3352)  time: 0.4532  data: 0.0005  max mem: 24440
[08:21:05.533279] Epoch: [158] Total time: 0:38:24 (0.4605 s / it)
[08:21:05.539605] Averaged stats: lr: 0.000127  loss: 2.2697 (2.3326)
[08:21:07.273572] Test:  [   0/1563]  eta: 0:45:01  loss: 0.2581 (0.2581)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.7282  data: 1.5535  max mem: 24440
[08:22:31.293315] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.5532 (0.6186)  acc1: 84.3750 (84.3500)  acc5: 100.0000 (97.4800)  time: 0.1680  data: 0.0002  max mem: 24440
[08:23:55.304111] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8576 (0.7263)  acc1: 75.0000 (81.6121)  acc5: 96.8750 (96.2038)  time: 0.1682  data: 0.0002  max mem: 24440
[08:25:19.330005] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4856 (0.7877)  acc1: 87.5000 (80.0799)  acc5: 100.0000 (95.5446)  time: 0.1681  data: 0.0002  max mem: 24440
[08:25:29.652844] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4076 (0.7899)  acc1: 90.6250 (79.9940)  acc5: 100.0000 (95.5460)  time: 0.1636  data: 0.0001  max mem: 24440
[08:25:29.757278] Test: Total time: 0:04:24 (0.1690 s / it)
[08:25:30.514596] * Acc@1 79.993 Acc@5 95.544 loss 0.790
[08:25:30.514753] Accuracy of the network on the 50000 test images: 80.0%
[08:25:30.514775] Max accuracy: 80.09%
[08:25:30.591994] log_dir: ./output_dir_qkformer
[08:25:33.359448] Epoch: [159]  [   0/5004]  eta: 3:50:31  lr: 0.000127  loss: 2.2815 (2.2815)  time: 2.7640  data: 2.1845  max mem: 24440
[08:40:54.596494] Epoch: [159]  [2000/5004]  eta: 0:23:07  lr: 0.000125  loss: 2.2957 (2.3101)  time: 0.4575  data: 0.0003  max mem: 24440
[08:56:15.061501] Epoch: [159]  [4000/5004]  eta: 0:07:42  lr: 0.000122  loss: 2.2849 (2.3223)  time: 0.4566  data: 0.0003  max mem: 24440
[09:03:56.436763] Epoch: [159]  [5003/5004]  eta: 0:00:00  lr: 0.000121  loss: 2.2507 (2.3262)  time: 0.4537  data: 0.0009  max mem: 24440
[09:03:56.855552] Epoch: [159] Total time: 0:38:26 (0.4609 s / it)
[09:03:56.864199] Averaged stats: lr: 0.000121  loss: 2.2507 (2.3228)
[09:03:58.767900] Test:  [   0/1563]  eta: 0:49:29  loss: 0.2857 (0.2857)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.9001  data: 1.6827  max mem: 24440
[09:05:22.718835] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.5529 (0.5996)  acc1: 84.3750 (84.3251)  acc5: 100.0000 (97.6235)  time: 0.1678  data: 0.0002  max mem: 24440
[09:06:46.711012] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9172 (0.7144)  acc1: 71.8750 (81.5622)  acc5: 96.8750 (96.3006)  time: 0.1681  data: 0.0002  max mem: 24440
[09:08:10.715586] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4189 (0.7721)  acc1: 87.5000 (80.2361)  acc5: 100.0000 (95.6779)  time: 0.1678  data: 0.0002  max mem: 24440
[09:08:21.041075] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3554 (0.7750)  acc1: 90.6250 (80.1520)  acc5: 100.0000 (95.6680)  time: 0.1635  data: 0.0001  max mem: 24440
[09:08:21.157804] Test: Total time: 0:04:24 (0.1691 s / it)
[09:08:21.916545] * Acc@1 80.143 Acc@5 95.666 loss 0.775
[09:08:21.916692] Accuracy of the network on the 50000 test images: 80.1%
[09:08:21.916714] Max accuracy: 80.14%
[09:08:21.989662] log_dir: ./output_dir_qkformer
[09:08:24.756045] Epoch: [160]  [   0/5004]  eta: 3:50:37  lr: 0.000121  loss: 2.0811 (2.0811)  time: 2.7654  data: 2.1116  max mem: 24440
[09:23:45.596405] Epoch: [160]  [2000/5004]  eta: 0:23:06  lr: 0.000119  loss: 2.3178 (2.3058)  time: 0.4570  data: 0.0002  max mem: 24440
[09:39:06.030170] Epoch: [160]  [4000/5004]  eta: 0:07:42  lr: 0.000117  loss: 2.3843 (2.3131)  time: 0.4606  data: 0.0003  max mem: 24440
[09:46:47.969354] Epoch: [160]  [5003/5004]  eta: 0:00:00  lr: 0.000115  loss: 2.3188 (2.3168)  time: 0.4538  data: 0.0009  max mem: 24440
[09:46:48.384055] Epoch: [160] Total time: 0:38:26 (0.4609 s / it)
[09:46:48.389540] Averaged stats: lr: 0.000115  loss: 2.3188 (2.3153)
[09:46:50.037454] Test:  [   0/1563]  eta: 0:42:47  loss: 0.2356 (0.2356)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.6424  data: 1.4633  max mem: 24440
[09:48:14.041156] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.4786 (0.6011)  acc1: 84.3750 (84.3438)  acc5: 96.8750 (97.5299)  time: 0.1680  data: 0.0002  max mem: 24440
[09:49:38.044222] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8423 (0.7054)  acc1: 75.0000 (81.8869)  acc5: 96.8750 (96.2506)  time: 0.1679  data: 0.0002  max mem: 24440
[09:51:02.087551] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4265 (0.7646)  acc1: 87.5000 (80.2944)  acc5: 96.8750 (95.6591)  time: 0.1681  data: 0.0002  max mem: 24440
[09:51:12.417016] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3808 (0.7675)  acc1: 90.6250 (80.2180)  acc5: 100.0000 (95.6660)  time: 0.1636  data: 0.0001  max mem: 24440
[09:51:12.528972] Test: Total time: 0:04:24 (0.1690 s / it)
[09:51:13.047614] * Acc@1 80.219 Acc@5 95.666 loss 0.767
[09:51:13.047769] Accuracy of the network on the 50000 test images: 80.2%
[09:51:13.047790] Max accuracy: 80.22%
[09:51:13.114614] log_dir: ./output_dir_qkformer
[09:51:15.618561] Epoch: [161]  [   0/5004]  eta: 3:28:38  lr: 0.000115  loss: 2.2932 (2.2932)  time: 2.5018  data: 1.9146  max mem: 24440
[10:06:38.431009] Epoch: [161]  [2000/5004]  eta: 0:23:09  lr: 0.000113  loss: 2.2954 (2.3024)  time: 0.4637  data: 0.0003  max mem: 24440
[10:22:00.335602] Epoch: [161]  [4000/5004]  eta: 0:07:43  lr: 0.000111  loss: 2.3072 (2.2970)  time: 0.4588  data: 0.0002  max mem: 24440
[10:29:42.625199] Epoch: [161]  [5003/5004]  eta: 0:00:00  lr: 0.000110  loss: 2.1338 (2.2960)  time: 0.4539  data: 0.0005  max mem: 24440
[10:29:43.080492] Epoch: [161] Total time: 0:38:29 (0.4616 s / it)
[10:29:43.084410] Averaged stats: lr: 0.000110  loss: 2.1338 (2.3049)
[10:29:45.278395] Test:  [   0/1563]  eta: 0:57:00  loss: 0.3574 (0.3574)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.1881  data: 2.0129  max mem: 24440
[10:31:09.271823] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.5842 (0.5989)  acc1: 87.5000 (84.3937)  acc5: 96.8750 (97.5237)  time: 0.1681  data: 0.0002  max mem: 24440
[10:32:33.301183] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8425 (0.6984)  acc1: 78.1250 (81.9930)  acc5: 96.8750 (96.3287)  time: 0.1681  data: 0.0002  max mem: 24440
[10:33:57.337456] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4047 (0.7594)  acc1: 87.5000 (80.5213)  acc5: 100.0000 (95.6841)  time: 0.1681  data: 0.0002  max mem: 24440
[10:34:07.680500] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3273 (0.7612)  acc1: 93.7500 (80.4260)  acc5: 100.0000 (95.7060)  time: 0.1636  data: 0.0001  max mem: 24440
[10:34:07.779053] Test: Total time: 0:04:24 (0.1693 s / it)
[10:34:08.433965] * Acc@1 80.421 Acc@5 95.708 loss 0.761
[10:34:08.434115] Accuracy of the network on the 50000 test images: 80.4%
[10:34:08.434137] Max accuracy: 80.42%
[10:34:08.507823] log_dir: ./output_dir_qkformer
[10:34:11.229263] Epoch: [162]  [   0/5004]  eta: 3:46:48  lr: 0.000110  loss: 2.5795 (2.5795)  time: 2.7196  data: 2.0836  max mem: 24440
[10:49:32.366814] Epoch: [162]  [2000/5004]  eta: 0:23:06  lr: 0.000108  loss: 2.2443 (2.2929)  time: 0.4580  data: 0.0002  max mem: 24440
[11:04:53.128307] Epoch: [162]  [4000/5004]  eta: 0:07:42  lr: 0.000105  loss: 2.2934 (2.2930)  time: 0.4600  data: 0.0003  max mem: 24440
[11:12:33.682122] Epoch: [162]  [5003/5004]  eta: 0:00:00  lr: 0.000104  loss: 2.2332 (2.2966)  time: 0.4530  data: 0.0005  max mem: 24440
[11:12:34.057572] Epoch: [162] Total time: 0:38:25 (0.4607 s / it)
[11:12:34.063317] Averaged stats: lr: 0.000104  loss: 2.2332 (2.2971)
[11:12:35.540261] Test:  [   0/1563]  eta: 0:38:19  loss: 0.3273 (0.3273)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.4710  data: 1.2957  max mem: 24440
[11:13:59.796541] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.5550 (0.6012)  acc1: 84.3750 (84.2627)  acc5: 96.8750 (97.5424)  time: 0.1678  data: 0.0002  max mem: 24440
[11:15:23.862173] Test:  [1000/1563]  eta: 0:01:35  loss: 1.0195 (0.7237)  acc1: 71.8750 (81.7838)  acc5: 96.8750 (96.1976)  time: 0.1680  data: 0.0002  max mem: 24440
[11:16:47.902013] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3492 (0.7791)  acc1: 90.6250 (80.5130)  acc5: 100.0000 (95.6654)  time: 0.1679  data: 0.0002  max mem: 24440
[11:16:58.227698] Test:  [1562/1563]  eta: 0:00:00  loss: 0.2576 (0.7801)  acc1: 90.6250 (80.4680)  acc5: 100.0000 (95.6640)  time: 0.1636  data: 0.0001  max mem: 24440
[11:16:58.358016] Test: Total time: 0:04:24 (0.1691 s / it)
[11:16:58.788136] * Acc@1 80.462 Acc@5 95.665 loss 0.780
[11:16:58.788300] Accuracy of the network on the 50000 test images: 80.5%
[11:16:58.788321] Max accuracy: 80.46%
[11:16:58.917952] log_dir: ./output_dir_qkformer
[11:17:01.542759] Epoch: [163]  [   0/5004]  eta: 3:38:47  lr: 0.000104  loss: 2.1683 (2.1683)  time: 2.6233  data: 2.1433  max mem: 24440
[11:32:24.071717] Epoch: [163]  [2000/5004]  eta: 0:23:08  lr: 0.000102  loss: 2.2720 (2.2830)  time: 0.4642  data: 0.0003  max mem: 24440
[11:47:45.211590] Epoch: [163]  [4000/5004]  eta: 0:07:43  lr: 0.000100  loss: 2.3138 (2.2791)  time: 0.4569  data: 0.0002  max mem: 24440
[11:55:26.854174] Epoch: [163]  [5003/5004]  eta: 0:00:00  lr: 0.000099  loss: 2.0887 (2.2838)  time: 0.4580  data: 0.0006  max mem: 24440
[11:55:27.344648] Epoch: [163] Total time: 0:38:28 (0.4613 s / it)
[11:55:27.353867] Averaged stats: lr: 0.000099  loss: 2.0887 (2.2892)
[11:55:29.664348] Test:  [   0/1563]  eta: 1:00:02  loss: 0.3826 (0.3826)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.3050  data: 2.1189  max mem: 24440
[11:56:53.665732] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.4812 (0.5976)  acc1: 84.3750 (84.6432)  acc5: 96.8750 (97.5861)  time: 0.1682  data: 0.0002  max mem: 24440
[11:58:17.640520] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8659 (0.7066)  acc1: 75.0000 (82.2709)  acc5: 96.8750 (96.3287)  time: 0.1678  data: 0.0002  max mem: 24440
[11:59:41.606036] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4590 (0.7616)  acc1: 87.5000 (80.8690)  acc5: 100.0000 (95.7133)  time: 0.1678  data: 0.0002  max mem: 24440
[11:59:51.935116] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4201 (0.7655)  acc1: 90.6250 (80.7620)  acc5: 100.0000 (95.7000)  time: 0.1635  data: 0.0001  max mem: 24440
[11:59:52.020551] Test: Total time: 0:04:24 (0.1693 s / it)
[11:59:52.720409] * Acc@1 80.761 Acc@5 95.697 loss 0.765
[11:59:52.720640] Accuracy of the network on the 50000 test images: 80.8%
[11:59:52.720677] Max accuracy: 80.76%
[11:59:52.824425] log_dir: ./output_dir_qkformer
[11:59:55.517755] Epoch: [164]  [   0/5004]  eta: 3:44:30  lr: 0.000099  loss: 2.4331 (2.4331)  time: 2.6920  data: 1.9554  max mem: 24440
[12:15:16.464862] Epoch: [164]  [2000/5004]  eta: 0:23:06  lr: 0.000097  loss: 2.3221 (2.2815)  time: 0.4604  data: 0.0002  max mem: 24440
[12:30:35.585448] Epoch: [164]  [4000/5004]  eta: 0:07:42  lr: 0.000095  loss: 2.2619 (2.2789)  time: 0.4571  data: 0.0002  max mem: 24440
[12:38:16.219561] Epoch: [164]  [5003/5004]  eta: 0:00:00  lr: 0.000094  loss: 2.3429 (2.2788)  time: 0.4531  data: 0.0009  max mem: 24440
[12:38:16.622841] Epoch: [164] Total time: 0:38:23 (0.4604 s / it)
[12:38:16.628014] Averaged stats: lr: 0.000094  loss: 2.3429 (2.2819)
[12:38:18.218723] Test:  [   0/1563]  eta: 0:41:19  loss: 0.2884 (0.2884)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.5862  data: 1.4080  max mem: 24440
[12:39:42.259876] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.5045 (0.5898)  acc1: 87.5000 (84.8491)  acc5: 96.8750 (97.6110)  time: 0.1678  data: 0.0002  max mem: 24440
[12:41:06.284256] Test:  [1000/1563]  eta: 0:01:35  loss: 1.0464 (0.7071)  acc1: 71.8750 (82.0180)  acc5: 96.8750 (96.4348)  time: 0.1682  data: 0.0002  max mem: 24440
[12:42:30.266022] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3549 (0.7636)  acc1: 93.7500 (80.5942)  acc5: 100.0000 (95.8132)  time: 0.1678  data: 0.0002  max mem: 24440
[12:42:40.592557] Test:  [1562/1563]  eta: 0:00:00  loss: 0.4358 (0.7674)  acc1: 93.7500 (80.4960)  acc5: 100.0000 (95.7880)  time: 0.1635  data: 0.0001  max mem: 24440
[12:42:40.695819] Test: Total time: 0:04:24 (0.1689 s / it)
[12:42:41.110376] * Acc@1 80.491 Acc@5 95.790 loss 0.767
[12:42:41.110522] Accuracy of the network on the 50000 test images: 80.5%
[12:42:41.110542] Max accuracy: 80.76%
[12:42:41.219496] log_dir: ./output_dir_qkformer
[12:42:43.910546] Epoch: [165]  [   0/5004]  eta: 3:44:22  lr: 0.000094  loss: 2.4335 (2.4335)  time: 2.6904  data: 2.2067  max mem: 24440
[12:58:04.788676] Epoch: [165]  [2000/5004]  eta: 0:23:06  lr: 0.000092  loss: 2.1694 (2.2618)  time: 0.4802  data: 0.0002  max mem: 24440
[13:13:24.556132] Epoch: [165]  [4000/5004]  eta: 0:07:42  lr: 0.000090  loss: 2.2523 (2.2660)  time: 0.4591  data: 0.0002  max mem: 24440
[13:21:05.337449] Epoch: [165]  [5003/5004]  eta: 0:00:00  lr: 0.000089  loss: 2.2825 (2.2719)  time: 0.4532  data: 0.0006  max mem: 24440
[13:21:05.767307] Epoch: [165] Total time: 0:38:24 (0.4605 s / it)
[13:21:05.771571] Averaged stats: lr: 0.000089  loss: 2.2825 (2.2714)
[13:21:07.458320] Test:  [   0/1563]  eta: 0:43:46  loss: 0.2554 (0.2554)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.6805  data: 1.4916  max mem: 24440
[13:22:31.519059] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.5884 (0.5889)  acc1: 87.5000 (84.6744)  acc5: 96.8750 (97.6297)  time: 0.1684  data: 0.0005  max mem: 24440
[13:23:55.584271] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8654 (0.7060)  acc1: 75.0000 (82.0367)  acc5: 96.8750 (96.2975)  time: 0.1679  data: 0.0002  max mem: 24440
[13:25:19.653811] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3974 (0.7591)  acc1: 87.5000 (80.9315)  acc5: 100.0000 (95.8278)  time: 0.1680  data: 0.0002  max mem: 24440
[13:25:29.985901] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3856 (0.7626)  acc1: 93.7500 (80.8200)  acc5: 100.0000 (95.8160)  time: 0.1636  data: 0.0001  max mem: 24440
[13:25:30.098827] Test: Total time: 0:04:24 (0.1691 s / it)
[13:25:30.589902] * Acc@1 80.815 Acc@5 95.820 loss 0.763
[13:25:30.590190] Accuracy of the network on the 50000 test images: 80.8%
[13:25:30.590216] Max accuracy: 80.81%
[13:25:30.681560] log_dir: ./output_dir_qkformer
[13:25:33.510899] Epoch: [166]  [   0/5004]  eta: 3:55:49  lr: 0.000089  loss: 2.2178 (2.2178)  time: 2.8275  data: 2.1138  max mem: 24440
[13:40:55.224607] Epoch: [166]  [2000/5004]  eta: 0:23:07  lr: 0.000087  loss: 2.2702 (2.2593)  time: 0.4591  data: 0.0003  max mem: 24440
[13:56:15.995130] Epoch: [166]  [4000/5004]  eta: 0:07:43  lr: 0.000085  loss: 2.1436 (2.2588)  time: 0.4620  data: 0.0003  max mem: 24440
[14:03:57.843984] Epoch: [166]  [5003/5004]  eta: 0:00:00  lr: 0.000084  loss: 2.1640 (2.2613)  time: 0.4535  data: 0.0009  max mem: 24440
[14:03:58.246991] Epoch: [166] Total time: 0:38:27 (0.4611 s / it)
[14:03:58.251495] Averaged stats: lr: 0.000084  loss: 2.1640 (2.2609)
[14:03:59.820615] Test:  [   0/1563]  eta: 0:40:46  loss: 0.2330 (0.2330)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.5649  data: 1.3722  max mem: 24440
[14:05:23.862234] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.5684 (0.5923)  acc1: 87.5000 (84.8179)  acc5: 96.8750 (97.6485)  time: 0.1681  data: 0.0002  max mem: 24440
[14:06:47.917406] Test:  [1000/1563]  eta: 0:01:35  loss: 0.7725 (0.7038)  acc1: 78.1250 (82.2178)  acc5: 96.8750 (96.4067)  time: 0.1680  data: 0.0002  max mem: 24440
[14:08:11.973147] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4110 (0.7640)  acc1: 90.6250 (80.8565)  acc5: 100.0000 (95.8195)  time: 0.1679  data: 0.0002  max mem: 24440
[14:08:22.297582] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3234 (0.7653)  acc1: 90.6250 (80.8040)  acc5: 100.0000 (95.8360)  time: 0.1636  data: 0.0001  max mem: 24440
[14:08:22.422421] Test: Total time: 0:04:24 (0.1690 s / it)
[14:08:22.660908] * Acc@1 80.802 Acc@5 95.835 loss 0.765
[14:08:22.661061] Accuracy of the network on the 50000 test images: 80.8%
[14:08:22.661084] Max accuracy: 80.81%
[14:08:22.765459] log_dir: ./output_dir_qkformer
[14:08:25.488086] Epoch: [167]  [   0/5004]  eta: 3:46:54  lr: 0.000084  loss: 1.7419 (1.7419)  time: 2.7208  data: 2.2443  max mem: 24440
[14:23:46.209764] Epoch: [167]  [2000/5004]  eta: 0:23:06  lr: 0.000082  loss: 2.0368 (2.2471)  time: 0.4638  data: 0.0003  max mem: 24440
[14:39:07.475936] Epoch: [167]  [4000/5004]  eta: 0:07:42  lr: 0.000080  loss: 2.2022 (2.2518)  time: 0.4601  data: 0.0002  max mem: 24440
[14:46:48.999881] Epoch: [167]  [5003/5004]  eta: 0:00:00  lr: 0.000079  loss: 2.2497 (2.2551)  time: 0.4539  data: 0.0009  max mem: 24440
[14:46:49.363743] Epoch: [167] Total time: 0:38:26 (0.4610 s / it)
[14:46:49.366028] Averaged stats: lr: 0.000079  loss: 2.2497 (2.2559)
[14:46:51.137566] Test:  [   0/1563]  eta: 0:46:01  loss: 0.2953 (0.2953)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.7670  data: 1.3471  max mem: 24440
[14:48:15.162669] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.4904 (0.5855)  acc1: 87.5000 (84.6307)  acc5: 96.8750 (97.6422)  time: 0.1681  data: 0.0002  max mem: 24440
[14:49:39.202313] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8338 (0.6921)  acc1: 78.1250 (82.2927)  acc5: 96.8750 (96.4723)  time: 0.1684  data: 0.0002  max mem: 24440
[14:51:03.267347] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4280 (0.7524)  acc1: 87.5000 (80.9606)  acc5: 100.0000 (95.8361)  time: 0.1683  data: 0.0002  max mem: 24440
[14:51:13.594994] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3004 (0.7541)  acc1: 90.6250 (80.8860)  acc5: 100.0000 (95.8480)  time: 0.1636  data: 0.0001  max mem: 24440
[14:51:13.721714] Test: Total time: 0:04:24 (0.1691 s / it)
[14:51:14.037292] * Acc@1 80.879 Acc@5 95.850 loss 0.754
[14:51:14.037449] Accuracy of the network on the 50000 test images: 80.9%
[14:51:14.037470] Max accuracy: 80.88%
[14:51:14.116723] log_dir: ./output_dir_qkformer
[14:51:16.943682] Epoch: [168]  [   0/5004]  eta: 3:55:30  lr: 0.000079  loss: 2.4753 (2.4753)  time: 2.8238  data: 2.1060  max mem: 24440
[15:06:39.643973] Epoch: [168]  [2000/5004]  eta: 0:23:09  lr: 0.000077  loss: 2.2552 (2.2434)  time: 0.4645  data: 0.0003  max mem: 24440
[15:22:02.447748] Epoch: [168]  [4000/5004]  eta: 0:07:43  lr: 0.000075  loss: 2.3098 (2.2461)  time: 0.4579  data: 0.0002  max mem: 24440
[15:29:44.531683] Epoch: [168]  [5003/5004]  eta: 0:00:00  lr: 0.000074  loss: 2.1423 (2.2445)  time: 0.4585  data: 0.0009  max mem: 24440
[15:29:44.955860] Epoch: [168] Total time: 0:38:30 (0.4618 s / it)
[15:29:45.048797] Averaged stats: lr: 0.000074  loss: 2.1423 (2.2481)
[15:29:47.609742] Test:  [   0/1563]  eta: 1:06:36  loss: 0.2974 (0.2974)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.5568  data: 2.2095  max mem: 24440
[15:31:11.619586] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.5613 (0.5868)  acc1: 84.3750 (84.8303)  acc5: 96.8750 (97.5736)  time: 0.1678  data: 0.0002  max mem: 24440
[15:32:35.634837] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8817 (0.6889)  acc1: 71.8750 (82.5019)  acc5: 96.8750 (96.4692)  time: 0.1679  data: 0.0002  max mem: 24440
[15:33:59.653884] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3489 (0.7466)  acc1: 90.6250 (81.1397)  acc5: 100.0000 (95.9756)  time: 0.1679  data: 0.0002  max mem: 24440
[15:34:09.980302] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3136 (0.7487)  acc1: 90.6250 (81.0580)  acc5: 100.0000 (95.9800)  time: 0.1636  data: 0.0001  max mem: 24440
[15:34:10.076657] Test: Total time: 0:04:25 (0.1696 s / it)
[15:34:10.362847] * Acc@1 81.057 Acc@5 95.983 loss 0.749
[15:34:10.362987] Accuracy of the network on the 50000 test images: 81.1%
[15:34:10.363009] Max accuracy: 81.06%
[15:34:10.431636] log_dir: ./output_dir_qkformer
[15:34:13.184787] Epoch: [169]  [   0/5004]  eta: 3:49:32  lr: 0.000074  loss: 2.3086 (2.3086)  time: 2.7523  data: 1.9915  max mem: 24440
[15:49:34.852538] Epoch: [169]  [2000/5004]  eta: 0:23:07  lr: 0.000072  loss: 2.3708 (2.2343)  time: 0.4583  data: 0.0002  max mem: 24440
[16:04:56.224406] Epoch: [169]  [4000/5004]  eta: 0:07:43  lr: 0.000071  loss: 2.1475 (2.2391)  time: 0.4606  data: 0.0003  max mem: 24440
[16:12:38.110314] Epoch: [169]  [5003/5004]  eta: 0:00:00  lr: 0.000070  loss: 2.2798 (2.2394)  time: 0.4533  data: 0.0008  max mem: 24440
[16:12:38.600022] Epoch: [169] Total time: 0:38:28 (0.4613 s / it)
[16:12:38.604714] Averaged stats: lr: 0.000070  loss: 2.2798 (2.2392)
[16:12:40.546132] Test:  [   0/1563]  eta: 0:50:27  loss: 0.3414 (0.3414)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.9370  data: 1.7492  max mem: 24440
[16:14:04.677988] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.4837 (0.5877)  acc1: 87.5000 (84.9426)  acc5: 100.0000 (97.6173)  time: 0.1679  data: 0.0002  max mem: 24440
[16:15:28.730762] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8084 (0.6939)  acc1: 75.0000 (82.5081)  acc5: 96.8750 (96.4660)  time: 0.1682  data: 0.0002  max mem: 24440
[16:16:52.817621] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3400 (0.7522)  acc1: 87.5000 (81.1480)  acc5: 100.0000 (95.8611)  time: 0.1682  data: 0.0002  max mem: 24440
[16:17:03.140074] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3226 (0.7555)  acc1: 90.6250 (81.0660)  acc5: 100.0000 (95.8560)  time: 0.1636  data: 0.0001  max mem: 24440
[16:17:03.243988] Test: Total time: 0:04:24 (0.1693 s / it)
[16:17:03.593635] * Acc@1 81.067 Acc@5 95.862 loss 0.755
[16:17:03.593806] Accuracy of the network on the 50000 test images: 81.1%
[16:17:03.593831] Max accuracy: 81.07%
[16:17:03.841292] log_dir: ./output_dir_qkformer
[16:17:06.640148] Epoch: [170]  [   0/5004]  eta: 3:53:11  lr: 0.000070  loss: 2.7882 (2.7882)  time: 2.7960  data: 2.3322  max mem: 24440
[16:32:27.527381] Epoch: [170]  [2000/5004]  eta: 0:23:06  lr: 0.000068  loss: 2.0545 (2.2271)  time: 0.4609  data: 0.0002  max mem: 24440
[16:47:47.003451] Epoch: [170]  [4000/5004]  eta: 0:07:42  lr: 0.000066  loss: 2.3040 (2.2339)  time: 0.4577  data: 0.0002  max mem: 24440
[16:55:28.142923] Epoch: [170]  [5003/5004]  eta: 0:00:00  lr: 0.000065  loss: 2.2115 (2.2309)  time: 0.4544  data: 0.0010  max mem: 24440
[16:55:28.583521] Epoch: [170] Total time: 0:38:24 (0.4606 s / it)
[16:55:28.616801] Averaged stats: lr: 0.000065  loss: 2.2115 (2.2287)
[16:55:30.338339] Test:  [   0/1563]  eta: 0:44:40  loss: 0.2220 (0.2220)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.7152  data: 1.5146  max mem: 24440
[16:56:54.388103] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.5093 (0.5665)  acc1: 87.5000 (85.2171)  acc5: 96.8750 (97.7607)  time: 0.1678  data: 0.0002  max mem: 24440
[16:58:18.398868] Test:  [1000/1563]  eta: 0:01:35  loss: 0.7783 (0.6773)  acc1: 81.2500 (82.7048)  acc5: 96.8750 (96.5628)  time: 0.1679  data: 0.0002  max mem: 24440
[16:59:42.416613] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3202 (0.7355)  acc1: 87.5000 (81.3708)  acc5: 100.0000 (95.9818)  time: 0.1680  data: 0.0002  max mem: 24440
[16:59:52.752524] Test:  [1562/1563]  eta: 0:00:00  loss: 0.2939 (0.7383)  acc1: 93.7500 (81.2760)  acc5: 100.0000 (95.9840)  time: 0.1636  data: 0.0001  max mem: 24440
[16:59:52.876050] Test: Total time: 0:04:24 (0.1691 s / it)
[16:59:53.353701] * Acc@1 81.272 Acc@5 95.990 loss 0.738
[16:59:53.353846] Accuracy of the network on the 50000 test images: 81.3%
[16:59:53.353867] Max accuracy: 81.27%
[16:59:53.451389] log_dir: ./output_dir_qkformer
[16:59:56.285365] Epoch: [171]  [   0/5004]  eta: 3:56:07  lr: 0.000065  loss: 2.7665 (2.7665)  time: 2.8312  data: 2.0372  max mem: 24440
[17:15:16.608198] Epoch: [171]  [2000/5004]  eta: 0:23:05  lr: 0.000064  loss: 2.1731 (2.2130)  time: 0.4599  data: 0.0003  max mem: 24440
[17:30:38.117887] Epoch: [171]  [4000/5004]  eta: 0:07:42  lr: 0.000062  loss: 2.3126 (2.2224)  time: 0.4587  data: 0.0002  max mem: 24440
[17:38:19.322807] Epoch: [171]  [5003/5004]  eta: 0:00:00  lr: 0.000061  loss: 2.0813 (2.2229)  time: 0.4533  data: 0.0006  max mem: 24440
[17:38:19.762011] Epoch: [171] Total time: 0:38:26 (0.4609 s / it)
[17:38:19.777444] Averaged stats: lr: 0.000061  loss: 2.0813 (2.2236)
[17:38:21.792447] Test:  [   0/1563]  eta: 0:52:19  loss: 0.2317 (0.2317)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.0088  data: 1.8107  max mem: 24440
[17:39:45.837737] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.4339 (0.5744)  acc1: 87.5000 (85.1485)  acc5: 96.8750 (97.6734)  time: 0.1679  data: 0.0002  max mem: 24440
[17:41:09.860681] Test:  [1000/1563]  eta: 0:01:35  loss: 0.7901 (0.6799)  acc1: 78.1250 (82.5362)  acc5: 96.8750 (96.5160)  time: 0.1680  data: 0.0002  max mem: 24440
[17:42:33.845513] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3257 (0.7382)  acc1: 90.6250 (81.2063)  acc5: 100.0000 (95.9652)  time: 0.1682  data: 0.0002  max mem: 24440
[17:42:44.164435] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3129 (0.7411)  acc1: 90.6250 (81.1060)  acc5: 100.0000 (95.9780)  time: 0.1634  data: 0.0001  max mem: 24440
[17:42:44.281521] Test: Total time: 0:04:24 (0.1692 s / it)
[17:42:44.700601] * Acc@1 81.104 Acc@5 95.974 loss 0.741
[17:42:44.700757] Accuracy of the network on the 50000 test images: 81.1%
[17:42:44.700777] Max accuracy: 81.27%
[17:42:44.768653] log_dir: ./output_dir_qkformer
[17:42:47.383171] Epoch: [172]  [   0/5004]  eta: 3:37:53  lr: 0.000061  loss: 1.9218 (1.9218)  time: 2.6126  data: 2.1544  max mem: 24440
[17:58:09.827608] Epoch: [172]  [2000/5004]  eta: 0:23:08  lr: 0.000059  loss: 2.2663 (2.2091)  time: 0.4637  data: 0.0002  max mem: 24440
[18:13:30.613734] Epoch: [172]  [4000/5004]  eta: 0:07:43  lr: 0.000058  loss: 2.0433 (2.2115)  time: 0.4613  data: 0.0002  max mem: 24440
[18:21:11.759721] Epoch: [172]  [5003/5004]  eta: 0:00:00  lr: 0.000057  loss: 2.1823 (2.2141)  time: 0.4541  data: 0.0010  max mem: 24440
[18:21:12.342266] Epoch: [172] Total time: 0:38:27 (0.4611 s / it)
[18:21:12.398685] Averaged stats: lr: 0.000057  loss: 2.1823 (2.2180)
[18:21:14.676992] Test:  [   0/1563]  eta: 0:59:11  loss: 0.2473 (0.2473)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.2722  data: 2.0736  max mem: 24440
[18:22:38.651407] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.4952 (0.5850)  acc1: 87.5000 (85.0549)  acc5: 96.8750 (97.5861)  time: 0.1681  data: 0.0002  max mem: 24440
[18:24:02.877973] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8567 (0.6859)  acc1: 71.8750 (82.6611)  acc5: 96.8750 (96.5316)  time: 0.1679  data: 0.0002  max mem: 24440
[18:25:26.868172] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3798 (0.7422)  acc1: 90.6250 (81.2771)  acc5: 100.0000 (95.9506)  time: 0.1680  data: 0.0002  max mem: 24440
[18:25:37.197389] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3218 (0.7444)  acc1: 93.7500 (81.2060)  acc5: 100.0000 (95.9420)  time: 0.1635  data: 0.0001  max mem: 24440
[18:25:37.353678] Test: Total time: 0:04:24 (0.1695 s / it)
[18:25:37.740351] * Acc@1 81.206 Acc@5 95.944 loss 0.744
[18:25:37.740512] Accuracy of the network on the 50000 test images: 81.2%
[18:25:37.740535] Max accuracy: 81.27%
[18:25:37.793652] log_dir: ./output_dir_qkformer
[18:25:41.039270] Epoch: [173]  [   0/5004]  eta: 4:30:37  lr: 0.000057  loss: 2.3334 (2.3334)  time: 3.2449  data: 2.2047  max mem: 24440
[18:41:01.153521] Epoch: [173]  [2000/5004]  eta: 0:23:06  lr: 0.000055  loss: 2.2112 (2.2030)  time: 0.4627  data: 0.0002  max mem: 24440
[18:56:21.395517] Epoch: [173]  [4000/5004]  eta: 0:07:42  lr: 0.000054  loss: 2.1422 (2.2022)  time: 0.4571  data: 0.0002  max mem: 24440
[19:04:02.992634] Epoch: [173]  [5003/5004]  eta: 0:00:00  lr: 0.000053  loss: 2.1484 (2.2074)  time: 0.4527  data: 0.0009  max mem: 24440
[19:04:03.393687] Epoch: [173] Total time: 0:38:25 (0.4608 s / it)
[19:04:03.399540] Averaged stats: lr: 0.000053  loss: 2.1484 (2.2099)
[19:04:05.054586] Test:  [   0/1563]  eta: 0:37:25  loss: 0.2856 (0.2856)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.4365  data: 1.2537  max mem: 24440
[19:05:29.136184] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.5515 (0.5780)  acc1: 84.3750 (85.1485)  acc5: 100.0000 (97.7483)  time: 0.1680  data: 0.0002  max mem: 24440
[19:06:53.210223] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8479 (0.6785)  acc1: 78.1250 (82.7173)  acc5: 96.8750 (96.4785)  time: 0.1682  data: 0.0003  max mem: 24440
[19:08:17.245433] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3325 (0.7334)  acc1: 90.6250 (81.3062)  acc5: 100.0000 (95.9735)  time: 0.1682  data: 0.0002  max mem: 24440
[19:08:27.579621] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3049 (0.7353)  acc1: 93.7500 (81.2240)  acc5: 100.0000 (95.9820)  time: 0.1636  data: 0.0001  max mem: 24440
[19:08:27.702480] Test: Total time: 0:04:24 (0.1691 s / it)
[19:08:27.897495] * Acc@1 81.223 Acc@5 95.981 loss 0.735
[19:08:27.897652] Accuracy of the network on the 50000 test images: 81.2%
[19:08:27.897672] Max accuracy: 81.27%
[19:08:27.933232] log_dir: ./output_dir_qkformer
[19:08:30.580324] Epoch: [174]  [   0/5004]  eta: 3:40:37  lr: 0.000053  loss: 2.9237 (2.9237)  time: 2.6454  data: 2.0326  max mem: 24440
[19:23:52.806385] Epoch: [174]  [2000/5004]  eta: 0:23:08  lr: 0.000051  loss: 2.2298 (2.1919)  time: 0.4612  data: 0.0002  max mem: 24440
[19:39:17.072959] Epoch: [174]  [4000/5004]  eta: 0:07:43  lr: 0.000050  loss: 2.1539 (2.2054)  time: 0.4582  data: 0.0003  max mem: 24440
[19:46:59.412422] Epoch: [174]  [5003/5004]  eta: 0:00:00  lr: 0.000049  loss: 2.0603 (2.2069)  time: 0.4527  data: 0.0008  max mem: 24440
[19:47:00.030150] Epoch: [174] Total time: 0:38:32 (0.4620 s / it)
[19:47:00.056687] Averaged stats: lr: 0.000049  loss: 2.0603 (2.2027)
[19:47:04.299604] Test:  [   0/1563]  eta: 1:50:24  loss: 0.2628 (0.2628)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 4.2383  data: 3.6280  max mem: 24440
[19:48:28.286770] Test:  [ 500/1563]  eta: 0:03:07  loss: 0.5783 (0.5653)  acc1: 87.5000 (85.2233)  acc5: 96.8750 (97.7607)  time: 0.1678  data: 0.0002  max mem: 24440
[19:49:52.298094] Test:  [1000/1563]  eta: 0:01:36  loss: 0.8376 (0.6779)  acc1: 75.0000 (82.6892)  acc5: 96.8750 (96.5378)  time: 0.1678  data: 0.0002  max mem: 24440
[19:51:16.309590] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3910 (0.7355)  acc1: 90.6250 (81.3937)  acc5: 100.0000 (95.9402)  time: 0.1679  data: 0.0002  max mem: 24440
[19:51:26.645630] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3243 (0.7391)  acc1: 90.6250 (81.2960)  acc5: 100.0000 (95.9440)  time: 0.1636  data: 0.0001  max mem: 24440
[19:51:26.743191] Test: Total time: 0:04:26 (0.1706 s / it)
[19:51:27.256300] * Acc@1 81.299 Acc@5 95.942 loss 0.739
[19:51:27.256563] Accuracy of the network on the 50000 test images: 81.3%
[19:51:27.256617] Max accuracy: 81.30%
[19:51:27.437963] log_dir: ./output_dir_qkformer
[19:51:31.643876] Epoch: [175]  [   0/5004]  eta: 5:50:34  lr: 0.000049  loss: 1.9850 (1.9850)  time: 4.2036  data: 3.3172  max mem: 24440
[20:06:53.813433] Epoch: [175]  [2000/5004]  eta: 0:23:10  lr: 0.000047  loss: 2.1847 (2.1976)  time: 0.4588  data: 0.0002  max mem: 24440
[20:22:15.147737] Epoch: [175]  [4000/5004]  eta: 0:07:43  lr: 0.000046  loss: 2.1096 (2.2010)  time: 0.4560  data: 0.0003  max mem: 24440
[20:29:56.918515] Epoch: [175]  [5003/5004]  eta: 0:00:00  lr: 0.000045  loss: 2.1211 (2.1993)  time: 0.4532  data: 0.0008  max mem: 24440
[20:29:57.359530] Epoch: [175] Total time: 0:38:29 (0.4616 s / it)
[20:29:57.364799] Averaged stats: lr: 0.000045  loss: 2.1211 (2.1981)
[20:29:59.452350] Test:  [   0/1563]  eta: 0:54:16  loss: 0.3115 (0.3115)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.0834  data: 1.7888  max mem: 24440
[20:31:23.476657] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.5113 (0.5669)  acc1: 87.5000 (85.0050)  acc5: 96.8750 (97.6921)  time: 0.1678  data: 0.0002  max mem: 24440
[20:32:47.492909] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8039 (0.6760)  acc1: 75.0000 (82.5643)  acc5: 96.8750 (96.5628)  time: 0.1682  data: 0.0002  max mem: 24440
[20:34:11.629225] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4085 (0.7323)  acc1: 87.5000 (81.2979)  acc5: 100.0000 (95.9798)  time: 0.1680  data: 0.0002  max mem: 24440
[20:34:21.959922] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3139 (0.7346)  acc1: 90.6250 (81.2440)  acc5: 100.0000 (95.9860)  time: 0.1635  data: 0.0001  max mem: 24440
[20:34:22.075238] Test: Total time: 0:04:24 (0.1694 s / it)
[20:34:22.526814] * Acc@1 81.246 Acc@5 95.987 loss 0.735
[20:34:22.526971] Accuracy of the network on the 50000 test images: 81.2%
[20:34:22.526991] Max accuracy: 81.30%
[20:34:22.666272] log_dir: ./output_dir_qkformer
[20:34:25.259491] Epoch: [176]  [   0/5004]  eta: 3:36:09  lr: 0.000045  loss: 2.4587 (2.4587)  time: 2.5918  data: 2.0377  max mem: 24440
[20:49:47.366617] Epoch: [176]  [2000/5004]  eta: 0:23:07  lr: 0.000044  loss: 2.2033 (2.1841)  time: 0.4571  data: 0.0002  max mem: 24440
[21:05:10.755508] Epoch: [176]  [4000/5004]  eta: 0:07:43  lr: 0.000042  loss: 2.0929 (2.1881)  time: 0.4613  data: 0.0002  max mem: 24440
[21:12:52.978083] Epoch: [176]  [5003/5004]  eta: 0:00:00  lr: 0.000042  loss: 2.0607 (2.1875)  time: 0.4542  data: 0.0005  max mem: 24440
[21:12:53.507667] Epoch: [176] Total time: 0:38:30 (0.4618 s / it)
[21:12:53.570211] Averaged stats: lr: 0.000042  loss: 2.0607 (2.1891)
[21:12:57.197217] Test:  [   0/1563]  eta: 1:34:19  loss: 0.2111 (0.2111)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 3.6210  data: 3.1353  max mem: 24440
[21:14:21.170277] Test:  [ 500/1563]  eta: 0:03:05  loss: 0.4271 (0.5694)  acc1: 84.3750 (85.3293)  acc5: 96.8750 (97.7233)  time: 0.1681  data: 0.0002  max mem: 24440
[21:15:45.123223] Test:  [1000/1563]  eta: 0:01:36  loss: 0.8533 (0.6797)  acc1: 75.0000 (82.8109)  acc5: 96.8750 (96.4848)  time: 0.1678  data: 0.0002  max mem: 24440
[21:17:09.108604] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3586 (0.7354)  acc1: 90.6250 (81.4374)  acc5: 100.0000 (95.9902)  time: 0.1680  data: 0.0002  max mem: 24440
[21:17:19.437064] Test:  [1562/1563]  eta: 0:00:00  loss: 0.2441 (0.7387)  acc1: 90.6250 (81.3260)  acc5: 100.0000 (95.9960)  time: 0.1635  data: 0.0001  max mem: 24440
[21:17:19.556020] Test: Total time: 0:04:25 (0.1702 s / it)
[21:17:20.116876] * Acc@1 81.328 Acc@5 95.995 loss 0.739
[21:17:20.117071] Accuracy of the network on the 50000 test images: 81.3%
[21:17:20.117096] Max accuracy: 81.33%
[21:17:20.186483] log_dir: ./output_dir_qkformer
[21:17:22.832747] Epoch: [177]  [   0/5004]  eta: 3:40:22  lr: 0.000042  loss: 2.5073 (2.5073)  time: 2.6423  data: 2.0341  max mem: 24440
[21:32:43.561215] Epoch: [177]  [2000/5004]  eta: 0:23:06  lr: 0.000040  loss: 2.1850 (2.1744)  time: 0.4587  data: 0.0003  max mem: 24440
[21:48:04.647826] Epoch: [177]  [4000/5004]  eta: 0:07:42  lr: 0.000039  loss: 2.2756 (2.1794)  time: 0.4576  data: 0.0002  max mem: 24440
[21:55:46.010959] Epoch: [177]  [5003/5004]  eta: 0:00:00  lr: 0.000038  loss: 2.1125 (2.1792)  time: 0.4534  data: 0.0011  max mem: 24440
[21:55:46.693819] Epoch: [177] Total time: 0:38:26 (0.4609 s / it)
[21:55:46.706816] Averaged stats: lr: 0.000038  loss: 2.1125 (2.1813)
[21:55:50.787205] Test:  [   0/1563]  eta: 1:46:08  loss: 0.2354 (0.2354)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 4.0743  data: 3.4158  max mem: 24440
[21:57:14.774759] Test:  [ 500/1563]  eta: 0:03:06  loss: 0.5691 (0.5698)  acc1: 84.3750 (85.2732)  acc5: 96.8750 (97.8293)  time: 0.1679  data: 0.0002  max mem: 24440
[21:58:38.777491] Test:  [1000/1563]  eta: 0:01:36  loss: 0.8569 (0.6784)  acc1: 78.1250 (82.8203)  acc5: 93.7500 (96.5722)  time: 0.1678  data: 0.0002  max mem: 24440
[22:00:02.785256] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3977 (0.7386)  acc1: 90.6250 (81.4540)  acc5: 100.0000 (95.9527)  time: 0.1680  data: 0.0002  max mem: 24440
[22:00:13.107432] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3416 (0.7410)  acc1: 90.6250 (81.3420)  acc5: 100.0000 (95.9560)  time: 0.1635  data: 0.0001  max mem: 24440
[22:00:13.239416] Test: Total time: 0:04:26 (0.1705 s / it)
[22:00:13.632577] * Acc@1 81.335 Acc@5 95.959 loss 0.741
[22:00:13.632726] Accuracy of the network on the 50000 test images: 81.3%
[22:00:13.632746] Max accuracy: 81.33%
[22:00:13.742778] log_dir: ./output_dir_qkformer
[22:00:16.622363] Epoch: [178]  [   0/5004]  eta: 4:00:02  lr: 0.000038  loss: 1.6055 (1.6055)  time: 2.8782  data: 2.2901  max mem: 24440
[22:15:39.025731] Epoch: [178]  [2000/5004]  eta: 0:23:09  lr: 0.000037  loss: 2.3663 (2.1772)  time: 0.4643  data: 0.0003  max mem: 24440
[22:30:59.782743] Epoch: [178]  [4000/5004]  eta: 0:07:43  lr: 0.000036  loss: 2.2316 (2.1803)  time: 0.4580  data: 0.0002  max mem: 24440
[22:38:41.992161] Epoch: [178]  [5003/5004]  eta: 0:00:00  lr: 0.000035  loss: 1.9738 (2.1790)  time: 0.4531  data: 0.0009  max mem: 24440
[22:38:42.394012] Epoch: [178] Total time: 0:38:28 (0.4614 s / it)
[22:38:42.396443] Averaged stats: lr: 0.000035  loss: 1.9738 (2.1751)
[22:38:44.473579] Test:  [   0/1563]  eta: 0:53:56  loss: 0.2586 (0.2586)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.0710  data: 1.8985  max mem: 24440
[22:40:08.466930] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.5455 (0.5653)  acc1: 87.5000 (85.5352)  acc5: 96.8750 (97.6859)  time: 0.1680  data: 0.0002  max mem: 24440
[22:41:32.432518] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8193 (0.6718)  acc1: 78.1250 (82.8796)  acc5: 96.8750 (96.5472)  time: 0.1681  data: 0.0002  max mem: 24440
[22:42:56.427967] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3700 (0.7287)  acc1: 90.6250 (81.5769)  acc5: 100.0000 (96.0464)  time: 0.1681  data: 0.0002  max mem: 24440
[22:43:06.752446] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3354 (0.7319)  acc1: 93.7500 (81.4620)  acc5: 100.0000 (96.0380)  time: 0.1635  data: 0.0001  max mem: 24440
[22:43:06.885250] Test: Total time: 0:04:24 (0.1692 s / it)
[22:43:07.329757] * Acc@1 81.457 Acc@5 96.036 loss 0.732
[22:43:07.329921] Accuracy of the network on the 50000 test images: 81.5%
[22:43:07.329945] Max accuracy: 81.46%
[22:43:07.451480] log_dir: ./output_dir_qkformer
[22:43:10.141548] Epoch: [179]  [   0/5004]  eta: 3:44:10  lr: 0.000035  loss: 1.9346 (1.9346)  time: 2.6880  data: 2.0756  max mem: 24440
[22:58:32.080877] Epoch: [179]  [2000/5004]  eta: 0:23:07  lr: 0.000034  loss: 2.1988 (2.1685)  time: 0.4613  data: 0.0002  max mem: 24440
[23:13:53.052774] Epoch: [179]  [4000/5004]  eta: 0:07:43  lr: 0.000032  loss: 2.0195 (2.1775)  time: 0.4591  data: 0.0003  max mem: 24440
[23:21:34.514954] Epoch: [179]  [5003/5004]  eta: 0:00:00  lr: 0.000032  loss: 2.1247 (2.1754)  time: 0.4577  data: 0.0008  max mem: 24440
[23:21:35.280604] Epoch: [179] Total time: 0:38:27 (0.4612 s / it)
[23:21:35.315514] Averaged stats: lr: 0.000032  loss: 2.1247 (2.1715)
[23:21:37.821839] Test:  [   0/1563]  eta: 1:05:07  loss: 0.3018 (0.3018)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.4997  data: 2.0488  max mem: 24440
[23:23:01.786460] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.4298 (0.5572)  acc1: 87.5000 (85.3231)  acc5: 100.0000 (97.7670)  time: 0.1678  data: 0.0002  max mem: 24440
[23:24:25.815338] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9227 (0.6668)  acc1: 78.1250 (82.7423)  acc5: 96.8750 (96.6034)  time: 0.1680  data: 0.0002  max mem: 24440
[23:25:49.831142] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3543 (0.7180)  acc1: 90.6250 (81.5311)  acc5: 100.0000 (96.1463)  time: 0.1682  data: 0.0002  max mem: 24440
[23:26:00.165732] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3231 (0.7207)  acc1: 93.7500 (81.4400)  acc5: 100.0000 (96.1580)  time: 0.1636  data: 0.0001  max mem: 24440
[23:26:00.286111] Test: Total time: 0:04:24 (0.1695 s / it)
[23:26:00.810009] * Acc@1 81.442 Acc@5 96.160 loss 0.721
[23:26:00.810187] Accuracy of the network on the 50000 test images: 81.4%
[23:26:00.810213] Max accuracy: 81.46%
[23:26:00.883575] log_dir: ./output_dir_qkformer
[23:26:03.683192] Epoch: [180]  [   0/5004]  eta: 3:53:21  lr: 0.000032  loss: 2.0309 (2.0309)  time: 2.7981  data: 2.3192  max mem: 24440
[23:41:24.147625] Epoch: [180]  [2000/5004]  eta: 0:23:05  lr: 0.000031  loss: 2.0296 (2.1575)  time: 0.4593  data: 0.0002  max mem: 24440
[23:56:44.821234] Epoch: [180]  [4000/5004]  eta: 0:07:42  lr: 0.000029  loss: 2.1210 (2.1660)  time: 0.4608  data: 0.0003  max mem: 24440
[00:04:26.269476] Epoch: [180]  [5003/5004]  eta: 0:00:00  lr: 0.000029  loss: 2.1359 (2.1674)  time: 0.4541  data: 0.0005  max mem: 24440
[00:04:26.687231] Epoch: [180] Total time: 0:38:25 (0.4608 s / it)
[00:04:26.691455] Averaged stats: lr: 0.000029  loss: 2.1359 (2.1626)
[00:04:28.165638] Test:  [   0/1563]  eta: 0:38:16  loss: 0.2903 (0.2903)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.4693  data: 1.2889  max mem: 24440
[00:05:52.227770] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.5119 (0.5663)  acc1: 87.5000 (85.4541)  acc5: 96.8750 (97.7857)  time: 0.1681  data: 0.0002  max mem: 24440
[00:07:16.271112] Test:  [1000/1563]  eta: 0:01:35  loss: 0.7991 (0.6693)  acc1: 71.8750 (82.9702)  acc5: 96.8750 (96.7439)  time: 0.1682  data: 0.0002  max mem: 24440
[00:08:40.315801] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3909 (0.7262)  acc1: 90.6250 (81.6789)  acc5: 100.0000 (96.1005)  time: 0.1683  data: 0.0002  max mem: 24440
[00:08:50.656412] Test:  [1562/1563]  eta: 0:00:00  loss: 0.2914 (0.7282)  acc1: 93.7500 (81.5920)  acc5: 100.0000 (96.1240)  time: 0.1637  data: 0.0001  max mem: 24440
[00:08:50.784907] Test: Total time: 0:04:24 (0.1690 s / it)
[00:08:51.356611] * Acc@1 81.594 Acc@5 96.125 loss 0.728
[00:08:51.356771] Accuracy of the network on the 50000 test images: 81.6%
[00:08:51.356793] Max accuracy: 81.59%
[00:08:51.456526] log_dir: ./output_dir_qkformer
[00:08:54.208840] Epoch: [181]  [   0/5004]  eta: 3:49:23  lr: 0.000029  loss: 1.9470 (1.9470)  time: 2.7505  data: 2.2952  max mem: 24440
[00:24:16.797095] Epoch: [181]  [2000/5004]  eta: 0:23:08  lr: 0.000028  loss: 2.0834 (2.1575)  time: 0.4613  data: 0.0002  max mem: 24440
[00:39:36.948392] Epoch: [181]  [4000/5004]  eta: 0:07:43  lr: 0.000027  loss: 2.0404 (2.1612)  time: 0.4570  data: 0.0003  max mem: 24440
[00:47:18.110840] Epoch: [181]  [5003/5004]  eta: 0:00:00  lr: 0.000026  loss: 2.0547 (2.1595)  time: 0.4536  data: 0.0009  max mem: 24440
[00:47:18.531726] Epoch: [181] Total time: 0:38:27 (0.4610 s / it)
[00:47:18.564425] Averaged stats: lr: 0.000026  loss: 2.0547 (2.1596)
[00:47:20.864597] Test:  [   0/1563]  eta: 0:59:44  loss: 0.3415 (0.3415)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.2933  data: 1.9362  max mem: 24440
[00:48:44.940883] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.5126 (0.5549)  acc1: 87.5000 (85.7535)  acc5: 96.8750 (97.7670)  time: 0.1679  data: 0.0002  max mem: 24440
[00:50:08.977952] Test:  [1000/1563]  eta: 0:01:35  loss: 0.7718 (0.6608)  acc1: 78.1250 (83.0857)  acc5: 96.8750 (96.6658)  time: 0.1680  data: 0.0002  max mem: 24440
[00:51:32.987456] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3769 (0.7210)  acc1: 90.6250 (81.6830)  acc5: 100.0000 (96.0651)  time: 0.1679  data: 0.0002  max mem: 24440
[00:51:43.312922] Test:  [1562/1563]  eta: 0:00:00  loss: 0.2974 (0.7239)  acc1: 93.7500 (81.5840)  acc5: 100.0000 (96.0560)  time: 0.1635  data: 0.0001  max mem: 24440
[00:51:43.456131] Test: Total time: 0:04:24 (0.1695 s / it)
[00:51:43.868462] * Acc@1 81.577 Acc@5 96.058 loss 0.724
[00:51:43.868706] Accuracy of the network on the 50000 test images: 81.6%
[00:51:43.868750] Max accuracy: 81.59%
[00:51:43.927023] log_dir: ./output_dir_qkformer
[00:51:46.626238] Epoch: [182]  [   0/5004]  eta: 3:44:55  lr: 0.000026  loss: 2.2960 (2.2960)  time: 2.6969  data: 2.1636  max mem: 24440
[01:07:09.683288] Epoch: [182]  [2000/5004]  eta: 0:23:09  lr: 0.000025  loss: 2.0817 (2.1580)  time: 0.4597  data: 0.0002  max mem: 24440
[01:22:31.723228] Epoch: [182]  [4000/5004]  eta: 0:07:43  lr: 0.000024  loss: 2.0711 (2.1588)  time: 0.4650  data: 0.0003  max mem: 24440
[01:30:14.069255] Epoch: [182]  [5003/5004]  eta: 0:00:00  lr: 0.000023  loss: 1.9899 (2.1578)  time: 0.4531  data: 0.0006  max mem: 24440
[01:30:14.527031] Epoch: [182] Total time: 0:38:30 (0.4618 s / it)
[01:30:14.529080] Averaged stats: lr: 0.000023  loss: 1.9899 (2.1539)
[01:30:16.609128] Test:  [   0/1563]  eta: 0:54:03  loss: 0.3670 (0.3670)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.0753  data: 1.7925  max mem: 24440
[01:31:40.698001] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.4162 (0.5629)  acc1: 87.5000 (85.7784)  acc5: 96.8750 (97.8106)  time: 0.1679  data: 0.0002  max mem: 24440
[01:33:04.748724] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8338 (0.6643)  acc1: 75.0000 (83.2261)  acc5: 96.8750 (96.6814)  time: 0.1681  data: 0.0002  max mem: 24440
[01:34:28.787661] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3689 (0.7176)  acc1: 90.6250 (81.9308)  acc5: 100.0000 (96.1422)  time: 0.1682  data: 0.0002  max mem: 24440
[01:34:39.121903] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3547 (0.7204)  acc1: 90.6250 (81.8300)  acc5: 100.0000 (96.1620)  time: 0.1636  data: 0.0001  max mem: 24440
[01:34:39.248279] Test: Total time: 0:04:24 (0.1694 s / it)
[01:34:39.677804] * Acc@1 81.837 Acc@5 96.163 loss 0.720
[01:34:39.677956] Accuracy of the network on the 50000 test images: 81.8%
[01:34:42.371316] Max accuracy: 81.84%
[01:34:42.398672] log_dir: ./output_dir_qkformer
[01:34:45.412477] Epoch: [183]  [   0/5004]  eta: 4:11:15  lr: 0.000023  loss: 1.7322 (1.7322)  time: 3.0128  data: 2.5454  max mem: 24440
[01:50:06.041236] Epoch: [183]  [2000/5004]  eta: 0:23:06  lr: 0.000022  loss: 2.0654 (2.1434)  time: 0.4633  data: 0.0002  max mem: 24440
[02:05:26.211539] Epoch: [183]  [4000/5004]  eta: 0:07:42  lr: 0.000021  loss: 2.1543 (2.1460)  time: 0.4583  data: 0.0003  max mem: 24440
[02:13:07.548078] Epoch: [183]  [5003/5004]  eta: 0:00:00  lr: 0.000021  loss: 2.1487 (2.1447)  time: 0.4574  data: 0.0009  max mem: 24440
[02:13:08.070630] Epoch: [183] Total time: 0:38:25 (0.4608 s / it)
[02:13:08.096133] Averaged stats: lr: 0.000021  loss: 2.1487 (2.1482)
[02:13:09.958439] Test:  [   0/1563]  eta: 0:48:19  loss: 0.2929 (0.2929)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.8553  data: 1.6303  max mem: 24440
[02:14:34.004573] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.5582 (0.5592)  acc1: 87.5000 (85.6911)  acc5: 96.8750 (97.7607)  time: 0.1679  data: 0.0002  max mem: 24440
[02:15:57.961249] Test:  [1000/1563]  eta: 0:01:35  loss: 0.7922 (0.6642)  acc1: 78.1250 (83.2012)  acc5: 96.8750 (96.6159)  time: 0.1678  data: 0.0002  max mem: 24440
[02:17:21.968490] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3627 (0.7210)  acc1: 90.6250 (81.8100)  acc5: 100.0000 (96.0776)  time: 0.1680  data: 0.0002  max mem: 24440
[02:17:32.298079] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3187 (0.7238)  acc1: 93.7500 (81.7040)  acc5: 100.0000 (96.0800)  time: 0.1635  data: 0.0001  max mem: 24440
[02:17:32.432137] Test: Total time: 0:04:24 (0.1691 s / it)
[02:17:32.874320] * Acc@1 81.714 Acc@5 96.082 loss 0.724
[02:17:32.874476] Accuracy of the network on the 50000 test images: 81.7%
[02:17:32.874498] Max accuracy: 81.84%
[02:17:32.960263] log_dir: ./output_dir_qkformer
[02:17:35.552263] Epoch: [184]  [   0/5004]  eta: 3:36:05  lr: 0.000021  loss: 2.3363 (2.3363)  time: 2.5911  data: 2.1149  max mem: 24440
[02:32:56.010204] Epoch: [184]  [2000/5004]  eta: 0:23:05  lr: 0.000020  loss: 2.1279 (2.1391)  time: 0.4571  data: 0.0003  max mem: 24440
[02:48:16.958013] Epoch: [184]  [4000/5004]  eta: 0:07:42  lr: 0.000019  loss: 2.1246 (2.1448)  time: 0.4589  data: 0.0002  max mem: 24440
[02:55:58.330677] Epoch: [184]  [5003/5004]  eta: 0:00:00  lr: 0.000018  loss: 2.1393 (2.1474)  time: 0.4540  data: 0.0005  max mem: 24440
[02:55:58.785370] Epoch: [184] Total time: 0:38:25 (0.4608 s / it)
[02:55:58.790783] Averaged stats: lr: 0.000018  loss: 2.1393 (2.1458)
[02:56:00.681204] Test:  [   0/1563]  eta: 0:49:06  loss: 0.2878 (0.2878)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.8852  data: 1.7077  max mem: 24440
[02:57:24.737296] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.4262 (0.5536)  acc1: 87.5000 (85.7535)  acc5: 96.8750 (97.8668)  time: 0.1679  data: 0.0002  max mem: 24440
[02:58:48.766504] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8823 (0.6577)  acc1: 71.8750 (83.2511)  acc5: 96.8750 (96.7876)  time: 0.1685  data: 0.0002  max mem: 24440
[03:00:12.776802] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3615 (0.7126)  acc1: 87.5000 (81.9225)  acc5: 100.0000 (96.2504)  time: 0.1678  data: 0.0002  max mem: 24440
[03:00:23.103130] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3617 (0.7154)  acc1: 93.7500 (81.8360)  acc5: 100.0000 (96.2600)  time: 0.1635  data: 0.0001  max mem: 24440
[03:00:23.228252] Test: Total time: 0:04:24 (0.1692 s / it)
[03:00:23.695308] * Acc@1 81.839 Acc@5 96.261 loss 0.715
[03:00:23.695511] Accuracy of the network on the 50000 test images: 81.8%
[03:00:26.412777] Max accuracy: 81.84%
[03:00:26.459023] log_dir: ./output_dir_qkformer
[03:00:29.393114] Epoch: [185]  [   0/5004]  eta: 4:04:38  lr: 0.000018  loss: 2.1254 (2.1254)  time: 2.9333  data: 2.4656  max mem: 24440
[03:15:49.900394] Epoch: [185]  [2000/5004]  eta: 0:23:06  lr: 0.000018  loss: 2.0727 (2.1323)  time: 0.4592  data: 0.0003  max mem: 24440
[03:31:10.028553] Epoch: [185]  [4000/5004]  eta: 0:07:42  lr: 0.000017  loss: 2.2201 (2.1355)  time: 0.4575  data: 0.0002  max mem: 24440
[03:38:50.812719] Epoch: [185]  [5003/5004]  eta: 0:00:00  lr: 0.000016  loss: 1.9897 (2.1356)  time: 0.4529  data: 0.0005  max mem: 24440
[03:38:51.163158] Epoch: [185] Total time: 0:38:24 (0.4606 s / it)
[03:38:51.184601] Averaged stats: lr: 0.000016  loss: 1.9897 (2.1397)
[03:38:52.847730] Test:  [   0/1563]  eta: 0:43:11  loss: 0.2834 (0.2834)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.6578  data: 1.4792  max mem: 24440
[03:40:16.876880] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.4554 (0.5612)  acc1: 87.5000 (85.7036)  acc5: 96.8750 (97.7982)  time: 0.1679  data: 0.0002  max mem: 24440
[03:41:40.902617] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8220 (0.6691)  acc1: 78.1250 (83.1512)  acc5: 96.8750 (96.5722)  time: 0.1679  data: 0.0002  max mem: 24440
[03:43:04.945719] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4299 (0.7234)  acc1: 87.5000 (81.9308)  acc5: 100.0000 (96.0880)  time: 0.1679  data: 0.0002  max mem: 24440
[03:43:15.283645] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3039 (0.7264)  acc1: 93.7500 (81.8140)  acc5: 100.0000 (96.0960)  time: 0.1636  data: 0.0001  max mem: 24440
[03:43:15.402352] Test: Total time: 0:04:24 (0.1690 s / it)
[03:43:15.624449] * Acc@1 81.821 Acc@5 96.093 loss 0.726
[03:43:15.624601] Accuracy of the network on the 50000 test images: 81.8%
[03:43:15.624621] Max accuracy: 81.84%
[03:43:15.710944] log_dir: ./output_dir_qkformer
[03:43:18.394401] Epoch: [186]  [   0/5004]  eta: 3:43:33  lr: 0.000016  loss: 2.0296 (2.0296)  time: 2.6806  data: 2.2160  max mem: 24440
[03:58:39.532310] Epoch: [186]  [2000/5004]  eta: 0:23:06  lr: 0.000015  loss: 2.0494 (2.1414)  time: 0.4596  data: 0.0002  max mem: 24440
[04:13:59.194455] Epoch: [186]  [4000/5004]  eta: 0:07:42  lr: 0.000015  loss: 2.2815 (2.1405)  time: 0.4626  data: 0.0002  max mem: 24440
[04:21:40.725397] Epoch: [186]  [5003/5004]  eta: 0:00:00  lr: 0.000014  loss: 2.1089 (2.1399)  time: 0.4578  data: 0.0009  max mem: 24440
[04:21:41.192753] Epoch: [186] Total time: 0:38:25 (0.4607 s / it)
[04:21:41.194159] Averaged stats: lr: 0.000014  loss: 2.1089 (2.1366)
[04:21:43.535599] Test:  [   0/1563]  eta: 1:00:50  loss: 0.2332 (0.2332)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.3353  data: 2.0275  max mem: 24440
[04:23:07.574834] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.5254 (0.5645)  acc1: 84.3750 (85.5976)  acc5: 100.0000 (97.8792)  time: 0.1680  data: 0.0002  max mem: 24440
[04:24:31.624678] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8560 (0.6668)  acc1: 71.8750 (83.1262)  acc5: 96.8750 (96.7657)  time: 0.1681  data: 0.0002  max mem: 24440
[04:25:55.668627] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3748 (0.7229)  acc1: 87.5000 (81.7622)  acc5: 100.0000 (96.1900)  time: 0.1681  data: 0.0002  max mem: 24440
[04:26:05.998734] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3330 (0.7252)  acc1: 93.7500 (81.6700)  acc5: 100.0000 (96.1860)  time: 0.1636  data: 0.0001  max mem: 24440
[04:26:06.116871] Test: Total time: 0:04:24 (0.1695 s / it)
[04:26:06.508032] * Acc@1 81.661 Acc@5 96.183 loss 0.725
[04:26:06.508182] Accuracy of the network on the 50000 test images: 81.7%
[04:26:06.508206] Max accuracy: 81.84%
[04:26:06.587045] log_dir: ./output_dir_qkformer
[04:26:09.450892] Epoch: [187]  [   0/5004]  eta: 3:58:45  lr: 0.000014  loss: 1.8488 (1.8488)  time: 2.8627  data: 2.0873  max mem: 24440
[04:41:30.919806] Epoch: [187]  [2000/5004]  eta: 0:23:07  lr: 0.000013  loss: 2.1396 (2.1409)  time: 0.4635  data: 0.0002  max mem: 24440
[04:56:52.052746] Epoch: [187]  [4000/5004]  eta: 0:07:43  lr: 0.000013  loss: 2.0158 (2.1363)  time: 0.4600  data: 0.0003  max mem: 24440
[05:04:33.358112] Epoch: [187]  [5003/5004]  eta: 0:00:00  lr: 0.000012  loss: 1.9089 (2.1363)  time: 0.4535  data: 0.0013  max mem: 24440
[05:04:33.829115] Epoch: [187] Total time: 0:38:27 (0.4611 s / it)
[05:04:33.834430] Averaged stats: lr: 0.000012  loss: 1.9089 (2.1345)
[05:04:36.380028] Test:  [   0/1563]  eta: 1:06:09  loss: 0.3343 (0.3343)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.5396  data: 2.2376  max mem: 24440
[05:06:00.406987] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.4537 (0.5525)  acc1: 87.5000 (85.8782)  acc5: 96.8750 (97.6796)  time: 0.1680  data: 0.0002  max mem: 24440
[05:07:24.448305] Test:  [1000/1563]  eta: 0:01:35  loss: 0.7916 (0.6545)  acc1: 75.0000 (83.5165)  acc5: 96.8750 (96.6159)  time: 0.1678  data: 0.0002  max mem: 24440
[05:08:48.519897] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3419 (0.7096)  acc1: 87.5000 (82.1078)  acc5: 100.0000 (96.1755)  time: 0.1681  data: 0.0002  max mem: 24440
[05:08:58.848121] Test:  [1562/1563]  eta: 0:00:00  loss: 0.2613 (0.7122)  acc1: 90.6250 (82.0180)  acc5: 100.0000 (96.1980)  time: 0.1635  data: 0.0001  max mem: 24440
[05:08:58.968411] Test: Total time: 0:04:25 (0.1696 s / it)
[05:08:59.394665] * Acc@1 82.019 Acc@5 96.200 loss 0.712
[05:08:59.394928] Accuracy of the network on the 50000 test images: 82.0%
[05:09:01.993274] Max accuracy: 82.02%
[05:09:02.026181] log_dir: ./output_dir_qkformer
[05:09:05.013661] Epoch: [188]  [   0/5004]  eta: 4:09:06  lr: 0.000012  loss: 2.4359 (2.4359)  time: 2.9869  data: 2.5196  max mem: 24440
[05:24:24.893990] Epoch: [188]  [2000/5004]  eta: 0:23:05  lr: 0.000011  loss: 2.1276 (2.1364)  time: 0.4575  data: 0.0002  max mem: 24440
[05:39:44.583718] Epoch: [188]  [4000/5004]  eta: 0:07:42  lr: 0.000011  loss: 2.1508 (2.1333)  time: 0.4586  data: 0.0002  max mem: 24440
[05:47:25.191906] Epoch: [188]  [5003/5004]  eta: 0:00:00  lr: 0.000010  loss: 2.0357 (2.1304)  time: 0.4543  data: 0.0009  max mem: 24440
[05:47:25.603214] Epoch: [188] Total time: 0:38:23 (0.4603 s / it)
[05:47:25.606513] Averaged stats: lr: 0.000010  loss: 2.0357 (2.1310)
[05:47:27.146175] Test:  [   0/1563]  eta: 0:39:59  loss: 0.2615 (0.2615)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.5351  data: 1.3544  max mem: 24440
[05:48:51.201660] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.4451 (0.5521)  acc1: 87.5000 (85.9593)  acc5: 100.0000 (97.8231)  time: 0.1680  data: 0.0002  max mem: 24440
[05:50:15.273373] Test:  [1000/1563]  eta: 0:01:35  loss: 0.7696 (0.6570)  acc1: 78.1250 (83.2917)  acc5: 96.8750 (96.7564)  time: 0.1679  data: 0.0002  max mem: 24440
[05:51:39.325565] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4076 (0.7116)  acc1: 90.6250 (81.9933)  acc5: 100.0000 (96.2067)  time: 0.1680  data: 0.0002  max mem: 24440
[05:51:49.653157] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3373 (0.7137)  acc1: 93.7500 (81.9320)  acc5: 100.0000 (96.2140)  time: 0.1636  data: 0.0001  max mem: 24440
[05:51:49.776390] Test: Total time: 0:04:24 (0.1690 s / it)
[05:51:50.217121] * Acc@1 81.939 Acc@5 96.211 loss 0.714
[05:51:50.217294] Accuracy of the network on the 50000 test images: 81.9%
[05:51:50.217317] Max accuracy: 82.02%
[05:51:50.303770] log_dir: ./output_dir_qkformer
[05:51:53.160509] Epoch: [189]  [   0/5004]  eta: 3:58:05  lr: 0.000010  loss: 1.4270 (1.4270)  time: 2.8549  data: 2.3623  max mem: 24440
[06:07:14.565537] Epoch: [189]  [2000/5004]  eta: 0:23:07  lr: 0.000010  loss: 2.0096 (2.1294)  time: 0.4584  data: 0.0002  max mem: 24440
[06:22:35.763696] Epoch: [189]  [4000/5004]  eta: 0:07:43  lr: 0.000009  loss: 1.9709 (2.1227)  time: 0.4570  data: 0.0002  max mem: 24440
[06:30:17.609962] Epoch: [189]  [5003/5004]  eta: 0:00:00  lr: 0.000009  loss: 2.1624 (2.1251)  time: 0.4536  data: 0.0009  max mem: 24440
[06:30:18.126000] Epoch: [189] Total time: 0:38:27 (0.4612 s / it)
[06:30:18.131593] Averaged stats: lr: 0.000009  loss: 2.1624 (2.1286)
[06:30:20.324586] Test:  [   0/1563]  eta: 0:57:00  loss: 0.2672 (0.2672)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.1884  data: 1.7774  max mem: 24440
[06:31:44.326660] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.5448 (0.5551)  acc1: 84.3750 (85.8658)  acc5: 96.8750 (97.7982)  time: 0.1679  data: 0.0002  max mem: 24440
[06:33:08.324419] Test:  [1000/1563]  eta: 0:01:35  loss: 0.7706 (0.6573)  acc1: 78.1250 (83.3292)  acc5: 96.8750 (96.7220)  time: 0.1679  data: 0.0002  max mem: 24440
[06:34:32.338740] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3778 (0.7111)  acc1: 87.5000 (82.0620)  acc5: 100.0000 (96.2025)  time: 0.1684  data: 0.0002  max mem: 24440
[06:34:42.672299] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3400 (0.7146)  acc1: 93.7500 (81.9620)  acc5: 100.0000 (96.2200)  time: 0.1635  data: 0.0001  max mem: 24440
[06:34:42.790203] Test: Total time: 0:04:24 (0.1693 s / it)
[06:34:43.466230] * Acc@1 81.964 Acc@5 96.213 loss 0.715
[06:34:43.466392] Accuracy of the network on the 50000 test images: 82.0%
[06:34:43.466414] Max accuracy: 82.02%
[06:34:43.553902] log_dir: ./output_dir_qkformer
[06:34:46.144161] Epoch: [190]  [   0/5004]  eta: 3:35:54  lr: 0.000009  loss: 2.6730 (2.6730)  time: 2.5888  data: 2.1189  max mem: 24440
[06:50:06.622565] Epoch: [190]  [2000/5004]  eta: 0:23:05  lr: 0.000008  loss: 2.0751 (2.1313)  time: 0.4591  data: 0.0002  max mem: 24440
[07:05:27.779294] Epoch: [190]  [4000/5004]  eta: 0:07:42  lr: 0.000008  loss: 2.0185 (2.1246)  time: 0.4624  data: 0.0002  max mem: 24440
[07:13:09.321581] Epoch: [190]  [5003/5004]  eta: 0:00:00  lr: 0.000007  loss: 2.1530 (2.1266)  time: 0.4541  data: 0.0009  max mem: 24440
[07:13:09.798846] Epoch: [190] Total time: 0:38:26 (0.4609 s / it)
[07:13:09.861860] Averaged stats: lr: 0.000007  loss: 2.1530 (2.1236)
[07:13:12.107889] Test:  [   0/1563]  eta: 0:58:21  loss: 0.2767 (0.2767)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.2399  data: 1.8010  max mem: 24440
[07:14:36.114106] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.4716 (0.5569)  acc1: 87.5000 (85.7410)  acc5: 96.8750 (97.8293)  time: 0.1680  data: 0.0002  max mem: 24440
[07:16:00.147091] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8289 (0.6594)  acc1: 71.8750 (83.2355)  acc5: 96.8750 (96.7626)  time: 0.1679  data: 0.0002  max mem: 24440
[07:17:24.164781] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3497 (0.7124)  acc1: 90.6250 (82.0162)  acc5: 100.0000 (96.2171)  time: 0.1679  data: 0.0002  max mem: 24440
[07:17:34.495580] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3460 (0.7152)  acc1: 93.7500 (81.8980)  acc5: 100.0000 (96.2280)  time: 0.1635  data: 0.0001  max mem: 24440
[07:17:34.620876] Test: Total time: 0:04:24 (0.1694 s / it)
[07:17:35.176619] * Acc@1 81.900 Acc@5 96.230 loss 0.715
[07:17:35.176809] Accuracy of the network on the 50000 test images: 81.9%
[07:17:35.176834] Max accuracy: 82.02%
[07:17:35.241539] log_dir: ./output_dir_qkformer
[07:17:37.925282] Epoch: [191]  [   0/5004]  eta: 3:43:35  lr: 0.000007  loss: 2.3197 (2.3197)  time: 2.6810  data: 2.1215  max mem: 24440
[07:32:59.090509] Epoch: [191]  [2000/5004]  eta: 0:23:06  lr: 0.000007  loss: 2.0230 (2.1274)  time: 0.4608  data: 0.0002  max mem: 24440
[07:48:19.244835] Epoch: [191]  [4000/5004]  eta: 0:07:42  lr: 0.000006  loss: 2.0863 (2.1282)  time: 0.4604  data: 0.0002  max mem: 24440
[07:56:00.219097] Epoch: [191]  [5003/5004]  eta: 0:00:00  lr: 0.000006  loss: 2.1494 (2.1270)  time: 0.4543  data: 0.0009  max mem: 24440
[07:56:00.622776] Epoch: [191] Total time: 0:38:25 (0.4607 s / it)
[07:56:00.645841] Averaged stats: lr: 0.000006  loss: 2.1494 (2.1215)
[07:56:03.008609] Test:  [   0/1563]  eta: 1:01:23  loss: 0.2816 (0.2816)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.3570  data: 1.9520  max mem: 24440
[07:57:27.010593] Test:  [ 500/1563]  eta: 0:03:03  loss: 0.5000 (0.5570)  acc1: 87.5000 (85.8720)  acc5: 96.8750 (97.8855)  time: 0.1684  data: 0.0006  max mem: 24440
[07:58:51.017187] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8666 (0.6580)  acc1: 75.0000 (83.4228)  acc5: 96.8750 (96.7595)  time: 0.1679  data: 0.0002  max mem: 24440
[08:00:15.044644] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3994 (0.7133)  acc1: 90.6250 (82.0474)  acc5: 100.0000 (96.2275)  time: 0.1678  data: 0.0002  max mem: 24440
[08:00:25.370653] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3200 (0.7160)  acc1: 93.7500 (81.9340)  acc5: 100.0000 (96.2480)  time: 0.1635  data: 0.0001  max mem: 24440
[08:00:25.495257] Test: Total time: 0:04:24 (0.1694 s / it)
[08:00:26.269562] * Acc@1 81.930 Acc@5 96.250 loss 0.716
[08:00:26.269717] Accuracy of the network on the 50000 test images: 81.9%
[08:00:26.269741] Max accuracy: 82.02%
[08:00:26.342983] log_dir: ./output_dir_qkformer
[08:00:29.158396] Epoch: [192]  [   0/5004]  eta: 3:54:23  lr: 0.000006  loss: 1.9818 (1.9818)  time: 2.8104  data: 2.0451  max mem: 24440
[08:15:51.127364] Epoch: [192]  [2000/5004]  eta: 0:23:08  lr: 0.000005  loss: 1.8997 (2.1168)  time: 0.4640  data: 0.0002  max mem: 24440
[08:31:12.756667] Epoch: [192]  [4000/5004]  eta: 0:07:43  lr: 0.000005  loss: 2.0461 (2.1171)  time: 0.4577  data: 0.0002  max mem: 24440
[08:38:54.348743] Epoch: [192]  [5003/5004]  eta: 0:00:00  lr: 0.000005  loss: 2.0665 (2.1206)  time: 0.4580  data: 0.0009  max mem: 24440
[08:38:54.787484] Epoch: [192] Total time: 0:38:28 (0.4613 s / it)
[08:38:54.791464] Averaged stats: lr: 0.000005  loss: 2.0665 (2.1198)
[08:38:56.386614] Test:  [   0/1563]  eta: 0:41:26  loss: 0.2682 (0.2682)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.5910  data: 1.4142  max mem: 24440
[08:40:20.406695] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.5246 (0.5609)  acc1: 84.3750 (85.7348)  acc5: 96.8750 (97.8418)  time: 0.1678  data: 0.0002  max mem: 24440
[08:41:44.437023] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8659 (0.6620)  acc1: 78.1250 (83.3885)  acc5: 96.8750 (96.7283)  time: 0.1681  data: 0.0002  max mem: 24440
[08:43:08.457443] Test:  [1500/1563]  eta: 0:00:10  loss: 0.4454 (0.7158)  acc1: 90.6250 (82.0786)  acc5: 100.0000 (96.2150)  time: 0.1679  data: 0.0002  max mem: 24440
[08:43:18.788108] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3363 (0.7185)  acc1: 93.7500 (81.9700)  acc5: 100.0000 (96.2320)  time: 0.1635  data: 0.0001  max mem: 24440
[08:43:18.920158] Test: Total time: 0:04:24 (0.1690 s / it)
[08:43:19.295609] * Acc@1 81.969 Acc@5 96.234 loss 0.719
[08:43:19.295796] Accuracy of the network on the 50000 test images: 82.0%
[08:43:19.295824] Max accuracy: 82.02%
[08:43:19.362945] log_dir: ./output_dir_qkformer
[08:43:22.429007] Epoch: [193]  [   0/5004]  eta: 4:14:35  lr: 0.000005  loss: 2.0482 (2.0482)  time: 3.0527  data: 2.5954  max mem: 24440
[08:58:44.630875] Epoch: [193]  [2000/5004]  eta: 0:23:08  lr: 0.000004  loss: 2.0705 (2.1252)  time: 0.4617  data: 0.0002  max mem: 24440
[09:14:07.382104] Epoch: [193]  [4000/5004]  eta: 0:07:43  lr: 0.000004  loss: 2.0164 (2.1247)  time: 0.4628  data: 0.0003  max mem: 24440
[09:21:49.387894] Epoch: [193]  [5003/5004]  eta: 0:00:00  lr: 0.000004  loss: 2.0722 (2.1246)  time: 0.4538  data: 0.0006  max mem: 24440
[09:21:49.822029] Epoch: [193] Total time: 0:38:30 (0.4617 s / it)
[09:21:49.887298] Averaged stats: lr: 0.000004  loss: 2.0722 (2.1172)
[09:21:51.919373] Test:  [   0/1563]  eta: 0:52:49  loss: 0.2552 (0.2552)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.0277  data: 1.8544  max mem: 24440
[09:23:15.909628] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.4905 (0.5610)  acc1: 87.5000 (85.6100)  acc5: 100.0000 (97.7483)  time: 0.1680  data: 0.0002  max mem: 24440
[09:24:39.891264] Test:  [1000/1563]  eta: 0:01:35  loss: 0.7964 (0.6610)  acc1: 81.2500 (83.2261)  acc5: 96.8750 (96.7470)  time: 0.1678  data: 0.0002  max mem: 24440
[09:26:03.923005] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3442 (0.7157)  acc1: 90.6250 (81.9225)  acc5: 100.0000 (96.2025)  time: 0.1681  data: 0.0002  max mem: 24440
[09:26:14.249660] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3389 (0.7181)  acc1: 90.6250 (81.8480)  acc5: 100.0000 (96.2140)  time: 0.1635  data: 0.0001  max mem: 24440
[09:26:14.373842] Test: Total time: 0:04:24 (0.1692 s / it)
[09:26:15.068838] * Acc@1 81.848 Acc@5 96.216 loss 0.718
[09:26:15.068995] Accuracy of the network on the 50000 test images: 81.8%
[09:26:15.069017] Max accuracy: 82.02%
[09:26:15.127806] log_dir: ./output_dir_qkformer
[09:26:17.775181] Epoch: [194]  [   0/5004]  eta: 3:40:41  lr: 0.000004  loss: 2.3580 (2.3580)  time: 2.6463  data: 2.1518  max mem: 24440
[09:41:39.211002] Epoch: [194]  [2000/5004]  eta: 0:23:07  lr: 0.000003  loss: 2.0965 (2.1207)  time: 0.4656  data: 0.0003  max mem: 24440
[09:56:59.649355] Epoch: [194]  [4000/5004]  eta: 0:07:42  lr: 0.000003  loss: 2.1147 (2.1226)  time: 0.4583  data: 0.0002  max mem: 24440
[10:04:41.382792] Epoch: [194]  [5003/5004]  eta: 0:00:00  lr: 0.000003  loss: 2.1779 (2.1172)  time: 0.4541  data: 0.0005  max mem: 24440
[10:04:41.826451] Epoch: [194] Total time: 0:38:26 (0.4610 s / it)
[10:04:41.829415] Averaged stats: lr: 0.000003  loss: 2.1779 (2.1151)
[10:04:43.704933] Test:  [   0/1563]  eta: 0:48:45  loss: 0.2038 (0.2038)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.8717  data: 1.6984  max mem: 24440
[10:06:07.997827] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.4452 (0.5525)  acc1: 87.5000 (85.9531)  acc5: 100.0000 (97.9167)  time: 0.1679  data: 0.0002  max mem: 24440
[10:07:32.014373] Test:  [1000/1563]  eta: 0:01:35  loss: 0.9054 (0.6583)  acc1: 75.0000 (83.4259)  acc5: 96.8750 (96.7751)  time: 0.1678  data: 0.0002  max mem: 24440
[10:08:56.031700] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3254 (0.7133)  acc1: 87.5000 (82.0474)  acc5: 100.0000 (96.2483)  time: 0.1678  data: 0.0002  max mem: 24440
[10:09:06.357352] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3110 (0.7164)  acc1: 93.7500 (81.9420)  acc5: 100.0000 (96.2680)  time: 0.1635  data: 0.0001  max mem: 24440
[10:09:06.482391] Test: Total time: 0:04:24 (0.1693 s / it)
[10:09:07.165469] * Acc@1 81.944 Acc@5 96.267 loss 0.716
[10:09:07.165620] Accuracy of the network on the 50000 test images: 81.9%
[10:09:07.165642] Max accuracy: 82.02%
[10:09:07.230910] log_dir: ./output_dir_qkformer
[10:09:09.910319] Epoch: [195]  [   0/5004]  eta: 3:43:21  lr: 0.000003  loss: 2.0526 (2.0526)  time: 2.6781  data: 1.8888  max mem: 24440
[10:24:31.704026] Epoch: [195]  [2000/5004]  eta: 0:23:07  lr: 0.000003  loss: 2.1486 (2.1072)  time: 0.4628  data: 0.0003  max mem: 24440
[10:39:53.392134] Epoch: [195]  [4000/5004]  eta: 0:07:43  lr: 0.000002  loss: 2.0989 (2.1115)  time: 0.4611  data: 0.0002  max mem: 24440
[10:47:35.372416] Epoch: [195]  [5003/5004]  eta: 0:00:00  lr: 0.000002  loss: 2.0585 (2.1127)  time: 0.4528  data: 0.0009  max mem: 24440
[10:47:35.870150] Epoch: [195] Total time: 0:38:28 (0.4614 s / it)
[10:47:35.872288] Averaged stats: lr: 0.000002  loss: 2.0585 (2.1170)
[10:47:37.621544] Test:  [   0/1563]  eta: 0:45:27  loss: 0.3196 (0.3196)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.7452  data: 1.5676  max mem: 24440
[10:49:01.606697] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.4583 (0.5544)  acc1: 87.5000 (85.7473)  acc5: 96.8750 (97.8106)  time: 0.1680  data: 0.0002  max mem: 24440
[10:50:25.589406] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8404 (0.6549)  acc1: 75.0000 (83.3854)  acc5: 93.7500 (96.6908)  time: 0.1678  data: 0.0002  max mem: 24440
[10:51:49.628486] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3402 (0.7105)  acc1: 87.5000 (82.0120)  acc5: 100.0000 (96.1651)  time: 0.1680  data: 0.0002  max mem: 24440
[10:51:59.953409] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3323 (0.7128)  acc1: 93.7500 (81.9360)  acc5: 100.0000 (96.1940)  time: 0.1635  data: 0.0001  max mem: 24440
[10:52:00.081046] Test: Total time: 0:04:24 (0.1690 s / it)
[10:52:00.766204] * Acc@1 81.943 Acc@5 96.194 loss 0.713
[10:52:00.766405] Accuracy of the network on the 50000 test images: 81.9%
[10:52:00.766428] Max accuracy: 82.02%
[10:52:00.831298] log_dir: ./output_dir_qkformer
[10:52:03.406364] Epoch: [196]  [   0/5004]  eta: 3:34:23  lr: 0.000002  loss: 1.6424 (1.6424)  time: 2.5706  data: 2.0911  max mem: 24440
[11:07:27.559580] Epoch: [196]  [2000/5004]  eta: 0:23:11  lr: 0.000002  loss: 2.1092 (2.1198)  time: 0.4637  data: 0.0003  max mem: 24440
[11:22:50.374248] Epoch: [196]  [4000/5004]  eta: 0:07:44  lr: 0.000002  loss: 2.0369 (2.1216)  time: 0.4620  data: 0.0002  max mem: 24440
[11:30:32.422051] Epoch: [196]  [5003/5004]  eta: 0:00:00  lr: 0.000002  loss: 2.0853 (2.1191)  time: 0.4539  data: 0.0006  max mem: 24440
[11:30:32.937341] Epoch: [196] Total time: 0:38:32 (0.4621 s / it)
[11:30:32.938975] Averaged stats: lr: 0.000002  loss: 2.0853 (2.1151)
[11:30:35.064089] Test:  [   0/1563]  eta: 0:55:06  loss: 0.2676 (0.2676)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 2.1152  data: 1.9371  max mem: 24440
[11:31:59.105985] Test:  [ 500/1563]  eta: 0:03:02  loss: 0.5212 (0.5517)  acc1: 87.5000 (86.0716)  acc5: 100.0000 (97.9229)  time: 0.1679  data: 0.0002  max mem: 24440
[11:33:23.181672] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8608 (0.6544)  acc1: 75.0000 (83.5446)  acc5: 96.8750 (96.7751)  time: 0.1681  data: 0.0002  max mem: 24440
[11:34:47.259045] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3521 (0.7089)  acc1: 87.5000 (82.1827)  acc5: 100.0000 (96.2317)  time: 0.1685  data: 0.0002  max mem: 24440
[11:34:57.595163] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3158 (0.7118)  acc1: 93.7500 (82.0840)  acc5: 100.0000 (96.2380)  time: 0.1636  data: 0.0001  max mem: 24440
[11:34:57.713496] Test: Total time: 0:04:24 (0.1694 s / it)
[11:34:58.182216] * Acc@1 82.083 Acc@5 96.239 loss 0.712
[11:34:58.182371] Accuracy of the network on the 50000 test images: 82.1%
[11:35:01.414383] Max accuracy: 82.08%
[11:35:01.435146] log_dir: ./output_dir_qkformer
[11:35:04.089607] Epoch: [197]  [   0/5004]  eta: 3:41:17  lr: 0.000002  loss: 2.0451 (2.0451)  time: 2.6533  data: 2.1881  max mem: 24440
[11:50:27.602270] Epoch: [197]  [2000/5004]  eta: 0:23:10  lr: 0.000002  loss: 2.0442 (2.1112)  time: 0.4626  data: 0.0002  max mem: 24440
[12:05:49.932417] Epoch: [197]  [4000/5004]  eta: 0:07:43  lr: 0.000001  loss: 2.0985 (2.1179)  time: 0.4594  data: 0.0002  max mem: 24440
[12:13:32.948634] Epoch: [197]  [5003/5004]  eta: 0:00:00  lr: 0.000001  loss: 2.0467 (2.1149)  time: 0.4537  data: 0.0009  max mem: 24440
[12:13:33.452349] Epoch: [197] Total time: 0:38:32 (0.4620 s / it)
[12:13:33.455644] Averaged stats: lr: 0.000001  loss: 2.0467 (2.1136)
[12:13:35.163057] Test:  [   0/1563]  eta: 0:44:18  loss: 0.2758 (0.2758)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.7011  data: 1.5186  max mem: 24440
[12:14:59.192825] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.5546 (0.5576)  acc1: 87.5000 (85.9469)  acc5: 96.8750 (97.8106)  time: 0.1680  data: 0.0002  max mem: 24440
[12:16:23.226168] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8862 (0.6594)  acc1: 75.0000 (83.4010)  acc5: 96.8750 (96.7408)  time: 0.1682  data: 0.0002  max mem: 24440
[12:17:47.276722] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3636 (0.7148)  acc1: 90.6250 (82.0599)  acc5: 100.0000 (96.2088)  time: 0.1679  data: 0.0002  max mem: 24440
[12:17:57.610542] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3310 (0.7171)  acc1: 93.7500 (81.9720)  acc5: 100.0000 (96.2200)  time: 0.1637  data: 0.0001  max mem: 24440
[12:17:57.751610] Test: Total time: 0:04:24 (0.1691 s / it)
[12:17:58.114916] * Acc@1 81.976 Acc@5 96.224 loss 0.717
[12:17:58.115075] Accuracy of the network on the 50000 test images: 82.0%
[12:17:58.115097] Max accuracy: 82.08%
[12:17:58.190181] log_dir: ./output_dir_qkformer
[12:18:00.824942] Epoch: [198]  [   0/5004]  eta: 3:39:33  lr: 0.000001  loss: 2.6317 (2.6317)  time: 2.6325  data: 1.9075  max mem: 24440
[12:33:22.275093] Epoch: [198]  [2000/5004]  eta: 0:23:07  lr: 0.000001  loss: 2.0290 (2.1170)  time: 0.4592  data: 0.0002  max mem: 24440
[12:48:42.214640] Epoch: [198]  [4000/5004]  eta: 0:07:42  lr: 0.000001  loss: 2.0903 (2.1187)  time: 0.4590  data: 0.0002  max mem: 24440
[12:56:23.424536] Epoch: [198]  [5003/5004]  eta: 0:00:00  lr: 0.000001  loss: 2.1217 (2.1157)  time: 0.4532  data: 0.0006  max mem: 24440
[12:56:23.850493] Epoch: [198] Total time: 0:38:25 (0.4608 s / it)
[12:56:23.863201] Averaged stats: lr: 0.000001  loss: 2.1217 (2.1119)
[12:56:25.480399] Test:  [   0/1563]  eta: 0:41:59  loss: 0.2944 (0.2944)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.6122  data: 1.4370  max mem: 24440
[12:57:49.468988] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.5434 (0.5548)  acc1: 87.5000 (85.7348)  acc5: 96.8750 (97.9042)  time: 0.1680  data: 0.0002  max mem: 24440
[12:59:13.496912] Test:  [1000/1563]  eta: 0:01:35  loss: 0.8198 (0.6592)  acc1: 78.1250 (83.3385)  acc5: 96.8750 (96.7314)  time: 0.1678  data: 0.0002  max mem: 24440
[13:00:37.503968] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3683 (0.7133)  acc1: 90.6250 (82.0266)  acc5: 100.0000 (96.2296)  time: 0.1681  data: 0.0002  max mem: 24440
[13:00:47.832864] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3213 (0.7164)  acc1: 93.7500 (81.9380)  acc5: 100.0000 (96.2420)  time: 0.1636  data: 0.0001  max mem: 24440
[13:00:47.951814] Test: Total time: 0:04:24 (0.1690 s / it)
[13:00:48.399828] * Acc@1 81.936 Acc@5 96.243 loss 0.716
[13:00:48.400000] Accuracy of the network on the 50000 test images: 81.9%
[13:00:48.400021] Max accuracy: 82.08%
[13:00:48.506618] log_dir: ./output_dir_qkformer
[13:00:51.057330] Epoch: [199]  [   0/5004]  eta: 3:32:33  lr: 0.000001  loss: 2.0415 (2.0415)  time: 2.5487  data: 1.9346  max mem: 24440
[13:16:12.427205] Epoch: [199]  [2000/5004]  eta: 0:23:06  lr: 0.000001  loss: 2.2024 (2.1114)  time: 0.4590  data: 0.0002  max mem: 24440
[13:31:33.348068] Epoch: [199]  [4000/5004]  eta: 0:07:42  lr: 0.000001  loss: 2.0541 (2.1135)  time: 0.4598  data: 0.0002  max mem: 24440
[13:39:14.856047] Epoch: [199]  [5003/5004]  eta: 0:00:00  lr: 0.000001  loss: 1.9873 (2.1153)  time: 0.4536  data: 0.0006  max mem: 24440
[13:39:15.308799] Epoch: [199] Total time: 0:38:26 (0.4610 s / it)
[13:39:15.309950] Averaged stats: lr: 0.000001  loss: 1.9873 (2.1139)
[13:39:16.953180] Test:  [   0/1563]  eta: 0:42:40  loss: 0.2970 (0.2970)  acc1: 96.8750 (96.8750)  acc5: 96.8750 (96.8750)  time: 1.6380  data: 1.4597  max mem: 24440
[13:40:40.964909] Test:  [ 500/1563]  eta: 0:03:01  loss: 0.5262 (0.5469)  acc1: 87.5000 (85.8471)  acc5: 96.8750 (97.8169)  time: 0.1680  data: 0.0002  max mem: 24440
[13:42:04.959464] Test:  [1000/1563]  eta: 0:01:35  loss: 0.7760 (0.6524)  acc1: 75.0000 (83.3729)  acc5: 96.8750 (96.7064)  time: 0.1682  data: 0.0002  max mem: 24440
[13:43:28.949690] Test:  [1500/1563]  eta: 0:00:10  loss: 0.3733 (0.7069)  acc1: 87.5000 (82.1161)  acc5: 100.0000 (96.2004)  time: 0.1678  data: 0.0002  max mem: 24440
[13:43:39.271977] Test:  [1562/1563]  eta: 0:00:00  loss: 0.3621 (0.7102)  acc1: 93.7500 (82.0000)  acc5: 100.0000 (96.2240)  time: 0.1635  data: 0.0001  max mem: 24440
[13:43:39.405575] Test: Total time: 0:04:24 (0.1690 s / it)
[13:43:40.002765] * Acc@1 81.998 Acc@5 96.223 loss 0.710
[13:43:40.002914] Accuracy of the network on the 50000 test images: 82.0%
[13:43:40.002937] Max accuracy: 82.08%
[13:43:40.066094] Training time 5 days, 22:52:42
