/userhome/miniconda3/envs/mae/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
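
The deprecation warning above asks scripts to read the local rank from `os.environ['LOCAL_RANK']` rather than from a `--local_rank` argument. A minimal sketch of one way to support both launchers is shown below. The helper name `resolve_local_rank` and its fallback order are assumptions for illustration, not taken from the training script that produced this log.

```python
import argparse
import os


def resolve_local_rank(argv=None, env=None):
    """Return the local rank, preferring the LOCAL_RANK environment variable.

    Hypothetical helper: torchrun exports LOCAL_RANK, while the legacy
    launcher could pass --local_rank on the command line instead.
    """
    env = os.environ if env is None else env

    parser = argparse.ArgumentParser()
    # Accept both spellings so older launch commands keep working.
    parser.add_argument("--local_rank", "--local-rank", type=int, default=None)
    # parse_known_args ignores the script's other arguments.
    args, _ = parser.parse_known_args(argv)

    if "LOCAL_RANK" in env:
        return int(env["LOCAL_RANK"])
    if args.local_rank is not None:
        return args.local_rank
    return 0  # single-process fallback


if __name__ == "__main__":
    print("local rank:", resolve_local_rank())
```

Preferring the environment variable means the same script runs unchanged under `torchrun`, while the command-line fallback keeps older launch commands usable.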
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
| distributed init (rank 1): env://, gpu 1
| distributed init (rank 4): env://, gpu 4
| distributed init (rank 6): env://, gpu 6
| distributed init (rank 2): env://, gpu 2
| distributed init (rank 3): env://, gpu 3
| distributed init (rank 0): env://, gpu 0
| distributed init (rank 5): env://, gpu 5
| distributed init (rank 7): env://, gpu 7
[09:32:48.655640] job dir: /userhome/mae-tmp-v1
[09:32:48.655856] Namespace(aa='rand-m9-mstd0.5-inc1',
accum_iter=3,
batch_size=24,
blr=0.0006,
clip_grad=None,
color_jitter=None,
cutmix=0,
cutmix_minmax=None,
data_path='/dataset/ImageNet2012',
device='cuda',
dist_backend='nccl',
dist_eval=False,
dist_on_itp=False,
dist_url='env://',
distributed=True,
drop_path=0.1,
epochs=200,
eval=False,
finetune='',
global_pool=True,
gpu=0,
input_size=224,
layer_decay=1.0,
local_rank=0,
log_dir='./output_dir_qkformer',
lr=None,
min_lr=1e-06,
mixup=0,
mixup_mode='batch',
mixup_prob=1.0,
mixup_switch_prob=0.5,
model='QKFormer_10_768',
nb_classes=1000,
num_workers=10,
output_dir='./output_dir_qkformer',
pin_mem=True,
rank=0,
recount=1,
remode='pixel',
reprob=0.25,
resplit=False,
resume='',
seed=0,
smoothing=0.1,
start_epoch=0,
time_step=4,
warmup_epochs=5,
weight_decay=0.05,
world_size=8)
[09:32:54.920204] Dataset ImageFolder
    Number of datapoints: 1281167
    Root location: /dataset/ImageNet2012/train
    StandardTransform
Transform: Compose(
               RandomResizedCropAndInterpolation(size=(224, 224), scale=(0.08, 1.0), ratio=(0.75, 1.3333), interpolation=PIL.Image.BICUBIC)
               RandomHorizontalFlip(p=0.5)
               <timm.data.auto_augment.RandAugment object at 0x7f84214f4fa0>
               ToTensor()
               Normalize(mean=tensor([0.4850, 0.4560, 0.4060]), std=tensor([0.2290, 0.2240, 0.2250]))
               <timm.data.random_erasing.RandomErasing object at 0x7f84193ff280>
           )
[09:32:56.490730] Dataset ImageFolder
    Number of datapoints: 50000
    Root location: /dataset/ImageNet2012/val
    StandardTransform
Transform: Compose(
               Resize(size=256, interpolation=bicubic, max_size=None, antialias=None)
               CenterCrop(size=(224, 224))
               ToTensor()
               Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
           )
[09:32:56.491409] Sampler_train = <torch.utils.data.distributed.DistributedSampler object at 0x7f844a5d8df0>
[09:32:57.380418] Model = spiking_transformer(
  (patch_embed1): PatchEmbedInit(
    (proj_conv): Conv2d(3, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (proj_bn): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj_maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (proj_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
    (proj1_conv): Conv2d(96, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (proj1_bn): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj1_maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (proj1_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
    (proj2_conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (proj2_bn): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj2_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
    (proj_res_conv): Conv2d(96, 192, kernel_size=(1, 1), stride=(2, 2), bias=False)
    (proj_res_bn): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj_res_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
  )
  (patch_embed2): PatchEmbeddingStage(
    (proj3_conv): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (proj3_bn): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj3_maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (proj3_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
    (proj4_conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (proj4_bn): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj4_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
    (proj_res_conv): Conv2d(192, 384, kernel_size=(1, 1), stride=(2, 2), bias=False)
    (proj_res_bn): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj_res_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
  )
  (patch_embed3): PatchEmbeddingStage(
    (proj3_conv): Conv2d(384, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (proj3_bn): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj3_maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (proj3_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
    (proj4_conv): Conv2d(768, 768, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (proj4_bn): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj4_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
    (proj_res_conv): Conv2d(384, 768, kernel_size=(1, 1), stride=(2, 2), bias=False)
    (proj_res_bn): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (proj_res_lif): MultiStepLIFNode(
      v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
      (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
    )
  )
  (stage1): ModuleList(
    (0): TokenSpikingTransformer(
      (tssa): Token_QK_Attention(
        (q_conv): Conv1d(192, 192, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(192, 192, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(192, 192, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(192, 768, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
  )
  (stage2): ModuleList(
    (0): TokenSpikingTransformer(
      (tssa): Token_QK_Attention(
        (q_conv): Conv1d(384, 384, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(384, 384, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(384, 384, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(384, 1536, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(1536, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(1536, 384, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
    (1): TokenSpikingTransformer(
      (tssa): Token_QK_Attention(
        (q_conv): Conv1d(384, 384, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(384, 384, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(384, 384, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(384, 1536, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(1536, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(1536, 384, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
  )
  (stage3): ModuleList(
    (0): SpikingTransformer(
      (attn): Spiking_Self_Attention(
        (q_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (v_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (v_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (v_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (qkv_mp): MaxPool1d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(768, 3072, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(3072, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(3072, 768, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
    (1): SpikingTransformer(
      (attn): Spiking_Self_Attention(
        (q_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (v_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (v_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (v_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (qkv_mp): MaxPool1d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(768, 3072, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(3072, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(3072, 768, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
    (2): SpikingTransformer(
      (attn): Spiking_Self_Attention(
        (q_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (v_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (v_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (v_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (qkv_mp): MaxPool1d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(768, 3072, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(3072, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(3072, 768, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
    (3): SpikingTransformer(
      (attn): Spiking_Self_Attention(
        (q_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (v_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (v_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (v_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (qkv_mp): MaxPool1d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(768, 3072, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(3072, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(3072, 768, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
    (4): SpikingTransformer(
      (attn): Spiking_Self_Attention(
        (q_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (v_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (v_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (v_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (qkv_mp): MaxPool1d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(768, 3072, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(3072, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(3072, 768, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
    (5): SpikingTransformer(
      (attn): Spiking_Self_Attention(
        (q_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (v_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (v_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (v_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (qkv_mp): MaxPool1d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(768, 3072, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(3072, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(3072, 768, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
    (6): SpikingTransformer(
      (attn): Spiking_Self_Attention(
        (q_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (q_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (q_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (k_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (k_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (k_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (v_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,), bias=False)
        (v_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (v_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (attn_lif): MultiStepLIFNode(
          v_threshold=0.5, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (proj_conv): Conv1d(768, 768, kernel_size=(1,), stride=(1,))
        (proj_bn): BatchNorm1d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (proj_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (qkv_mp): MaxPool1d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      )
      (mlp): MLP(
        (fc1_conv): Conv2d(768, 3072, kernel_size=(1, 1), stride=(1, 1))
        (fc1_bn): BatchNorm2d(3072, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc1_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
        (fc2_conv): Conv2d(3072, 768, kernel_size=(1, 1), stride=(1, 1))
        (fc2_bn): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (fc2_lif): MultiStepLIFNode(
          v_threshold=1.0, v_reset=0.0, detach_reset=True, tau=2.0, backend=cupy
          (surrogate_function): Sigmoid(alpha=4.0, spiking=True)
        )
      )
    )
  )
  (head): Linear(in_features=768, out_features=1000, bias=True)
)
[09:32:57.380509] number of params (M): 64.96
[09:32:57.380532] base lr: 6.00e-04
[09:32:57.380593] actual lr: 1.35e-03
[09:32:57.380603] accumulate grad iterations: 3
[09:32:57.380613] effective batch size: 576
[09:32:57.614407] criterion = LabelSmoothingCrossEntropy()
[09:32:57.614477] Start training for 200 epochs
[09:32:57.616024] log_dir: ./output_dir_qkformer
[W reducer.cpp:1251] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
[09:33:36.526323] Epoch: [0]  [   0/6672]  eta: 2 days, 23:59:57  lr: 0.000000  loss: 7.1341 (7.1341)  time: 38.8485  data: 2.5770  max mem: 30335
[09:58:04.003052] Epoch: [0]  [2000/6672]  eta: 0:58:36  lr: 0.000081  loss: 6.6311 (6.8294)  time: 0.7203  data: 0.0003  max mem: 30335
[10:23:00.467779] Epoch: [0]  [4000/6672]  eta: 0:33:24  lr: 0.000162  loss: 6.1546 (6.6275)  time: 0.7557  data: 0.0002  max mem: 30335
[10:47:51.742866] Epoch: [0]  [6000/6672]  eta: 0:08:23  lr: 0.000243  loss: 5.9000 (6.4463)  time: 0.7237  data: 0.0002  max mem: 30335
[10:56:07.803940] Epoch: [0]  [6671/6672]  eta: 0:00:00  lr: 0.000270  loss: 5.8258 (6.3910)  time: 0.7266  data: 0.0011  max mem: 30335
[10:56:08.758805] Epoch: [0] Total time: 1:23:11 (0.7481 s / it)
[10:56:08.788999] Averaged stats: lr: 0.000270  loss: 5.8258 (6.3925)
[10:56:13.223945] Test:  [   0/2084]  eta: 2:33:51  loss: 5.5401 (5.5401)  acc1: 4.1667 (4.1667)  acc5: 8.3333 (8.3333)  time: 4.4299  data: 3.7497  max mem: 30335
[10:58:21.558898] Test:  [ 500/2084]  eta: 0:06:59  loss: 4.0710 (5.1121)  acc1: 4.1667 (6.5785)  acc5: 33.3333 (19.8852)  time: 0.2534  data: 0.0002  max mem: 30335
[11:00:28.571989] Test:  [1000/2084]  eta: 0:04:41  loss: 5.0107 (5.1148)  acc1: 4.1667 (6.5268)  acc5: 20.8333 (19.7303)  time: 0.2542  data: 0.0002  max mem: 30335
[11:02:37.062246] Test:  [1500/2084]  eta: 0:02:31  loss: 5.0986 (5.1512)  acc1: 0.0000 (6.5318)  acc5: 8.3333 (19.2844)  time: 0.2531  data: 0.0002  max mem: 30335
[11:04:44.054692] Test:  [2000/2084]  eta: 0:00:21  loss: 5.0754 (5.1721)  acc1: 0.0000 (6.5155)  acc5: 20.8333 (19.3153)  time: 0.2536  data: 0.0002  max mem: 30335
[11:05:07.848069] Test:  [2083/2084]  eta: 0:00:00  loss: 6.2905 (5.1721)  acc1: 0.0000 (6.6280)  acc5: 4.1667 (19.5000)  time: 0.3892  data: 0.0001  max mem: 30335
[11:05:07.956556] Test: Total time: 0:08:59 (0.2587 s / it)
[11:05:19.974036] * Acc@1 6.627 Acc@5 19.490 loss 5.172
[11:05:19.974432] Accuracy of the network on the 50000 test images: 6.6%
[11:05:19.974517] Max accuracy: 6.63%
[11:05:20.100079] log_dir: ./output_dir_qkformer
[11:05:29.027339] Epoch: [1]  [   0/6672]  eta: 16:28:37  lr: 0.000270  loss: 5.6103 (5.6103)  time: 8.8904  data: 2.5924  max mem: 30335
[11:30:33.769127] Epoch: [1]  [2000/6672]  eta: 0:58:53  lr: 0.000351  loss: 5.5435 (5.7078)  time: 0.7220  data: 0.0003  max mem: 30335
[11:55:27.933485] Epoch: [1]  [4000/6672]  eta: 0:33:28  lr: 0.000432  loss: 5.3348 (5.6036)  time: 0.7194  data: 0.0003  max mem: 30335
[12:20:03.707951] Epoch: [1]  [6000/6672]  eta: 0:08:21  lr: 0.000513  loss: 5.1790 (5.5402)  time: 0.7227  data: 0.0003  max mem: 30335
[12:28:14.373139] Epoch: [1]  [6671/6672]  eta: 0:00:00  lr: 0.000540  loss: 5.2814 (5.5097)  time: 0.7171  data: 0.0007  max mem: 30335
[12:28:15.207419] Epoch: [1] Total time: 1:22:55 (0.7457 s / it)
[12:28:15.235661] Averaged stats: lr: 0.000540  loss: 5.2814 (5.5110)
[12:28:19.052109] Test:  [   0/2084]  eta: 2:12:23  loss: 1.8219 (1.8219)  acc1: 70.8333 (70.8333)  acc5: 87.5000 (87.5000)  time: 3.8117  data: 3.2134  max mem: 30335
[12:30:26.611608] Test:  [ 500/2084]  eta: 0:06:55  loss: 3.5386 (3.6278)  acc1: 4.1667 (21.6484)  acc5: 45.8333 (47.7212)  time: 0.2546  data: 0.0002  max mem: 30335
[12:32:33.864994] Test:  [1000/2084]  eta: 0:04:40  loss: 3.8632 (3.6941)  acc1: 12.5000 (21.1122)  acc5: 41.6667 (46.8240)  time: 0.2545  data: 0.0002  max mem: 30335
[12:34:41.240935] Test:  [1500/2084]  eta: 0:02:30  loss: 4.4007 (3.8303)  acc1: 8.3333 (20.0561)  acc5: 29.1667 (44.3899)  time: 0.2546  data: 0.0002  max mem: 30335
[12:36:48.460097] Test:  [2000/2084]  eta: 0:00:21  loss: 3.2695 (3.9221)  acc1: 29.1667 (19.2904)  acc5: 58.3333 (42.7869)  time: 0.2545  data: 0.0002  max mem: 30335
[12:37:09.403258] Test:  [2083/2084]  eta: 0:00:00  loss: 2.8284 (3.9071)  acc1: 37.5000 (19.7240)  acc5: 62.5000 (43.1620)  time: 0.2468  data: 0.0001  max mem: 30335
[12:37:09.520488] Test: Total time: 0:08:54 (0.2564 s / it)
[12:37:23.434418] * Acc@1 19.727 Acc@5 43.157 loss 3.907
[12:37:23.434607] Accuracy of the network on the 50000 test images: 19.7%
[12:37:23.434639] Max accuracy: 19.73%
[12:37:23.519882] log_dir: ./output_dir_qkformer
[12:37:27.162432] Epoch: [2]  [   0/6672]  eta: 6:44:32  lr: 0.000540  loss: 5.1226 (5.1226)  time: 3.6380  data: 2.0275  max mem: 30335
[13:02:08.974655] Epoch: [2]  [2000/6672]  eta: 0:57:47  lr: 0.000621  loss: 4.9487 (5.1050)  time: 0.7224  data: 0.0004  max mem: 30335
[13:27:02.204159] Epoch: [2]  [4000/6672]  eta: 0:33:08  lr: 0.000702  loss: 4.9151 (5.0570)  time: 0.7197  data: 0.0003  max mem: 30335
[13:51:52.192261] Epoch: [2]  [6000/6672]  eta: 0:08:20  lr: 0.000783  loss: 4.7459 (5.0007)  time: 0.7204  data: 0.0002  max mem: 30335
[14:00:11.428495] Epoch: [2]  [6671/6672]  eta: 0:00:00  lr: 0.000810  loss: 4.6054 (4.9774)  time: 0.7158  data: 0.0011  max mem: 30335
[14:00:12.361396] Epoch: [2] Total time: 1:22:48 (0.7447 s / it)
[14:00:12.401626] Averaged stats: lr: 0.000810  loss: 4.6054 (4.9800)
[14:00:16.484926] Test:  [   0/2084]  eta: 2:21:39  loss: 2.2535 (2.2535)  acc1: 54.1667 (54.1667)  acc5: 79.1667 (79.1667)  time: 4.0785  data: 3.4475  max mem: 30335
[14:02:25.265177] Test:  [ 500/2084]  eta: 0:07:00  loss: 2.8157 (2.8880)  acc1: 16.6667 (31.3207)  acc5: 66.6667 (62.0093)  time: 0.2540  data: 0.0002  max mem: 30335
[14:04:32.536914] Test:  [1000/2084]  eta: 0:04:41  loss: 3.3826 (2.9854)  acc1: 29.1667 (31.4352)  acc5: 50.0000 (60.6810)  time: 0.2547  data: 0.0002  max mem: 30335
[14:06:39.769990] Test:  [1500/2084]  eta: 0:02:30  loss: 4.0143 (3.1672)  acc1: 16.6667 (30.0522)  acc5: 50.0000 (57.9586)  time: 0.2548  data: 0.0002  max mem: 30335
[14:08:47.388602] Test:  [2000/2084]  eta: 0:00:21  loss: 2.5304 (3.2465)  acc1: 41.6667 (29.6102)  acc5: 66.6667 (56.6071)  time: 0.2541  data: 0.0002  max mem: 30335
[14:09:08.350861] Test:  [2083/2084]  eta: 0:00:00  loss: 2.2486 (3.2250)  acc1: 50.0000 (30.0780)  acc5: 79.1667 (57.0320)  time: 0.2470  data: 0.0002  max mem: 30335
[14:09:08.489291] Test: Total time: 0:08:56 (0.2572 s / it)
[14:09:22.778756] * Acc@1 30.085 Acc@5 57.020 loss 3.225
[14:09:22.779065] Accuracy of the network on the 50000 test images: 30.1%
[14:09:22.779098] Max accuracy: 30.08%
[14:09:23.187103] log_dir: ./output_dir_qkformer
[14:09:28.684152] Epoch: [3]  [   0/6672]  eta: 9:40:28  lr: 0.000810  loss: 4.8833 (4.8833)  time: 5.2202  data: 2.1375  max mem: 30335
[14:34:33.382222] Epoch: [3]  [2000/6672]  eta: 0:58:44  lr: 0.000891  loss: 4.7535 (4.7118)  time: 0.7225  data: 0.0002  max mem: 30335
[14:59:23.989740] Epoch: [3]  [4000/6672]  eta: 0:33:23  lr: 0.000972  loss: 4.4383 (4.6655)  time: 0.7229  data: 0.0003  max mem: 30335
[15:24:08.101614] Epoch: [3]  [6000/6672]  eta: 0:08:22  lr: 0.001053  loss: 4.3810 (4.6160)  time: 0.7220  data: 0.0003  max mem: 30335
[15:32:20.428650] Epoch: [3]  [6671/6672]  eta: 0:00:00  lr: 0.001080  loss: 4.4028 (4.6020)  time: 0.7183  data: 0.0006  max mem: 30335
[15:32:21.272429] Epoch: [3] Total time: 1:22:58 (0.7461 s / it)
[15:32:21.330568] Averaged stats: lr: 0.001080  loss: 4.4028 (4.5992)
[15:32:26.088326] Test:  [   0/2084]  eta: 2:45:05  loss: 2.3057 (2.3057)  acc1: 37.5000 (37.5000)  acc5: 83.3333 (83.3333)  time: 4.7529  data: 4.0995  max mem: 30335
[15:34:34.040404] Test:  [ 500/2084]  eta: 0:06:59  loss: 2.4658 (2.4011)  acc1: 29.1667 (42.3902)  acc5: 75.0000 (72.3636)  time: 0.2546  data: 0.0002  max mem: 30335
[15:36:41.518898] Test:  [1000/2084]  eta: 0:04:41  loss: 2.8595 (2.4360)  acc1: 33.3333 (42.4992)  acc5: 58.3333 (71.5201)  time: 0.2548  data: 0.0002  max mem: 30335
[15:38:50.189436] Test:  [1500/2084]  eta: 0:02:31  loss: 2.8243 (2.6486)  acc1: 25.0000 (39.4626)  acc5: 66.6667 (67.7104)  time: 0.2548  data: 0.0002  max mem: 30335
[15:40:58.071600] Test:  [2000/2084]  eta: 0:00:21  loss: 2.3046 (2.7501)  acc1: 50.0000 (38.1393)  acc5: 75.0000 (65.9212)  time: 0.2545  data: 0.0002  max mem: 30335
[15:41:19.105299] Test:  [2083/2084]  eta: 0:00:00  loss: 1.4561 (2.7416)  acc1: 50.0000 (38.4120)  acc5: 83.3333 (66.1280)  time: 0.2475  data: 0.0001  max mem: 30335
[15:41:19.244255] Test: Total time: 0:08:57 (0.2581 s / it)
[15:41:33.519003] * Acc@1 38.406 Acc@5 66.121 loss 2.742
[15:41:33.519289] Accuracy of the network on the 50000 test images: 38.4%
[15:41:33.519344] Max accuracy: 38.41%
[15:41:33.654805] log_dir: ./output_dir_qkformer
[15:41:44.343756] Epoch: [4]  [   0/6672]  eta: 19:45:24  lr: 0.001080  loss: 4.6564 (4.6564)  time: 10.6602  data: 2.5194  max mem: 30335
[16:06:29.915240] Epoch: [4]  [2000/6672]  eta: 0:58:13  lr: 0.001161  loss: 4.3630 (4.3871)  time: 0.9453  data: 0.0062  max mem: 30335
[16:31:13.191692] Epoch: [4]  [4000/6672]  eta: 0:33:09  lr: 0.001242  loss: 4.1408 (4.3536)  time: 0.7181  data: 0.0002  max mem: 30335
[16:56:05.077342] Epoch: [4]  [6000/6672]  eta: 0:08:20  lr: 0.001323  loss: 4.0103 (4.3221)  time: 0.7195  data: 0.0002  max mem: 30335
[17:04:30.893671] Epoch: [4]  [6671/6672]  eta: 0:00:00  lr: 0.001350  loss: 4.2901 (4.3136)  time: 0.7174  data: 0.0011  max mem: 30335
[17:04:31.812318] Epoch: [4] Total time: 1:22:58 (0.7461 s / it)
[17:04:31.863748] Averaged stats: lr: 0.001350  loss: 4.2901 (4.3090)
[17:04:35.903133] Test:  [   0/2084]  eta: 2:20:06  loss: 1.3742 (1.3742)  acc1: 83.3333 (83.3333)  acc5: 87.5000 (87.5000)  time: 4.0339  data: 3.3802  max mem: 30335
[17:06:43.483685] Test:  [ 500/2084]  eta: 0:06:56  loss: 1.9791 (2.0358)  acc1: 50.0000 (49.1683)  acc5: 83.3333 (78.7592)  time: 0.2559  data: 0.0002  max mem: 30335
[17:08:51.519321] Test:  [1000/2084]  eta: 0:04:41  loss: 2.6220 (2.0961)  acc1: 41.6667 (48.8303)  acc5: 70.8333 (77.7514)  time: 0.2556  data: 0.0002  max mem: 30335
[17:10:59.190014] Test:  [1500/2084]  eta: 0:02:30  loss: 2.2883 (2.3077)  acc1: 45.8333 (45.6612)  acc5: 66.6667 (74.2172)  time: 0.2547  data: 0.0002  max mem: 30335
[17:13:07.277889] Test:  [2000/2084]  eta: 0:00:21  loss: 1.7278 (2.4206)  acc1: 62.5000 (44.0238)  acc5: 83.3333 (72.1306)  time: 0.2550  data: 0.0002  max mem: 30335
[17:13:28.480771] Test:  [2083/2084]  eta: 0:00:00  loss: 1.1071 (2.4132)  acc1: 70.8333 (44.2460)  acc5: 91.6667 (72.2540)  time: 0.2479  data: 0.0002  max mem: 30335
[17:13:28.616402] Test: Total time: 0:08:56 (0.2576 s / it)
[17:13:43.018624] * Acc@1 44.258 Acc@5 72.260 loss 2.413
[17:13:43.019004] Accuracy of the network on the 50000 test images: 44.3%
[17:13:43.019045] Max accuracy: 44.26%
[17:13:43.153918] log_dir: ./output_dir_qkformer
[17:13:48.792851] Epoch: [5]  [   0/6672]  eta: 10:26:48  lr: 0.001350  loss: 4.4154 (4.4154)  time: 5.6368  data: 2.4947  max mem: 30335
[17:38:41.026305] Epoch: [5]  [2000/6672]  eta: 0:58:17  lr: 0.001350  loss: 3.9938 (4.1480)  time: 0.7218  data: 0.0002  max mem: 30335
[18:03:43.940760] Epoch: [5]  [4000/6672]  eta: 0:33:23  lr: 0.001350  loss: 3.7663 (4.1140)  time: 0.7395  data: 0.0003  max mem: 30335
[18:28:33.618237] Epoch: [5]  [6000/6672]  eta: 0:08:22  lr: 0.001350  loss: 4.0841 (4.0833)  time: 0.7230  data: 0.0003  max mem: 30335
[18:36:46.320374] Epoch: [5]  [6671/6672]  eta: 0:00:00  lr: 0.001350  loss: 3.7974 (4.0692)  time: 0.7187  data: 0.0006  max mem: 30335
[18:36:47.275079] Epoch: [5] Total time: 1:23:04 (0.7470 s / it)
[18:36:47.314960] Averaged stats: lr: 0.001350  loss: 3.7974 (4.0696)
[18:36:52.208681] Test:  [   0/2084]  eta: 2:49:43  loss: 0.7388 (0.7388)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.8865  data: 4.0098  max mem: 30335
[18:39:00.198451] Test:  [ 500/2084]  eta: 0:07:00  loss: 2.1102 (1.7514)  acc1: 45.8333 (55.7967)  acc5: 83.3333 (83.6743)  time: 0.2554  data: 0.0002  max mem: 30335
[18:41:08.999814] Test:  [1000/2084]  eta: 0:04:43  loss: 2.2216 (1.8284)  acc1: 45.8333 (54.8327)  acc5: 70.8333 (82.3052)  time: 0.2558  data: 0.0002  max mem: 30335
[18:43:17.074735] Test:  [1500/2084]  eta: 0:02:31  loss: 2.1371 (2.0244)  acc1: 45.8333 (51.7155)  acc5: 75.0000 (78.9973)  time: 0.2541  data: 0.0002  max mem: 30335
[18:45:25.420258] Test:  [2000/2084]  eta: 0:00:21  loss: 1.4023 (2.1132)  acc1: 66.6667 (50.5705)  acc5: 87.5000 (77.4155)  time: 0.2550  data: 0.0002  max mem: 30335
[18:45:46.443858] Test:  [2083/2084]  eta: 0:00:00  loss: 0.8963 (2.1057)  acc1: 79.1667 (50.7300)  acc5: 91.6667 (77.4940)  time: 0.2471  data: 0.0001  max mem: 30335
[18:45:46.556943] Test: Total time: 0:08:59 (0.2588 s / it)
[18:46:01.702124] * Acc@1 50.719 Acc@5 77.480 loss 2.106
[18:46:01.702490] Accuracy of the network on the 50000 test images: 50.7%
[18:46:01.702527] Max accuracy: 50.72%
[18:46:02.103828] log_dir: ./output_dir_qkformer
[18:46:10.833769] Epoch: [6]  [   0/6672]  eta: 15:57:18  lr: 0.001350  loss: 3.9069 (3.9069)  time: 8.6088  data: 3.8903  max mem: 30335
[19:10:57.879398] Epoch: [6]  [2000/6672]  eta: 0:58:11  lr: 0.001350  loss: 3.8468 (3.9185)  time: 0.8412  data: 0.0003  max mem: 30335
[19:36:00.878374] Epoch: [6]  [4000/6672]  eta: 0:33:22  lr: 0.001350  loss: 3.7996 (3.9019)  time: 0.7189  data: 0.0003  max mem: 30335
[20:00:48.443557] Epoch: [6]  [6000/6672]  eta: 0:08:22  lr: 0.001350  loss: 3.8383 (3.8814)  time: 0.8604  data: 0.0002  max mem: 30335
[20:09:10.162697] Epoch: [6]  [6671/6672]  eta: 0:00:00  lr: 0.001350  loss: 3.8113 (3.8745)  time: 0.7211  data: 0.0006  max mem: 30335
[20:09:10.943961] Epoch: [6] Total time: 1:23:08 (0.7477 s / it)
[20:09:10.994579] Averaged stats: lr: 0.001350  loss: 3.8113 (3.8706)
[20:09:15.589352] Test:  [   0/2084]  eta: 2:39:25  loss: 0.6345 (0.6345)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.5900  data: 3.5254  max mem: 30335
[20:11:24.507139] Test:  [ 500/2084]  eta: 0:07:02  loss: 1.8814 (1.6784)  acc1: 41.6667 (58.2502)  acc5: 83.3333 (84.2565)  time: 0.2550  data: 0.0002  max mem: 30335
[20:13:33.816205] Test:  [1000/2084]  eta: 0:04:44  loss: 2.1082 (1.7175)  acc1: 54.1667 (57.7090)  acc5: 79.1667 (83.9452)  time: 0.2556  data: 0.0002  max mem: 30335
[20:15:41.504891] Test:  [1500/2084]  eta: 0:02:31  loss: 1.9834 (1.8964)  acc1: 50.0000 (54.6636)  acc5: 83.3333 (80.9655)  time: 0.2557  data: 0.0002  max mem: 30335
[20:17:50.830416] Test:  [2000/2084]  eta: 0:00:21  loss: 0.9856 (1.9658)  acc1: 79.1667 (53.6898)  acc5: 91.6667 (79.7622)  time: 0.2557  data: 0.0002  max mem: 30335
[20:18:11.850033] Test:  [2083/2084]  eta: 0:00:00  loss: 1.0131 (1.9656)  acc1: 75.0000 (53.6720)  acc5: 87.5000 (79.7860)  time: 0.2477  data: 0.0002  max mem: 30335
[20:18:11.980212] Test: Total time: 0:09:00 (0.2596 s / it)
[20:18:26.375462] * Acc@1 53.689 Acc@5 79.793 loss 1.965
[20:18:26.375718] Accuracy of the network on the 50000 test images: 53.7%
[20:18:26.375749] Max accuracy: 53.69%
[20:18:26.503087] log_dir: ./output_dir_qkformer
[20:18:36.556707] Epoch: [7]  [   0/6672]  eta: 18:18:40  lr: 0.001350  loss: 3.8394 (3.8394)  time: 9.8802  data: 2.3447  max mem: 30335
[20:43:37.191568] Epoch: [7]  [2000/6672]  eta: 0:58:45  lr: 0.001350  loss: 3.6471 (3.7472)  time: 0.7250  data: 0.0002  max mem: 30335
[21:08:32.299839] Epoch: [7]  [4000/6672]  eta: 0:33:26  lr: 0.001349  loss: 3.5796 (3.7379)  time: 0.9110  data: 0.0002  max mem: 30335
[21:34:00.486107] Epoch: [7]  [6000/6672]  eta: 0:08:27  lr: 0.001349  loss: 3.4110 (3.7259)  time: 0.7278  data: 0.0002  max mem: 30335
[21:42:21.782537] Epoch: [7]  [6671/6672]  eta: 0:00:00  lr: 0.001349  loss: 3.6308 (3.7183)  time: 0.7189  data: 0.0006  max mem: 30335
[21:42:22.513142] Epoch: [7] Total time: 1:23:56 (0.7548 s / it)
[21:42:22.550126] Averaged stats: lr: 0.001349  loss: 3.6308 (3.7244)
[21:42:26.523327] Test:  [   0/2084]  eta: 2:17:51  loss: 0.9345 (0.9345)  acc1: 75.0000 (75.0000)  acc5: 91.6667 (91.6667)  time: 3.9689  data: 3.3573  max mem: 30335
[21:44:34.739970] Test:  [ 500/2084]  eta: 0:06:57  loss: 1.1729 (1.4498)  acc1: 66.6667 (64.0469)  acc5: 91.6667 (87.7162)  time: 0.2544  data: 0.0002  max mem: 30335
[21:46:43.416493] Test:  [1000/2084]  eta: 0:04:42  loss: 1.9137 (1.5237)  acc1: 54.1667 (61.9755)  acc5: 83.3333 (86.8423)  time: 0.2550  data: 0.0002  max mem: 30335
[21:48:51.135267] Test:  [1500/2084]  eta: 0:02:31  loss: 1.8816 (1.7016)  acc1: 54.1667 (58.8413)  acc5: 83.3333 (83.8330)  time: 0.2552  data: 0.0002  max mem: 30335
[21:50:59.574003] Test:  [2000/2084]  eta: 0:00:21  loss: 1.0919 (1.7869)  acc1: 75.0000 (57.3005)  acc5: 91.6667 (82.3672)  time: 0.2549  data: 0.0002  max mem: 30335
[21:51:20.589838] Test:  [2083/2084]  eta: 0:00:00  loss: 0.8089 (1.7812)  acc1: 83.3333 (57.5220)  acc5: 95.8333 (82.4500)  time: 0.2471  data: 0.0002  max mem: 30335
[21:51:20.707142] Test: Total time: 0:08:58 (0.2582 s / it)
[21:51:35.001859] * Acc@1 57.552 Acc@5 82.441 loss 1.781
[21:51:35.002193] Accuracy of the network on the 50000 test images: 57.6%
[21:51:35.002225] Max accuracy: 57.55%
[21:51:35.073744] log_dir: ./output_dir_qkformer
[21:51:46.913500] Epoch: [8]  [   0/6672]  eta: 21:47:50  lr: 0.001349  loss: 3.6231 (3.6231)  time: 11.7612  data: 5.8220  max mem: 30335
[22:16:46.119915] Epoch: [8]  [2000/6672]  eta: 0:58:46  lr: 0.001349  loss: 3.4660 (3.6432)  time: 0.7264  data: 0.0002  max mem: 30335
[22:41:40.423926] Epoch: [8]  [4000/6672]  eta: 0:33:26  lr: 0.001349  loss: 3.7241 (3.6315)  time: 0.7192  data: 0.0002  max mem: 30335
[23:06:39.403571] Epoch: [8]  [6000/6672]  eta: 0:08:24  lr: 0.001349  loss: 3.7079 (3.6227)  time: 0.7258  data: 0.0002  max mem: 30335
[23:14:58.488627] Epoch: [8]  [6671/6672]  eta: 0:00:00  lr: 0.001349  loss: 3.8064 (3.6164)  time: 0.7182  data: 0.0006  max mem: 30335
[23:14:59.454077] Epoch: [8] Total time: 1:23:24 (0.7501 s / it)
[23:14:59.501459] Averaged stats: lr: 0.001349  loss: 3.8064 (3.6128)
[23:15:03.904560] Test:  [   0/2084]  eta: 2:32:42  loss: 0.7368 (0.7368)  acc1: 87.5000 (87.5000)  acc5: 95.8333 (95.8333)  time: 4.3964  data: 3.5696  max mem: 30335
[23:17:13.411470] Test:  [ 500/2084]  eta: 0:07:03  loss: 1.4056 (1.3612)  acc1: 58.3333 (65.7019)  acc5: 91.6667 (89.2715)  time: 0.2553  data: 0.0002  max mem: 30335
[23:19:22.079749] Test:  [1000/2084]  eta: 0:04:44  loss: 1.6846 (1.4363)  acc1: 62.5000 (63.8237)  acc5: 79.1667 (88.1410)  time: 0.2546  data: 0.0002  max mem: 30335
[23:21:29.956010] Test:  [1500/2084]  eta: 0:02:31  loss: 1.7237 (1.5970)  acc1: 54.1667 (60.9621)  acc5: 83.3333 (85.3126)  time: 0.2545  data: 0.0002  max mem: 30335
[23:23:37.347294] Test:  [2000/2084]  eta: 0:00:21  loss: 1.1089 (1.6849)  acc1: 70.8333 (59.3786)  acc5: 87.5000 (83.7977)  time: 0.2542  data: 0.0002  max mem: 30335
[23:23:58.386534] Test:  [2083/2084]  eta: 0:00:00  loss: 0.9126 (1.6829)  acc1: 79.1667 (59.4460)  acc5: 95.8333 (83.8200)  time: 0.2466  data: 0.0002  max mem: 30335
[23:23:58.512341] Test: Total time: 0:08:59 (0.2586 s / it)
[23:24:12.365933] * Acc@1 59.473 Acc@5 83.806 loss 1.683
[23:24:12.366238] Accuracy of the network on the 50000 test images: 59.5%
[23:24:12.366278] Max accuracy: 59.47%
[23:24:12.552773] log_dir: ./output_dir_qkformer
[23:24:22.666327] Epoch: [9]  [   0/6672]  eta: 18:44:04  lr: 0.001349  loss: 3.9317 (3.9317)  time: 10.1086  data: 2.8464  max mem: 30335
[23:49:11.990682] Epoch: [9]  [2000/6672]  eta: 0:58:20  lr: 0.001348  loss: 3.5936 (3.5431)  time: 0.7218  data: 0.0002  max mem: 30335
[00:14:11.865991] Epoch: [9]  [4000/6672]  eta: 0:33:22  lr: 0.001348  loss: 3.4924 (3.5419)  time: 0.7212  data: 0.0002  max mem: 30335
[00:39:11.689425] Epoch: [9]  [6000/6672]  eta: 0:08:23  lr: 0.001348  loss: 3.3687 (3.5336)  time: 0.7639  data: 0.0003  max mem: 30335
[00:47:32.249461] Epoch: [9]  [6671/6672]  eta: 0:00:00  lr: 0.001348  loss: 3.7239 (3.5319)  time: 0.7184  data: 0.0006  max mem: 30335
[00:47:33.151597] Epoch: [9] Total time: 1:23:20 (0.7495 s / it)
[00:47:33.180838] Averaged stats: lr: 0.001348  loss: 3.7239 (3.5266)
[00:47:37.034231] Test:  [   0/2084]  eta: 2:13:38  loss: 0.9330 (0.9330)  acc1: 79.1667 (79.1667)  acc5: 95.8333 (95.8333)  time: 3.8478  data: 3.2239  max mem: 30335
[00:49:45.487241] Test:  [ 500/2084]  eta: 0:06:58  loss: 1.1383 (1.3474)  acc1: 66.6667 (66.3839)  acc5: 91.6667 (89.4461)  time: 0.2555  data: 0.0002  max mem: 30335
[00:51:53.235071] Test:  [1000/2084]  eta: 0:04:41  loss: 1.5663 (1.4027)  acc1: 62.5000 (64.8102)  acc5: 87.5000 (88.7155)  time: 0.2546  data: 0.0002  max mem: 30335
[00:54:01.288558] Test:  [1500/2084]  eta: 0:02:30  loss: 1.5795 (1.5870)  acc1: 58.3333 (61.5451)  acc5: 87.5000 (85.6429)  time: 0.2557  data: 0.0002  max mem: 30335
[00:56:09.200440] Test:  [2000/2084]  eta: 0:00:21  loss: 0.8154 (1.6686)  acc1: 83.3333 (60.1720)  acc5: 95.8333 (84.1954)  time: 0.2553  data: 0.0002  max mem: 30335
[00:56:30.508542] Test:  [2083/2084]  eta: 0:00:00  loss: 0.8389 (1.6677)  acc1: 83.3333 (60.1980)  acc5: 95.8333 (84.2140)  time: 0.2472  data: 0.0002  max mem: 30335
[00:56:30.633209] Test: Total time: 0:08:57 (0.2579 s / it)
[00:56:44.515753] * Acc@1 60.196 Acc@5 84.221 loss 1.668
[00:56:44.516056] Accuracy of the network on the 50000 test images: 60.2%
[00:56:44.516089] Max accuracy: 60.20%
[00:56:44.759568] log_dir: ./output_dir_qkformer
[00:56:52.012527] Epoch: [10]  [   0/6672]  eta: 13:26:23  lr: 0.001348  loss: 3.2994 (3.2994)  time: 7.2518  data: 3.1566  max mem: 30335
[01:22:09.349927] Epoch: [10]  [2000/6672]  eta: 0:59:19  lr: 0.001348  loss: 3.3922 (3.4794)  time: 0.9753  data: 0.0003  max mem: 30335
[01:47:05.186718] Epoch: [10]  [4000/6672]  eta: 0:33:36  lr: 0.001347  loss: 3.4117 (3.4720)  time: 0.7230  data: 0.0002  max mem: 30335
[02:11:44.235237] Epoch: [10]  [6000/6672]  eta: 0:08:23  lr: 0.001347  loss: 3.2250 (3.4640)  time: 0.7268  data: 0.0002  max mem: 30335
[02:20:02.761821] Epoch: [10]  [6671/6672]  eta: 0:00:00  lr: 0.001347  loss: 3.4759 (3.4602)  time: 0.7198  data: 0.0006  max mem: 30335
[02:20:03.553157] Epoch: [10] Total time: 1:23:18 (0.7492 s / it)
[02:20:03.558665] Averaged stats: lr: 0.001347  loss: 3.4759 (3.4569)
[02:20:07.853631] Test:  [   0/2084]  eta: 2:28:58  loss: 0.7565 (0.7565)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.2892  data: 3.6652  max mem: 30335
[02:22:16.199267] Test:  [ 500/2084]  eta: 0:06:59  loss: 1.2360 (1.3191)  acc1: 62.5000 (66.9494)  acc5: 95.8333 (90.2528)  time: 0.2554  data: 0.0002  max mem: 30335
[02:24:24.493871] Test:  [1000/2084]  eta: 0:04:42  loss: 1.6920 (1.3898)  acc1: 58.3333 (65.3222)  acc5: 83.3333 (89.4356)  time: 0.2563  data: 0.0002  max mem: 30335
[02:26:32.985502] Test:  [1500/2084]  eta: 0:02:31  loss: 1.6359 (1.5522)  acc1: 58.3333 (62.2391)  acc5: 83.3333 (86.6367)  time: 0.2552  data: 0.0002  max mem: 30335
[02:28:41.054316] Test:  [2000/2084]  eta: 0:00:21  loss: 0.9872 (1.6371)  acc1: 79.1667 (60.8196)  acc5: 91.6667 (85.1033)  time: 0.2566  data: 0.0002  max mem: 30335
[02:29:02.127737] Test:  [2083/2084]  eta: 0:00:00  loss: 0.8847 (1.6284)  acc1: 83.3333 (61.0320)  acc5: 91.6667 (85.2340)  time: 0.2477  data: 0.0001  max mem: 30335
[02:29:02.264059] Test: Total time: 0:08:58 (0.2585 s / it)
[02:29:17.162942] * Acc@1 61.038 Acc@5 85.224 loss 1.628
[02:29:17.163355] Accuracy of the network on the 50000 test images: 61.0%
[02:29:17.163421] Max accuracy: 61.04%
[02:29:17.292704] log_dir: ./output_dir_qkformer
[02:29:20.938521] Epoch: [11]  [   0/6672]  eta: 6:45:16  lr: 0.001347  loss: 3.2298 (3.2298)  time: 3.6446  data: 2.2184  max mem: 30335
[02:54:27.754307] Epoch: [11]  [2000/6672]  eta: 0:58:46  lr: 0.001347  loss: 3.3894 (3.3929)  time: 0.7239  data: 0.0002  max mem: 30335
[03:19:11.517877] Epoch: [11]  [4000/6672]  eta: 0:33:19  lr: 0.001346  loss: 3.3516 (3.4016)  time: 0.7240  data: 0.0002  max mem: 30335
[03:44:01.714096] Epoch: [11]  [6000/6672]  eta: 0:08:22  lr: 0.001346  loss: 3.2695 (3.4009)  time: 0.7483  data: 0.0003  max mem: 30335
[03:52:13.786777] Epoch: [11]  [6671/6672]  eta: 0:00:00  lr: 0.001346  loss: 3.2747 (3.3998)  time: 0.7227  data: 0.0006  max mem: 30335
[03:52:14.572278] Epoch: [11] Total time: 1:22:57 (0.7460 s / it)
[03:52:14.579209] Averaged stats: lr: 0.001346  loss: 3.2747 (3.3982)
[03:52:16.926826] Test:  [   0/2084]  eta: 1:21:21  loss: 0.8306 (0.8306)  acc1: 83.3333 (83.3333)  acc5: 91.6667 (91.6667)  time: 2.3425  data: 1.8426  max mem: 30335
[03:54:25.212537] Test:  [ 500/2084]  eta: 0:06:52  loss: 1.2379 (1.3232)  acc1: 62.5000 (66.6584)  acc5: 95.8333 (90.2362)  time: 0.2557  data: 0.0002  max mem: 30335
[03:56:33.604976] Test:  [1000/2084]  eta: 0:04:40  loss: 1.3313 (1.3594)  acc1: 75.0000 (65.8841)  acc5: 87.5000 (89.6520)  time: 0.2563  data: 0.0002  max mem: 30335
[03:58:41.872846] Test:  [1500/2084]  eta: 0:02:30  loss: 1.5068 (1.4972)  acc1: 62.5000 (63.2495)  acc5: 83.3333 (87.2363)  time: 0.2559  data: 0.0002  max mem: 30335
[04:00:49.976958] Test:  [2000/2084]  eta: 0:00:21  loss: 0.8816 (1.5665)  acc1: 79.1667 (62.0835)  acc5: 91.6667 (85.9820)  time: 0.2627  data: 0.0002  max mem: 30335
[04:01:11.020540] Test:  [2083/2084]  eta: 0:00:00  loss: 1.0623 (1.5645)  acc1: 83.3333 (62.1840)  acc5: 91.6667 (85.9760)  time: 0.2479  data: 0.0002  max mem: 30335
[04:01:11.152455] Test: Total time: 0:08:56 (0.2575 s / it)
[04:01:25.665014] * Acc@1 62.212 Acc@5 85.964 loss 1.565
[04:01:25.665262] Accuracy of the network on the 50000 test images: 62.2%
[04:01:25.665294] Max accuracy: 62.21%
[04:01:25.955937] log_dir: ./output_dir_qkformer
[04:01:29.262423] Epoch: [12]  [   0/6672]  eta: 6:07:34  lr: 0.001346  loss: 4.0088 (4.0088)  time: 3.3055  data: 1.8673  max mem: 30335
[04:26:22.986634] Epoch: [12]  [2000/6672]  eta: 0:58:14  lr: 0.001345  loss: 3.3786 (3.3459)  time: 0.7222  data: 0.0002  max mem: 30335
[04:51:05.610310] Epoch: [12]  [4000/6672]  eta: 0:33:09  lr: 0.001345  loss: 3.3728 (3.3432)  time: 0.7246  data: 0.0002  max mem: 30335
[05:16:02.270742] Epoch: [12]  [6000/6672]  eta: 0:08:21  lr: 0.001345  loss: 3.5090 (3.3450)  time: 0.7239  data: 0.0002  max mem: 30335
[05:24:24.359153] Epoch: [12]  [6671/6672]  eta: 0:00:00  lr: 0.001344  loss: 3.4276 (3.3463)  time: 0.7221  data: 0.0007  max mem: 30335
[05:24:25.227767] Epoch: [12] Total time: 1:22:59 (0.7463 s / it)
[05:24:25.345982] Averaged stats: lr: 0.001344  loss: 3.4276 (3.3481)
[05:24:29.482388] Test:  [   0/2084]  eta: 2:23:31  loss: 0.5722 (0.5722)  acc1: 87.5000 (87.5000)  acc5: 95.8333 (95.8333)  time: 4.1320  data: 3.5325  max mem: 30335
[05:26:37.492252] Test:  [ 500/2084]  eta: 0:06:57  loss: 1.5171 (1.2232)  acc1: 62.5000 (69.3197)  acc5: 87.5000 (91.4504)  time: 0.2566  data: 0.0002  max mem: 30335
[05:28:45.804851] Test:  [1000/2084]  eta: 0:04:42  loss: 1.8026 (1.2839)  acc1: 62.5000 (68.1943)  acc5: 79.1667 (90.5803)  time: 0.2613  data: 0.0002  max mem: 30335
[05:30:54.223835] Test:  [1500/2084]  eta: 0:02:31  loss: 1.2822 (1.4315)  acc1: 70.8333 (65.1843)  acc5: 87.5000 (88.2995)  time: 0.2562  data: 0.0002  max mem: 30335
[05:33:02.669050] Test:  [2000/2084]  eta: 0:00:21  loss: 0.9028 (1.5110)  acc1: 75.0000 (63.7390)  acc5: 91.6667 (86.9170)  time: 0.2561  data: 0.0002  max mem: 30335
[05:33:23.826131] Test:  [2083/2084]  eta: 0:00:00  loss: 0.7881 (1.5071)  acc1: 87.5000 (63.8520)  acc5: 95.8333 (87.0200)  time: 0.2505  data: 0.0001  max mem: 30335
[05:33:23.938192] Test: Total time: 0:08:58 (0.2584 s / it)
[05:33:38.852644] * Acc@1 63.817 Acc@5 87.017 loss 1.507
[05:33:38.853062] Accuracy of the network on the 50000 test images: 63.8%
[05:33:38.853096] Max accuracy: 63.82%
[05:33:38.987052] log_dir: ./output_dir_qkformer
[05:33:44.743355] Epoch: [13]  [   0/6672]  eta: 10:34:59  lr: 0.001344  loss: 3.6700 (3.6700)  time: 5.7103  data: 2.1028  max mem: 30335
[05:58:36.534031] Epoch: [13]  [2000/6672]  eta: 0:58:15  lr: 0.001344  loss: 3.2303 (3.2970)  time: 0.7577  data: 0.0002  max mem: 30335
[06:23:39.254898] Epoch: [13]  [4000/6672]  eta: 0:33:23  lr: 0.001344  loss: 3.4613 (3.3004)  time: 0.7237  data: 0.0002  max mem: 30335
[06:48:23.830045] Epoch: [13]  [6000/6672]  eta: 0:08:22  lr: 0.001343  loss: 3.2895 (3.3049)  time: 0.7244  data: 0.0003  max mem: 30335
[06:56:39.921494] Epoch: [13]  [6671/6672]  eta: 0:00:00  lr: 0.001343  loss: 3.2330 (3.3044)  time: 0.7216  data: 0.0007  max mem: 30335
[06:56:40.752399] Epoch: [13] Total time: 1:23:01 (0.7467 s / it)
[06:56:40.775205] Averaged stats: lr: 0.001343  loss: 3.2330 (3.3055)
[06:56:44.849819] Test:  [   0/2084]  eta: 2:21:21  loss: 0.3295 (0.3295)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.0701  data: 3.5137  max mem: 30335
[06:58:52.920092] Test:  [ 500/2084]  eta: 0:06:57  loss: 1.3010 (1.2000)  acc1: 62.5000 (69.7023)  acc5: 95.8333 (91.7665)  time: 0.2558  data: 0.0002  max mem: 30335
[07:01:00.954570] Test:  [1000/2084]  eta: 0:04:41  loss: 1.3676 (1.2928)  acc1: 66.6667 (67.8114)  acc5: 83.3333 (90.3596)  time: 0.2563  data: 0.0002  max mem: 30335
[07:03:08.987314] Test:  [1500/2084]  eta: 0:02:31  loss: 1.6463 (1.4278)  acc1: 58.3333 (65.1621)  acc5: 87.5000 (88.2967)  time: 0.2556  data: 0.0002  max mem: 30335
[07:05:17.218868] Test:  [2000/2084]  eta: 0:00:21  loss: 0.7729 (1.4977)  acc1: 83.3333 (63.7681)  acc5: 95.8333 (87.0232)  time: 0.2557  data: 0.0002  max mem: 30335
[07:05:38.277180] Test:  [2083/2084]  eta: 0:00:00  loss: 0.8471 (1.4971)  acc1: 79.1667 (63.8160)  acc5: 95.8333 (87.0280)  time: 0.2473  data: 0.0001  max mem: 30335
[07:05:38.394724] Test: Total time: 0:08:57 (0.2580 s / it)
[07:05:53.392947] * Acc@1 63.794 Acc@5 87.046 loss 1.497
[07:05:53.393227] Accuracy of the network on the 50000 test images: 63.8%
[07:05:53.393263] Max accuracy: 63.82%
[07:05:53.471904] log_dir: ./output_dir_qkformer
[07:06:04.096735] Epoch: [14]  [   0/6672]  eta: 19:35:04  lr: 0.001343  loss: 2.6501 (2.6501)  time: 10.5672  data: 3.0761  max mem: 30335
[07:30:44.082286] Epoch: [14]  [2000/6672]  eta: 0:57:59  lr: 0.001342  loss: 3.2202 (3.2746)  time: 0.7323  data: 0.0003  max mem: 30335
[07:55:30.846609] Epoch: [14]  [4000/6672]  eta: 0:33:07  lr: 0.001342  loss: 3.3435 (3.2729)  time: 0.7752  data: 0.0003  max mem: 30335
[08:20:16.236340] Epoch: [14]  [6000/6672]  eta: 0:08:19  lr: 0.001341  loss: 3.0553 (3.2718)  time: 0.8007  data: 0.0003  max mem: 30335
[08:28:39.647042] Epoch: [14]  [6671/6672]  eta: 0:00:00  lr: 0.001341  loss: 3.3054 (3.2739)  time: 0.7214  data: 0.0006  max mem: 30335
[08:28:40.465660] Epoch: [14] Total time: 1:22:46 (0.7445 s / it)
[08:28:40.514057] Averaged stats: lr: 0.001341  loss: 3.3054 (3.2718)
[08:28:44.311383] Test:  [   0/2084]  eta: 2:11:43  loss: 0.5341 (0.5341)  acc1: 87.5000 (87.5000)  acc5: 95.8333 (95.8333)  time: 3.7922  data: 3.3394  max mem: 30335
[08:30:52.894077] Test:  [ 500/2084]  eta: 0:06:58  loss: 1.0178 (1.1411)  acc1: 70.8333 (70.2179)  acc5: 95.8333 (91.9827)  time: 0.2565  data: 0.0002  max mem: 30335
[08:33:02.341856] Test:  [1000/2084]  eta: 0:04:43  loss: 1.3632 (1.1899)  acc1: 66.6667 (69.5638)  acc5: 87.5000 (91.4003)  time: 0.2553  data: 0.0002  max mem: 30335
[08:35:10.597313] Test:  [1500/2084]  eta: 0:02:31  loss: 1.3179 (1.3453)  acc1: 70.8333 (66.7416)  acc5: 91.6667 (89.0157)  time: 0.2563  data: 0.0002  max mem: 30335
[08:37:18.668034] Test:  [2000/2084]  eta: 0:00:21  loss: 0.7169 (1.4256)  acc1: 83.3333 (65.2361)  acc5: 95.8333 (87.8436)  time: 0.2563  data: 0.0002  max mem: 30335
[08:37:39.767377] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5366 (1.4231)  acc1: 83.3333 (65.3460)  acc5: 95.8333 (87.8520)  time: 0.2484  data: 0.0002  max mem: 30335
[08:37:39.897048] Test: Total time: 0:08:59 (0.2588 s / it)
[08:37:54.309490] * Acc@1 65.341 Acc@5 87.844 loss 1.423
[08:37:54.309783] Accuracy of the network on the 50000 test images: 65.3%
[08:37:54.309826] Max accuracy: 65.34%
[08:37:54.640542] log_dir: ./output_dir_qkformer
[08:37:58.650963] Epoch: [15]  [   0/6672]  eta: 7:25:49  lr: 0.001341  loss: 3.2148 (3.2148)  time: 4.0093  data: 1.9759  max mem: 30335
[09:02:44.715289] Epoch: [15]  [2000/6672]  eta: 0:57:58  lr: 0.001341  loss: 3.2338 (3.2258)  time: 0.7652  data: 0.0003  max mem: 30335
[09:27:29.705701] Epoch: [15]  [4000/6672]  eta: 0:33:06  lr: 0.001340  loss: 3.1093 (3.2340)  time: 0.7229  data: 0.0003  max mem: 30335
[09:52:15.518395] Epoch: [15]  [6000/6672]  eta: 0:08:19  lr: 0.001340  loss: 3.0710 (3.2384)  time: 0.7268  data: 0.0003  max mem: 30335
[10:00:28.364989] Epoch: [15]  [6671/6672]  eta: 0:00:00  lr: 0.001339  loss: 3.2656 (3.2385)  time: 0.7206  data: 0.0006  max mem: 30335
[10:00:29.265544] Epoch: [15] Total time: 1:22:34 (0.7426 s / it)
[10:00:29.319988] Averaged stats: lr: 0.001339  loss: 3.2656 (3.2395)
[10:00:34.045735] Test:  [   0/2084]  eta: 2:43:57  loss: 0.4080 (0.4080)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.7205  data: 3.9006  max mem: 30335
[10:02:42.547718] Test:  [ 500/2084]  eta: 0:07:01  loss: 1.1031 (1.1434)  acc1: 66.6667 (71.3490)  acc5: 95.8333 (92.5067)  time: 0.2809  data: 0.0017  max mem: 30335
[10:04:51.655966] Test:  [1000/2084]  eta: 0:04:44  loss: 1.6887 (1.2013)  acc1: 58.3333 (70.4962)  acc5: 87.5000 (91.6500)  time: 0.2559  data: 0.0002  max mem: 30335
[10:06:59.725940] Test:  [1500/2084]  eta: 0:02:31  loss: 1.4006 (1.3480)  acc1: 62.5000 (67.6438)  acc5: 91.6667 (89.3932)  time: 0.2561  data: 0.0002  max mem: 30335
[10:09:07.810900] Test:  [2000/2084]  eta: 0:00:21  loss: 0.6806 (1.4261)  acc1: 87.5000 (66.1461)  acc5: 95.8333 (88.0955)  time: 0.2561  data: 0.0002  max mem: 30335
[10:09:28.959977] Test:  [2083/2084]  eta: 0:00:00  loss: 0.9394 (1.4272)  acc1: 79.1667 (66.1240)  acc5: 95.8333 (88.0700)  time: 0.2486  data: 0.0001  max mem: 30335
[10:09:29.089662] Test: Total time: 0:08:59 (0.2590 s / it)
[10:09:43.881921] * Acc@1 66.139 Acc@5 88.072 loss 1.427
[10:09:43.882328] Accuracy of the network on the 50000 test images: 66.1%
[10:09:43.882363] Max accuracy: 66.14%
[10:09:43.984923] log_dir: ./output_dir_qkformer
[10:09:48.201220] Epoch: [16]  [   0/6672]  eta: 7:40:37  lr: 0.001339  loss: 1.9408 (1.9408)  time: 4.1423  data: 2.2782  max mem: 30335
[10:34:43.092159] Epoch: [16]  [2000/6672]  eta: 0:58:19  lr: 0.001339  loss: 3.0099 (3.2141)  time: 0.7740  data: 0.0004  max mem: 30335
[10:59:38.558507] Epoch: [16]  [4000/6672]  eta: 0:33:19  lr: 0.001338  loss: 3.0734 (3.2192)  time: 0.7235  data: 0.0002  max mem: 30335
[11:24:34.042917] Epoch: [16]  [6000/6672]  eta: 0:08:22  lr: 0.001338  loss: 3.2691 (3.2171)  time: 0.7599  data: 0.0002  max mem: 30335
[11:33:01.247292] Epoch: [16]  [6671/6672]  eta: 0:00:00  lr: 0.001337  loss: 3.1453 (3.2150)  time: 0.7263  data: 0.0011  max mem: 30335
[11:33:02.007696] Epoch: [16] Total time: 1:23:18 (0.7491 s / it)
[11:33:02.023953] Averaged stats: lr: 0.001337  loss: 3.1453 (3.2134)
[11:33:05.584492] Test:  [   0/2084]  eta: 2:03:30  loss: 0.3485 (0.3485)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 3.5560  data: 2.9828  max mem: 30335
[11:35:13.853365] Test:  [ 500/2084]  eta: 0:06:56  loss: 1.1994 (1.1209)  acc1: 66.6667 (71.1826)  acc5: 91.6667 (92.4318)  time: 0.2563  data: 0.0002  max mem: 30335
[11:37:21.955220] Test:  [1000/2084]  eta: 0:04:41  loss: 1.1385 (1.1985)  acc1: 70.8333 (69.9217)  acc5: 87.5000 (91.5043)  time: 0.2565  data: 0.0002  max mem: 30335
[11:39:30.665706] Test:  [1500/2084]  eta: 0:02:31  loss: 1.2158 (1.3396)  acc1: 70.8333 (67.2135)  acc5: 87.5000 (89.2877)  time: 0.2565  data: 0.0002  max mem: 30335
[11:41:38.778110] Test:  [2000/2084]  eta: 0:00:21  loss: 0.7647 (1.3993)  acc1: 83.3333 (65.9816)  acc5: 95.8333 (88.3412)  time: 0.2558  data: 0.0002  max mem: 30335
[11:41:59.957995] Test:  [2083/2084]  eta: 0:00:00  loss: 0.7712 (1.3982)  acc1: 83.3333 (66.0240)  acc5: 95.8333 (88.3520)  time: 0.2538  data: 0.0001  max mem: 30335
[11:42:00.085829] Test: Total time: 0:08:58 (0.2582 s / it)
[11:42:14.555131] * Acc@1 66.012 Acc@5 88.356 loss 1.399
[11:42:14.555735] Accuracy of the network on the 50000 test images: 66.0%
[11:42:14.555786] Max accuracy: 66.14%
[11:42:14.655676] log_dir: ./output_dir_qkformer
[11:42:19.279192] Epoch: [17]  [   0/6672]  eta: 8:13:01  lr: 0.001337  loss: 3.3503 (3.3503)  time: 4.4336  data: 2.3367  max mem: 30335
[12:07:07.457144] Epoch: [17]  [2000/6672]  eta: 0:58:04  lr: 0.001337  loss: 3.2069 (3.1866)  time: 0.7264  data: 0.0003  max mem: 30335
[12:31:45.561351] Epoch: [17]  [4000/6672]  eta: 0:33:03  lr: 0.001336  loss: 3.2046 (3.1920)  time: 0.7227  data: 0.0003  max mem: 30335
[12:56:30.714229] Epoch: [17]  [6000/6672]  eta: 0:08:18  lr: 0.001335  loss: 3.1471 (3.1939)  time: 0.7255  data: 0.0003  max mem: 30335
[13:04:51.229284] Epoch: [17]  [6671/6672]  eta: 0:00:00  lr: 0.001335  loss: 3.2785 (3.1922)  time: 0.7221  data: 0.0007  max mem: 30335
[13:04:52.011923] Epoch: [17] Total time: 1:22:37 (0.7430 s / it)
[13:04:52.054845] Averaged stats: lr: 0.001335  loss: 3.2785 (3.1866)
[13:04:55.849244] Test:  [   0/2084]  eta: 2:11:36  loss: 0.5902 (0.5902)  acc1: 83.3333 (83.3333)  acc5: 95.8333 (95.8333)  time: 3.7891  data: 3.2096  max mem: 30335
[13:07:03.921341] Test:  [ 500/2084]  eta: 0:06:56  loss: 1.2278 (1.1563)  acc1: 62.5000 (71.5153)  acc5: 95.8333 (92.4152)  time: 0.2563  data: 0.0002  max mem: 30335
[13:09:12.876509] Test:  [1000/2084]  eta: 0:04:42  loss: 1.4925 (1.2156)  acc1: 66.6667 (70.4421)  acc5: 87.5000 (91.5751)  time: 0.2564  data: 0.0002  max mem: 30335
[13:11:22.652558] Test:  [1500/2084]  eta: 0:02:31  loss: 1.3259 (1.3451)  acc1: 66.6667 (67.9325)  acc5: 91.6667 (89.5070)  time: 0.2561  data: 0.0002  max mem: 30335
[13:13:30.794232] Test:  [2000/2084]  eta: 0:00:21  loss: 0.9173 (1.4147)  acc1: 83.3333 (66.6042)  acc5: 91.6667 (88.3746)  time: 0.2560  data: 0.0002  max mem: 30335
[13:13:51.882695] Test:  [2083/2084]  eta: 0:00:00  loss: 0.9059 (1.4132)  acc1: 83.3333 (66.6520)  acc5: 95.8333 (88.4020)  time: 0.2487  data: 0.0002  max mem: 30335
[13:13:52.011974] Test: Total time: 0:08:59 (0.2591 s / it)
[13:14:06.465721] * Acc@1 66.642 Acc@5 88.406 loss 1.413
[13:14:06.466007] Accuracy of the network on the 50000 test images: 66.6%
[13:14:06.466038] Max accuracy: 66.64%
[13:14:06.596783] log_dir: ./output_dir_qkformer
[13:14:12.961364] Epoch: [18]  [   0/6672]  eta: 11:44:27  lr: 0.001335  loss: 3.2553 (3.2553)  time: 6.3350  data: 2.3618  max mem: 30335
[13:38:53.081244] Epoch: [18]  [2000/6672]  eta: 0:57:49  lr: 0.001335  loss: 3.1797 (3.1644)  time: 0.7249  data: 0.0002  max mem: 30335
[14:03:53.364651] Epoch: [18]  [4000/6672]  eta: 0:33:14  lr: 0.001334  loss: 3.3366 (3.1591)  time: 0.7219  data: 0.0002  max mem: 30335
[14:28:52.681739] Epoch: [18]  [6000/6672]  eta: 0:08:22  lr: 0.001333  loss: 3.1233 (3.1645)  time: 0.7294  data: 0.0002  max mem: 30335
[14:37:22.716941] Epoch: [18]  [6671/6672]  eta: 0:00:00  lr: 0.001333  loss: 3.0899 (3.1641)  time: 0.7206  data: 0.0011  max mem: 30335
[14:37:23.493870] Epoch: [18] Total time: 1:23:16 (0.7489 s / it)
[14:37:23.520233] Averaged stats: lr: 0.001333  loss: 3.0899 (3.1661)
[14:37:27.324557] Test:  [   0/2084]  eta: 2:11:45  loss: 0.5832 (0.5832)  acc1: 87.5000 (87.5000)  acc5: 95.8333 (95.8333)  time: 3.7934  data: 3.2225  max mem: 30335
[14:39:35.231837] Test:  [ 500/2084]  eta: 0:06:56  loss: 1.1104 (1.1052)  acc1: 66.6667 (71.1244)  acc5: 91.6667 (92.0659)  time: 0.2561  data: 0.0002  max mem: 30335
[14:41:43.335978] Test:  [1000/2084]  eta: 0:04:41  loss: 1.1246 (1.1515)  acc1: 75.0000 (70.4337)  acc5: 87.5000 (91.5293)  time: 0.2556  data: 0.0002  max mem: 30335
[14:43:51.387725] Test:  [1500/2084]  eta: 0:02:30  loss: 1.4832 (1.3129)  acc1: 70.8333 (67.3690)  acc5: 87.5000 (89.1322)  time: 0.2544  data: 0.0002  max mem: 30335
[14:46:01.659620] Test:  [2000/2084]  eta: 0:00:21  loss: 0.8007 (1.3781)  acc1: 83.3333 (65.9983)  acc5: 91.6667 (88.1413)  time: 0.2787  data: 0.0002  max mem: 30335
[14:46:22.748567] Test:  [2083/2084]  eta: 0:00:00  loss: 0.8483 (1.3742)  acc1: 83.3333 (66.0840)  acc5: 95.8333 (88.2080)  time: 0.2477  data: 0.0001  max mem: 30335
[14:46:22.849305] Test: Total time: 0:08:59 (0.2588 s / it)
[14:46:37.605938] * Acc@1 66.083 Acc@5 88.213 loss 1.374
[14:46:37.606228] Accuracy of the network on the 50000 test images: 66.1%
[14:46:37.606278] Max accuracy: 66.64%
[14:46:37.716301] log_dir: ./output_dir_qkformer
[14:46:45.058654] Epoch: [19]  [   0/6672]  eta: 13:28:46  lr: 0.001333  loss: 3.1742 (3.1742)  time: 7.2732  data: 3.1972  max mem: 30335
[15:11:43.389430] Epoch: [19]  [2000/6672]  eta: 0:58:34  lr: 0.001332  loss: 2.8966 (3.1284)  time: 0.7258  data: 0.0002  max mem: 30335
[15:36:42.748343] Epoch: [19]  [4000/6672]  eta: 0:33:26  lr: 0.001331  loss: 3.2011 (3.1349)  time: 1.0035  data: 0.0003  max mem: 30335
[16:01:41.094371] Epoch: [19]  [6000/6672]  eta: 0:08:24  lr: 0.001331  loss: 3.1956 (3.1363)  time: 0.7251  data: 0.0003  max mem: 30335
[16:09:58.262102] Epoch: [19]  [6671/6672]  eta: 0:00:00  lr: 0.001330  loss: 3.3227 (3.1382)  time: 0.7216  data: 0.0011  max mem: 30335
[16:09:59.272982] Epoch: [19] Total time: 1:23:21 (0.7496 s / it)
[16:09:59.314513] Averaged stats: lr: 0.001330  loss: 3.3227 (3.1437)
[16:10:03.639191] Test:  [   0/2084]  eta: 2:29:57  loss: 0.7604 (0.7604)  acc1: 87.5000 (87.5000)  acc5: 95.8333 (95.8333)  time: 4.3175  data: 3.7005  max mem: 30335
[16:12:12.303101] Test:  [ 500/2084]  eta: 0:07:00  loss: 1.2639 (1.1105)  acc1: 66.6667 (71.5902)  acc5: 95.8333 (92.3403)  time: 0.2561  data: 0.0002  max mem: 30335
[16:14:20.749628] Test:  [1000/2084]  eta: 0:04:43  loss: 1.2988 (1.1496)  acc1: 66.6667 (70.7584)  acc5: 87.5000 (91.8956)  time: 0.2557  data: 0.0002  max mem: 30335
[16:16:28.486740] Test:  [1500/2084]  eta: 0:02:31  loss: 1.3769 (1.2867)  acc1: 66.6667 (68.1851)  acc5: 91.6667 (89.8734)  time: 0.2550  data: 0.0002  max mem: 30335
[16:18:36.631034] Test:  [2000/2084]  eta: 0:00:21  loss: 0.9327 (1.3515)  acc1: 79.1667 (67.1289)  acc5: 91.6667 (88.8618)  time: 0.2559  data: 0.0002  max mem: 30335
[16:18:57.765717] Test:  [2083/2084]  eta: 0:00:00  loss: 0.7014 (1.3503)  acc1: 83.3333 (67.1760)  acc5: 95.8333 (88.9040)  time: 0.2506  data: 0.0002  max mem: 30335
[16:18:57.889979] Test: Total time: 0:08:58 (0.2584 s / it)
[16:19:12.206507] * Acc@1 67.183 Acc@5 88.916 loss 1.350
[16:19:12.206782] Accuracy of the network on the 50000 test images: 67.2%
[16:19:12.206836] Max accuracy: 67.18%
[16:19:12.477119] log_dir: ./output_dir_qkformer
[16:19:15.849924] Epoch: [20]  [   0/6672]  eta: 6:13:07  lr: 0.001330  loss: 2.8888 (2.8888)  time: 3.3555  data: 2.0034  max mem: 30335
[16:44:05.531074] Epoch: [20]  [2000/6672]  eta: 0:58:05  lr: 0.001330  loss: 3.0827 (3.1212)  time: 0.7291  data: 0.0006  max mem: 30335
[17:08:53.379712] Epoch: [20]  [4000/6672]  eta: 0:33:10  lr: 0.001329  loss: 3.1164 (3.1194)  time: 0.7313  data: 0.0003  max mem: 30335
[17:33:43.747453] Epoch: [20]  [6000/6672]  eta: 0:08:20  lr: 0.001328  loss: 2.9276 (3.1269)  time: 0.7255  data: 0.0003  max mem: 30335
[17:42:04.844192] Epoch: [20]  [6671/6672]  eta: 0:00:00  lr: 0.001328  loss: 3.1456 (3.1287)  time: 0.7250  data: 0.0011  max mem: 30335
[17:42:05.687777] Epoch: [20] Total time: 1:22:53 (0.7454 s / it)
[17:42:05.738405] Averaged stats: lr: 0.001328  loss: 3.1456 (3.1246)
[17:42:10.154542] Test:  [   0/2084]  eta: 2:33:13  loss: 0.5254 (0.5254)  acc1: 83.3333 (83.3333)  acc5: 100.0000 (100.0000)  time: 4.4113  data: 3.6756  max mem: 30335
[17:44:18.368717] Test:  [ 500/2084]  eta: 0:06:59  loss: 1.1584 (1.0872)  acc1: 66.6667 (72.3137)  acc5: 95.8333 (93.2219)  time: 0.2624  data: 0.0002  max mem: 30335
[17:46:26.517854] Test:  [1000/2084]  eta: 0:04:42  loss: 1.3057 (1.1547)  acc1: 66.6667 (71.1081)  acc5: 87.5000 (92.1204)  time: 0.2559  data: 0.0002  max mem: 30335
[17:48:34.926212] Test:  [1500/2084]  eta: 0:02:31  loss: 1.0367 (1.2757)  acc1: 75.0000 (68.8597)  acc5: 91.6667 (90.1094)  time: 0.2557  data: 0.0002  max mem: 30335
[17:50:42.858769] Test:  [2000/2084]  eta: 0:00:21  loss: 0.7812 (1.3449)  acc1: 83.3333 (67.5225)  acc5: 95.8333 (89.0492)  time: 0.2558  data: 0.0002  max mem: 30335
[17:51:04.136254] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6225 (1.3469)  acc1: 87.5000 (67.4680)  acc5: 95.8333 (89.0460)  time: 0.2481  data: 0.0001  max mem: 30335
[17:51:04.247765] Test: Total time: 0:08:58 (0.2584 s / it)
[17:51:18.409870] * Acc@1 67.460 Acc@5 89.044 loss 1.347
[17:51:18.410168] Accuracy of the network on the 50000 test images: 67.5%
[17:51:18.410209] Max accuracy: 67.46%
[17:51:18.765699] log_dir: ./output_dir_qkformer
[17:51:24.574654] Epoch: [21]  [   0/6672]  eta: 10:26:55  lr: 0.001328  loss: 3.6167 (3.6167)  time: 5.6378  data: 2.1614  max mem: 30335
[18:16:19.674195] Epoch: [21]  [2000/6672]  eta: 0:58:23  lr: 0.001327  loss: 3.1308 (3.1081)  time: 0.7476  data: 0.0002  max mem: 30335
[18:41:23.649597] Epoch: [21]  [4000/6672]  eta: 0:33:26  lr: 0.001326  loss: 3.2835 (3.1068)  time: 0.7282  data: 0.0003  max mem: 30335
[19:06:38.598804] Epoch: [21]  [6000/6672]  eta: 0:08:26  lr: 0.001325  loss: 3.1285 (3.1057)  time: 0.7260  data: 0.0002  max mem: 30335
[19:15:01.118191] Epoch: [21]  [6671/6672]  eta: 0:00:00  lr: 0.001325  loss: 3.0989 (3.1087)  time: 0.7221  data: 0.0006  max mem: 30335
[19:15:01.970472] Epoch: [21] Total time: 1:23:43 (0.7529 s / it)
[19:15:02.026568] Averaged stats: lr: 0.001325  loss: 3.0989 (3.1089)
[19:15:06.143383] Test:  [   0/2084]  eta: 2:22:48  loss: 0.7765 (0.7765)  acc1: 79.1667 (79.1667)  acc5: 91.6667 (91.6667)  time: 4.1114  data: 3.3578  max mem: 30335
[19:17:14.260811] Test:  [ 500/2084]  eta: 0:06:58  loss: 1.2680 (1.0424)  acc1: 62.5000 (72.8377)  acc5: 95.8333 (93.1969)  time: 0.2552  data: 0.0002  max mem: 30335
[19:19:23.429108] Test:  [1000/2084]  eta: 0:04:43  loss: 1.3450 (1.0963)  acc1: 66.6667 (71.7324)  acc5: 87.5000 (92.5200)  time: 0.2561  data: 0.0002  max mem: 30335
[19:21:31.523132] Test:  [1500/2084]  eta: 0:02:31  loss: 1.5259 (1.2484)  acc1: 70.8333 (68.8152)  acc5: 87.5000 (90.3398)  time: 0.2551  data: 0.0002  max mem: 30335
[19:23:39.437995] Test:  [2000/2084]  eta: 0:00:21  loss: 0.6975 (1.3146)  acc1: 87.5000 (67.5433)  acc5: 91.6667 (89.2450)  time: 0.2564  data: 0.0002  max mem: 30335
[19:24:00.483074] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6445 (1.3160)  acc1: 87.5000 (67.4860)  acc5: 95.8333 (89.2540)  time: 0.2486  data: 0.0002  max mem: 30335
[19:24:00.621975] Test: Total time: 0:08:58 (0.2584 s / it)
[19:24:15.373538] * Acc@1 67.525 Acc@5 89.253 loss 1.316
[19:24:15.373855] Accuracy of the network on the 50000 test images: 67.5%
[19:24:15.373900] Max accuracy: 67.52%
[19:24:15.450723] log_dir: ./output_dir_qkformer
[19:24:26.350726] Epoch: [22]  [   0/6672]  eta: 20:05:57  lr: 0.001325  loss: 2.7933 (2.7933)  time: 10.8450  data: 2.8180  max mem: 30335
[19:49:26.380310] Epoch: [22]  [2000/6672]  eta: 0:58:46  lr: 0.001324  loss: 3.1034 (3.1026)  time: 0.7238  data: 0.0002  max mem: 30335
[20:14:33.142503] Epoch: [22]  [4000/6672]  eta: 0:33:34  lr: 0.001323  loss: 3.0722 (3.1006)  time: 0.7253  data: 0.0002  max mem: 30335
[20:39:34.849020] Epoch: [22]  [6000/6672]  eta: 0:08:25  lr: 0.001322  loss: 3.1746 (3.0992)  time: 0.7488  data: 0.0002  max mem: 30335
[20:48:00.220016] Epoch: [22]  [6671/6672]  eta: 0:00:00  lr: 0.001322  loss: 3.0262 (3.1009)  time: 0.7199  data: 0.0011  max mem: 30335
[20:48:01.095602] Epoch: [22] Total time: 1:23:45 (0.7532 s / it)
[20:48:01.144431] Averaged stats: lr: 0.001322  loss: 3.0262 (3.0957)
[20:48:05.666491] Test:  [   0/2084]  eta: 2:36:50  loss: 0.2823 (0.2823)  acc1: 91.6667 (91.6667)  acc5: 100.0000 (100.0000)  time: 4.5157  data: 3.5671  max mem: 30335
[20:50:13.853355] Test:  [ 500/2084]  eta: 0:06:59  loss: 0.9295 (1.0201)  acc1: 79.1667 (74.0935)  acc5: 95.8333 (93.2136)  time: 0.2550  data: 0.0002  max mem: 30335
[20:52:21.754533] Test:  [1000/2084]  eta: 0:04:42  loss: 1.2705 (1.0767)  acc1: 75.0000 (72.6815)  acc5: 87.5000 (92.6074)  time: 0.2559  data: 0.0002  max mem: 30335
[20:54:31.842494] Test:  [1500/2084]  eta: 0:02:31  loss: 1.1974 (1.2190)  acc1: 70.8333 (69.7507)  acc5: 91.6667 (90.5646)  time: 0.2561  data: 0.0002  max mem: 30335
[20:56:41.835880] Test:  [2000/2084]  eta: 0:00:21  loss: 0.7053 (1.2955)  acc1: 87.5000 (68.4637)  acc5: 91.6667 (89.4636)  time: 0.2557  data: 0.0002  max mem: 30335
[20:57:02.986857] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6816 (1.2954)  acc1: 83.3333 (68.4780)  acc5: 95.8333 (89.4560)  time: 0.2492  data: 0.0001  max mem: 30335
[20:57:03.111314] Test: Total time: 0:09:01 (0.2601 s / it)
[20:57:17.771153] * Acc@1 68.427 Acc@5 89.463 loss 1.295
[20:57:17.771504] Accuracy of the network on the 50000 test images: 68.4%
[20:57:17.771544] Max accuracy: 68.43%
[20:57:17.934646] log_dir: ./output_dir_qkformer
[20:57:23.832698] Epoch: [23]  [   0/6672]  eta: 10:55:40  lr: 0.001322  loss: 2.8403 (2.8403)  time: 5.8964  data: 3.3923  max mem: 30335
[21:22:31.580957] Epoch: [23]  [2000/6672]  eta: 0:58:53  lr: 0.001321  loss: 2.9559 (3.0810)  time: 0.7268  data: 0.0002  max mem: 30335
[21:47:38.702114] Epoch: [23]  [4000/6672]  eta: 0:33:36  lr: 0.001320  loss: 2.9110 (3.0814)  time: 0.7327  data: 0.0010  max mem: 30335
[22:12:52.153022] Epoch: [23]  [6000/6672]  eta: 0:08:27  lr: 0.001319  loss: 3.0221 (3.0803)  time: 0.7282  data: 0.0003  max mem: 30335
[22:21:13.906933] Epoch: [23]  [6671/6672]  eta: 0:00:00  lr: 0.001319  loss: 2.9572 (3.0820)  time: 0.7209  data: 0.0006  max mem: 30335
[22:21:14.767578] Epoch: [23] Total time: 1:23:56 (0.7549 s / it)
[22:21:14.803799] Averaged stats: lr: 0.001319  loss: 2.9572 (3.0790)
[22:21:19.035360] Test:  [   0/2084]  eta: 2:26:44  loss: 1.7842 (1.7842)  acc1: 62.5000 (62.5000)  acc5: 83.3333 (83.3333)  time: 4.2247  data: 3.5154  max mem: 30335
[22:23:27.427850] Test:  [ 500/2084]  eta: 0:06:59  loss: 1.2803 (1.0560)  acc1: 62.5000 (73.2784)  acc5: 91.6667 (93.0472)  time: 0.2573  data: 0.0002  max mem: 30335
[22:25:35.551710] Test:  [1000/2084]  eta: 0:04:42  loss: 1.0920 (1.1167)  acc1: 75.0000 (71.9863)  acc5: 87.5000 (92.2203)  time: 0.2565  data: 0.0002  max mem: 30335
[22:27:44.025062] Test:  [1500/2084]  eta: 0:02:31  loss: 1.1889 (1.2638)  acc1: 75.0000 (68.9929)  acc5: 91.6667 (90.0483)  time: 0.2562  data: 0.0002  max mem: 30335
[22:29:52.148487] Test:  [2000/2084]  eta: 0:00:21  loss: 0.6807 (1.3407)  acc1: 83.3333 (67.4142)  acc5: 95.8333 (88.9035)  time: 0.2593  data: 0.0002  max mem: 30335
[22:30:13.282331] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5258 (1.3439)  acc1: 83.3333 (67.3440)  acc5: 95.8333 (88.8720)  time: 0.2491  data: 0.0002  max mem: 30335
[22:30:13.408717] Test: Total time: 0:08:58 (0.2584 s / it)
[22:30:28.082590] * Acc@1 67.327 Acc@5 88.898 loss 1.344
[22:30:28.083149] Accuracy of the network on the 50000 test images: 67.3%
[22:30:28.083183] Max accuracy: 68.43%
[22:30:28.407847] log_dir: ./output_dir_qkformer
[22:30:32.261337] Epoch: [24]  [   0/6672]  eta: 7:08:22  lr: 0.001319  loss: 2.4067 (2.4067)  time: 3.8523  data: 2.7612  max mem: 30335
[22:55:19.984826] Epoch: [24]  [2000/6672]  eta: 0:58:01  lr: 0.001318  loss: 3.0163 (3.0618)  time: 0.7275  data: 0.0002  max mem: 30335
[23:20:21.388207] Epoch: [24]  [4000/6672]  eta: 0:33:18  lr: 0.001317  loss: 2.9729 (3.0706)  time: 0.7247  data: 0.0002  max mem: 30335
[23:45:30.077446] Epoch: [24]  [6000/6672]  eta: 0:08:24  lr: 0.001316  loss: 3.0649 (3.0699)  time: 0.7245  data: 0.0003  max mem: 30335
[23:53:50.330313] Epoch: [24]  [6671/6672]  eta: 0:00:00  lr: 0.001315  loss: 3.1259 (3.0707)  time: 0.7222  data: 0.0007  max mem: 30335
[23:53:51.269043] Epoch: [24] Total time: 1:23:22 (0.7498 s / it)
[23:53:51.359655] Averaged stats: lr: 0.001315  loss: 3.1259 (3.0691)
[23:53:55.958815] Test:  [   0/2084]  eta: 2:39:22  loss: 0.7294 (0.7294)  acc1: 83.3333 (83.3333)  acc5: 87.5000 (87.5000)  time: 4.5887  data: 3.8958  max mem: 30335
[23:56:05.053837] Test:  [ 500/2084]  eta: 0:07:02  loss: 1.1541 (1.0669)  acc1: 62.5000 (73.3533)  acc5: 95.8333 (93.1803)  time: 0.2565  data: 0.0002  max mem: 30335
[23:58:13.110631] Test:  [1000/2084]  eta: 0:04:43  loss: 1.3652 (1.1008)  acc1: 70.8333 (72.7939)  acc5: 83.3333 (92.8114)  time: 0.2564  data: 0.0002  max mem: 30335
[00:00:21.673106] Test:  [1500/2084]  eta: 0:02:31  loss: 1.1744 (1.2374)  acc1: 75.0000 (69.9450)  acc5: 91.6667 (90.8228)  time: 0.2554  data: 0.0002  max mem: 30335
[00:02:29.917740] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5569 (1.2976)  acc1: 87.5000 (68.8198)  acc5: 95.8333 (89.9113)  time: 0.2556  data: 0.0002  max mem: 30335
[00:02:50.994708] Test:  [2083/2084]  eta: 0:00:00  loss: 0.7524 (1.2949)  acc1: 83.3333 (68.8960)  acc5: 95.8333 (89.9840)  time: 0.2484  data: 0.0001  max mem: 30335
[00:02:51.115591] Test: Total time: 0:08:59 (0.2590 s / it)
[00:03:05.606457] * Acc@1 68.890 Acc@5 89.985 loss 1.295
[00:03:05.606686] Accuracy of the network on the 50000 test images: 68.9%
[00:03:05.606736] Max accuracy: 68.89%
[00:03:05.735882] log_dir: ./output_dir_qkformer
[00:03:10.736720] Epoch: [25]  [   0/6672]  eta: 9:16:00  lr: 0.001315  loss: 2.6062 (2.6062)  time: 5.0001  data: 2.8035  max mem: 30335
[00:28:32.391873] Epoch: [25]  [2000/6672]  eta: 0:59:23  lr: 0.001314  loss: 2.8830 (3.0444)  time: 0.7266  data: 0.0002  max mem: 30335
[00:53:42.895519] Epoch: [25]  [4000/6672]  eta: 0:33:48  lr: 0.001313  loss: 2.9722 (3.0496)  time: 0.7361  data: 0.0002  max mem: 30335
[01:18:47.310556] Epoch: [25]  [6000/6672]  eta: 0:08:28  lr: 0.001312  loss: 3.0507 (3.0489)  time: 0.7334  data: 0.0002  max mem: 30335
[01:27:08.538019] Epoch: [25]  [6671/6672]  eta: 0:00:00  lr: 0.001312  loss: 3.2177 (3.0482)  time: 0.7237  data: 0.0011  max mem: 30335
[01:27:09.332374] Epoch: [25] Total time: 1:24:03 (0.7559 s / it)
[01:27:09.389133] Averaged stats: lr: 0.001312  loss: 3.2177 (3.0527)
[01:27:13.211763] Test:  [   0/2084]  eta: 2:12:34  loss: 0.6349 (0.6349)  acc1: 79.1667 (79.1667)  acc5: 95.8333 (95.8333)  time: 3.8170  data: 3.2163  max mem: 30335
[01:29:21.412302] Test:  [ 500/2084]  eta: 0:06:57  loss: 1.0888 (1.0668)  acc1: 75.0000 (72.9874)  acc5: 95.8333 (93.2219)  time: 0.2571  data: 0.0002  max mem: 30335
[01:31:29.623956] Test:  [1000/2084]  eta: 0:04:41  loss: 1.0519 (1.0892)  acc1: 75.0000 (72.2736)  acc5: 87.5000 (92.8571)  time: 0.2564  data: 0.0002  max mem: 30335
[01:33:38.243664] Test:  [1500/2084]  eta: 0:02:31  loss: 1.1591 (1.2103)  acc1: 75.0000 (69.9034)  acc5: 87.5000 (90.9893)  time: 0.2552  data: 0.0002  max mem: 30335
[01:35:47.847861] Test:  [2000/2084]  eta: 0:00:21  loss: 0.6487 (1.2632)  acc1: 83.3333 (68.9239)  acc5: 95.8333 (90.1299)  time: 0.2571  data: 0.0002  max mem: 30335
[01:36:09.011550] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5845 (1.2693)  acc1: 87.5000 (68.8100)  acc5: 95.8333 (90.0420)  time: 0.2503  data: 0.0002  max mem: 30335
[01:36:09.145181] Test: Total time: 0:08:59 (0.2590 s / it)
[01:36:24.803998] * Acc@1 68.799 Acc@5 90.020 loss 1.269
[01:36:24.804292] Accuracy of the network on the 50000 test images: 68.8%
[01:36:24.804328] Max accuracy: 68.89%
[01:36:25.084102] log_dir: ./output_dir_qkformer
[01:36:30.405811] Epoch: [26]  [   0/6672]  eta: 9:51:33  lr: 0.001312  loss: 3.1832 (3.1832)  time: 5.3198  data: 1.9267  max mem: 30335
[02:01:25.743319] Epoch: [26]  [2000/6672]  eta: 0:58:23  lr: 0.001311  loss: 3.1274 (3.0279)  time: 0.7283  data: 0.0002  max mem: 30335
[02:26:28.017799] Epoch: [26]  [4000/6672]  eta: 0:33:25  lr: 0.001310  loss: 3.0405 (3.0330)  time: 0.7318  data: 0.0003  max mem: 30335
[02:51:31.416396] Epoch: [26]  [6000/6672]  eta: 0:08:24  lr: 0.001308  loss: 2.8259 (3.0379)  time: 0.7294  data: 0.0003  max mem: 30335
[03:00:06.285070] Epoch: [26]  [6671/6672]  eta: 0:00:00  lr: 0.001308  loss: 3.0149 (3.0366)  time: 0.7256  data: 0.0010  max mem: 30335
[03:00:07.153548] Epoch: [26] Total time: 1:23:42 (0.7527 s / it)
[03:00:07.174162] Averaged stats: lr: 0.001308  loss: 3.0149 (3.0398)
[03:00:10.936199] Test:  [   0/2084]  eta: 2:10:16  loss: 0.3486 (0.3486)  acc1: 91.6667 (91.6667)  acc5: 100.0000 (100.0000)  time: 3.7507  data: 3.2007  max mem: 30335
[03:02:19.514238] Test:  [ 500/2084]  eta: 0:06:58  loss: 0.8792 (1.0030)  acc1: 79.1667 (74.9834)  acc5: 91.6667 (93.4464)  time: 0.2558  data: 0.0002  max mem: 30335
[03:04:27.853803] Test:  [1000/2084]  eta: 0:04:42  loss: 1.0569 (1.0587)  acc1: 66.6667 (73.3641)  acc5: 83.3333 (92.8238)  time: 0.2584  data: 0.0002  max mem: 30335
[03:06:36.239826] Test:  [1500/2084]  eta: 0:02:31  loss: 0.7722 (1.1936)  acc1: 75.0000 (70.7362)  acc5: 91.6667 (90.8894)  time: 0.2563  data: 0.0002  max mem: 30335
[03:08:44.745526] Test:  [2000/2084]  eta: 0:00:21  loss: 0.7118 (1.2574)  acc1: 87.5000 (69.4549)  acc5: 95.8333 (89.9988)  time: 0.2571  data: 0.0004  max mem: 30335
[03:09:05.870375] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6759 (1.2579)  acc1: 83.3333 (69.4180)  acc5: 95.8333 (90.0020)  time: 0.2484  data: 0.0001  max mem: 30335
[03:09:06.014446] Test: Total time: 0:08:58 (0.2586 s / it)
[03:09:21.232759] * Acc@1 69.412 Acc@5 90.012 loss 1.258
[03:09:21.233247] Accuracy of the network on the 50000 test images: 69.4%
[03:09:21.233291] Max accuracy: 69.41%
[03:09:21.346342] log_dir: ./output_dir_qkformer
[03:09:33.914782] Epoch: [27]  [   0/6672]  eta: 23:13:29  lr: 0.001308  loss: 3.3705 (3.3705)  time: 12.5315  data: 2.4590  max mem: 30335
[03:34:36.837449] Epoch: [27]  [2000/6672]  eta: 0:58:57  lr: 0.001307  loss: 3.1194 (3.0254)  time: 0.7282  data: 0.0003  max mem: 30335
[03:59:38.422357] Epoch: [27]  [4000/6672]  eta: 0:33:34  lr: 0.001306  loss: 3.0094 (3.0273)  time: 0.7247  data: 0.0002  max mem: 30335
[04:24:36.019795] Epoch: [27]  [6000/6672]  eta: 0:08:25  lr: 0.001305  loss: 3.0115 (3.0299)  time: 0.7319  data: 0.0002  max mem: 30335
[04:33:00.421994] Epoch: [27]  [6671/6672]  eta: 0:00:00  lr: 0.001304  loss: 3.0451 (3.0283)  time: 0.7236  data: 0.0011  max mem: 30335
[04:33:01.283564] Epoch: [27] Total time: 1:23:39 (0.7524 s / it)
[04:33:01.327524] Averaged stats: lr: 0.001304  loss: 3.0451 (3.0299)
[04:33:05.622554] Test:  [   0/2084]  eta: 2:28:59  loss: 0.2695 (0.2695)  acc1: 95.8333 (95.8333)  acc5: 100.0000 (100.0000)  time: 4.2898  data: 3.5984  max mem: 30335
[04:35:14.636130] Test:  [ 500/2084]  eta: 0:07:01  loss: 1.2758 (1.0012)  acc1: 54.1667 (74.2681)  acc5: 95.8333 (93.6294)  time: 0.2567  data: 0.0002  max mem: 30335
[04:37:24.380490] Test:  [1000/2084]  eta: 0:04:44  loss: 0.9873 (1.0212)  acc1: 79.1667 (73.7929)  acc5: 87.5000 (93.4482)  time: 0.2589  data: 0.0025  max mem: 30335
[04:39:33.503829] Test:  [1500/2084]  eta: 0:02:32  loss: 1.3228 (1.1541)  acc1: 70.8333 (71.1637)  acc5: 91.6667 (91.4335)  time: 0.2590  data: 0.0002  max mem: 30335
[04:41:42.468865] Test:  [2000/2084]  eta: 0:00:21  loss: 0.7724 (1.2120)  acc1: 83.3333 (69.9609)  acc5: 91.6667 (90.5297)  time: 0.2567  data: 0.0004  max mem: 30335
[04:42:03.562000] Test:  [2083/2084]  eta: 0:00:00  loss: 0.8858 (1.2136)  acc1: 79.1667 (69.8620)  acc5: 95.8333 (90.5400)  time: 0.2480  data: 0.0001  max mem: 30335
[04:42:03.692665] Test: Total time: 0:09:02 (0.2603 s / it)
[04:42:18.853617] * Acc@1 69.890 Acc@5 90.536 loss 1.213
[04:42:18.854097] Accuracy of the network on the 50000 test images: 69.9%
[04:42:18.854143] Max accuracy: 69.89%
[04:42:18.964844] log_dir: ./output_dir_qkformer
[04:42:25.716914] Epoch: [28]  [   0/6672]  eta: 12:30:42  lr: 0.001304  loss: 3.4385 (3.4385)  time: 6.7509  data: 2.2784  max mem: 30335
[05:07:25.513225] Epoch: [28]  [2000/6672]  eta: 0:58:36  lr: 0.001303  loss: 3.0378 (3.0006)  time: 0.7316  data: 0.0002  max mem: 30335
[05:32:26.058716] Epoch: [28]  [4000/6672]  eta: 0:33:27  lr: 0.001302  loss: 2.9271 (3.0024)  time: 0.7308  data: 0.0007  max mem: 30335
[05:57:22.755682] Epoch: [28]  [6000/6672]  eta: 0:08:24  lr: 0.001301  loss: 2.9910 (3.0047)  time: 0.9308  data: 0.0002  max mem: 30335
[06:05:41.791939] Epoch: [28]  [6671/6672]  eta: 0:00:00  lr: 0.001300  loss: 3.1104 (3.0041)  time: 0.7240  data: 0.0006  max mem: 30335
[06:05:42.698358] Epoch: [28] Total time: 1:23:23 (0.7500 s / it)
[06:05:42.720583] Averaged stats: lr: 0.001300  loss: 3.1104 (3.0186)
[06:05:46.518206] Test:  [   0/2084]  eta: 2:11:45  loss: 0.9524 (0.9524)  acc1: 83.3333 (83.3333)  acc5: 91.6667 (91.6667)  time: 3.7933  data: 3.1213  max mem: 30335
[06:07:54.911498] Test:  [ 500/2084]  eta: 0:06:57  loss: 1.0238 (1.0031)  acc1: 75.0000 (74.3929)  acc5: 95.8333 (94.0286)  time: 0.2622  data: 0.0002  max mem: 30335
[06:10:04.753851] Test:  [1000/2084]  eta: 0:04:43  loss: 1.1366 (1.0692)  acc1: 75.0000 (73.1352)  acc5: 87.5000 (93.0403)  time: 0.2573  data: 0.0002  max mem: 30335
[06:12:13.046569] Test:  [1500/2084]  eta: 0:02:31  loss: 0.9924 (1.1990)  acc1: 79.1667 (70.6251)  acc5: 91.6667 (91.0560)  time: 0.2567  data: 0.0002  max mem: 30335
[06:14:22.918622] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4486 (1.2423)  acc1: 87.5000 (69.9463)  acc5: 95.8333 (90.3111)  time: 0.2572  data: 0.0002  max mem: 30335
[06:14:44.347303] Test:  [2083/2084]  eta: 0:00:00  loss: 0.8148 (1.2435)  acc1: 83.3333 (69.9220)  acc5: 95.8333 (90.3280)  time: 0.2502  data: 0.0001  max mem: 30335
[06:14:44.482480] Test: Total time: 0:09:01 (0.2600 s / it)
[06:15:00.393745] * Acc@1 69.916 Acc@5 90.331 loss 1.244
[06:15:00.394194] Accuracy of the network on the 50000 test images: 69.9%
[06:15:00.394265] Max accuracy: 69.92%
[06:15:00.576855] log_dir: ./output_dir_qkformer
[06:15:10.527792] Epoch: [29]  [   0/6672]  eta: 18:26:18  lr: 0.001300  loss: 3.1109 (3.1109)  time: 9.9489  data: 2.4454  max mem: 30335
[06:40:12.203709] Epoch: [29]  [2000/6672]  eta: 0:58:49  lr: 0.001299  loss: 3.0777 (2.9924)  time: 0.7808  data: 0.0002  max mem: 30335
[07:05:17.104414] Epoch: [29]  [4000/6672]  eta: 0:33:34  lr: 0.001298  loss: 2.9998 (3.0043)  time: 0.7275  data: 0.0002  max mem: 30335
[07:30:31.554023] Epoch: [29]  [6000/6672]  eta: 0:08:27  lr: 0.001296  loss: 2.9473 (3.0107)  time: 0.7365  data: 0.0002  max mem: 30335
[07:38:46.464360] Epoch: [29]  [6671/6672]  eta: 0:00:00  lr: 0.001296  loss: 3.1378 (3.0104)  time: 0.7248  data: 0.0006  max mem: 30335
[07:38:47.325203] Epoch: [29] Total time: 1:23:46 (0.7534 s / it)
[07:38:47.373171] Averaged stats: lr: 0.001296  loss: 3.1378 (3.0101)
[07:38:51.453298] Test:  [   0/2084]  eta: 2:21:29  loss: 0.7056 (0.7056)  acc1: 87.5000 (87.5000)  acc5: 91.6667 (91.6667)  time: 4.0738  data: 3.4190  max mem: 30335
[07:41:00.184038] Test:  [ 500/2084]  eta: 0:06:59  loss: 1.1197 (0.9979)  acc1: 66.6667 (74.9834)  acc5: 95.8333 (93.7542)  time: 0.2567  data: 0.0002  max mem: 30335
[07:43:08.799068] Test:  [1000/2084]  eta: 0:04:43  loss: 1.0620 (1.0620)  acc1: 79.1667 (73.3475)  acc5: 87.5000 (92.9321)  time: 0.2562  data: 0.0002  max mem: 30335
[07:45:18.001867] Test:  [1500/2084]  eta: 0:02:31  loss: 1.1874 (1.2032)  acc1: 70.8333 (70.3920)  acc5: 91.6667 (91.0449)  time: 0.2560  data: 0.0002  max mem: 30335
[07:47:27.353116] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5583 (1.2632)  acc1: 87.5000 (69.2883)  acc5: 95.8333 (90.1174)  time: 0.2571  data: 0.0002  max mem: 30335
[07:47:48.433462] Test:  [2083/2084]  eta: 0:00:00  loss: 0.7541 (1.2594)  acc1: 83.3333 (69.4040)  acc5: 95.8333 (90.1420)  time: 0.2480  data: 0.0001  max mem: 30335
[07:47:48.533238] Test: Total time: 0:09:01 (0.2597 s / it)
[07:48:04.338967] * Acc@1 69.427 Acc@5 90.138 loss 1.259
[07:48:04.339358] Accuracy of the network on the 50000 test images: 69.4%
[07:48:04.339390] Max accuracy: 69.92%
[07:48:04.644433] log_dir: ./output_dir_qkformer
[07:48:08.921231] Epoch: [30]  [   0/6672]  eta: 7:54:49  lr: 0.001296  loss: 2.9768 (2.9768)  time: 4.2701  data: 1.9002  max mem: 30335
[08:13:10.923236] Epoch: [30]  [2000/6672]  eta: 0:58:36  lr: 0.001295  loss: 2.7694 (2.9839)  time: 0.7302  data: 0.0002  max mem: 30335
[08:38:07.313053] Epoch: [30]  [4000/6672]  eta: 0:33:24  lr: 0.001293  loss: 3.0217 (2.9846)  time: 0.7288  data: 0.0002  max mem: 30335
[09:03:02.188985] Epoch: [30]  [6000/6672]  eta: 0:08:23  lr: 0.001292  loss: 3.1768 (2.9974)  time: 0.7289  data: 0.0002  max mem: 30335
[09:11:25.241450] Epoch: [30]  [6671/6672]  eta: 0:00:00  lr: 0.001292  loss: 2.9571 (2.9985)  time: 0.7264  data: 0.0007  max mem: 30335
[09:11:25.992606] Epoch: [30] Total time: 1:23:21 (0.7496 s / it)
[09:11:26.035341] Averaged stats: lr: 0.001292  loss: 2.9571 (3.0005)
[09:11:30.236171] Test:  [   0/2084]  eta: 2:25:39  loss: 0.4536 (0.4536)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.1934  data: 3.5220  max mem: 30335
[09:13:38.468847] Test:  [ 500/2084]  eta: 0:06:58  loss: 1.0547 (0.9800)  acc1: 66.6667 (75.0333)  acc5: 95.8333 (93.8540)  time: 0.2569  data: 0.0002  max mem: 30335
[09:15:47.894367] Test:  [1000/2084]  eta: 0:04:43  loss: 1.0935 (1.0263)  acc1: 75.0000 (74.1842)  acc5: 87.5000 (93.3025)  time: 0.2566  data: 0.0002  max mem: 30335
[09:17:58.236290] Test:  [1500/2084]  eta: 0:02:32  loss: 0.9568 (1.1547)  acc1: 75.0000 (71.3885)  acc5: 91.6667 (91.6195)  time: 0.2564  data: 0.0002  max mem: 30335
[09:20:06.536877] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5507 (1.2087)  acc1: 87.5000 (70.2253)  acc5: 95.8333 (90.7296)  time: 0.2560  data: 0.0002  max mem: 30335
[09:20:27.690656] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6651 (1.2091)  acc1: 83.3333 (70.1840)  acc5: 95.8333 (90.7640)  time: 0.2490  data: 0.0001  max mem: 30335
[09:20:27.806006] Test: Total time: 0:09:01 (0.2600 s / it)
[09:20:43.213240] * Acc@1 70.186 Acc@5 90.762 loss 1.209
[09:20:43.213487] Accuracy of the network on the 50000 test images: 70.2%
[09:20:43.213535] Max accuracy: 70.19%
[09:20:43.290984] log_dir: ./output_dir_qkformer
[09:20:47.370006] Epoch: [31]  [   0/6672]  eta: 7:30:37  lr: 0.001292  loss: 2.2483 (2.2483)  time: 4.0524  data: 1.7968  max mem: 30335
[09:45:52.626710] Epoch: [31]  [2000/6672]  eta: 0:58:43  lr: 0.001290  loss: 2.9779 (2.9856)  time: 0.7265  data: 0.0002  max mem: 30335
[10:10:59.104719] Epoch: [31]  [4000/6672]  eta: 0:33:33  lr: 0.001289  loss: 3.1506 (2.9869)  time: 0.7248  data: 0.0002  max mem: 30335
[10:36:07.880382] Epoch: [31]  [6000/6672]  eta: 0:08:26  lr: 0.001288  loss: 2.9565 (2.9917)  time: 0.8622  data: 0.0003  max mem: 30335
[10:44:28.464468] Epoch: [31]  [6671/6672]  eta: 0:00:00  lr: 0.001287  loss: 2.8857 (2.9912)  time: 0.7267  data: 0.0011  max mem: 30335
[10:44:29.269846] Epoch: [31] Total time: 1:23:45 (0.7533 s / it)
[10:44:29.322091] Averaged stats: lr: 0.001287  loss: 2.8857 (2.9874)
[10:44:33.400197] Test:  [   0/2084]  eta: 2:21:30  loss: 0.3468 (0.3468)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.0740  data: 3.4024  max mem: 30335
[10:46:42.545028] Test:  [ 500/2084]  eta: 0:07:01  loss: 1.1775 (0.9402)  acc1: 66.6667 (76.1643)  acc5: 95.8333 (94.4112)  time: 0.2575  data: 0.0002  max mem: 30335
[10:48:50.916814] Test:  [1000/2084]  eta: 0:04:43  loss: 1.4108 (0.9876)  acc1: 75.0000 (74.6878)  acc5: 87.5000 (93.9769)  time: 0.2561  data: 0.0002  max mem: 30335
[10:50:59.179103] Test:  [1500/2084]  eta: 0:02:31  loss: 1.1103 (1.0966)  acc1: 70.8333 (72.7321)  acc5: 91.6667 (92.3384)  time: 0.2561  data: 0.0002  max mem: 30335
[10:53:07.369154] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5793 (1.1760)  acc1: 87.5000 (71.0187)  acc5: 95.8333 (91.0920)  time: 0.2558  data: 0.0004  max mem: 30335
[10:53:28.539596] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5322 (1.1772)  acc1: 83.3333 (71.0180)  acc5: 100.0000 (91.0860)  time: 0.2482  data: 0.0001  max mem: 30335
[10:53:28.703268] Test: Total time: 0:08:59 (0.2588 s / it)
[10:53:44.106034] * Acc@1 71.046 Acc@5 91.075 loss 1.177
[10:53:44.106278] Accuracy of the network on the 50000 test images: 71.0%
[10:53:44.106326] Max accuracy: 71.05%
[10:53:44.279483] log_dir: ./output_dir_qkformer
[10:53:47.550223] Epoch: [32]  [   0/6672]  eta: 6:03:28  lr: 0.001287  loss: 2.9532 (2.9532)  time: 3.2687  data: 2.1099  max mem: 30335
[11:18:46.566709] Epoch: [32]  [2000/6672]  eta: 0:58:26  lr: 0.001286  loss: 3.0642 (2.9671)  time: 0.7317  data: 0.0002  max mem: 30335
[11:44:05.024198] Epoch: [32]  [4000/6672]  eta: 0:33:36  lr: 0.001284  loss: 2.8527 (2.9801)  time: 0.7303  data: 0.0002  max mem: 30335
[12:09:17.110485] Epoch: [32]  [6000/6672]  eta: 0:08:27  lr: 0.001283  loss: 2.9087 (2.9814)  time: 0.7281  data: 0.0003  max mem: 30335
[12:17:37.777233] Epoch: [32]  [6671/6672]  eta: 0:00:00  lr: 0.001283  loss: 3.0210 (2.9816)  time: 0.7255  data: 0.0006  max mem: 30335
[12:17:38.610277] Epoch: [32] Total time: 1:23:54 (0.7545 s / it)
[12:17:38.637316] Averaged stats: lr: 0.001283  loss: 3.0210 (2.9804)
[12:17:42.822934] Test:  [   0/2084]  eta: 2:25:12  loss: 0.1995 (0.1995)  acc1: 95.8333 (95.8333)  acc5: 100.0000 (100.0000)  time: 4.1807  data: 3.5138  max mem: 30335
[12:19:51.129782] Test:  [ 500/2084]  eta: 0:06:58  loss: 1.0741 (0.9406)  acc1: 70.8333 (76.1228)  acc5: 95.8333 (94.2615)  time: 0.2559  data: 0.0002  max mem: 30335
[12:21:59.393986] Test:  [1000/2084]  eta: 0:04:42  loss: 0.9296 (0.9957)  acc1: 70.8333 (74.6462)  acc5: 87.5000 (93.6896)  time: 0.2573  data: 0.0002  max mem: 30335
[12:24:07.795277] Test:  [1500/2084]  eta: 0:02:31  loss: 1.0450 (1.1077)  acc1: 79.1667 (72.3712)  acc5: 91.6667 (92.1302)  time: 0.2557  data: 0.0002  max mem: 30335
[12:26:17.569914] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5807 (1.1648)  acc1: 87.5000 (71.1290)  acc5: 95.8333 (91.2669)  time: 0.2562  data: 0.0002  max mem: 30335
[12:26:38.679095] Test:  [2083/2084]  eta: 0:00:00  loss: 0.7437 (1.1654)  acc1: 79.1667 (71.1060)  acc5: 95.8333 (91.2840)  time: 0.2478  data: 0.0001  max mem: 30335
[12:26:38.796977] Test: Total time: 0:09:00 (0.2592 s / it)
[12:26:54.029487] * Acc@1 71.112 Acc@5 91.296 loss 1.165
[12:26:54.029739] Accuracy of the network on the 50000 test images: 71.1%
[12:26:54.029771] Max accuracy: 71.11%
[12:26:54.144344] log_dir: ./output_dir_qkformer
[12:26:57.337713] Epoch: [33]  [   0/6672]  eta: 5:53:35  lr: 0.001283  loss: 2.6482 (2.6482)  time: 3.1797  data: 2.3365  max mem: 30335
[12:52:04.341793] Epoch: [33]  [2000/6672]  eta: 0:58:45  lr: 0.001281  loss: 3.0511 (2.9669)  time: 0.7320  data: 0.0002  max mem: 30335
[13:17:25.127390] Epoch: [33]  [4000/6672]  eta: 0:33:43  lr: 0.001280  loss: 2.7950 (2.9601)  time: 0.7285  data: 0.0002  max mem: 30335
[13:42:35.153541] Epoch: [33]  [6000/6672]  eta: 0:08:28  lr: 0.001278  loss: 2.8251 (2.9670)  time: 0.7313  data: 0.0003  max mem: 30335
[13:50:56.821117] Epoch: [33]  [6671/6672]  eta: 0:00:00  lr: 0.001278  loss: 2.8588 (2.9675)  time: 0.7459  data: 0.0012  max mem: 30335
[13:50:57.787564] Epoch: [33] Total time: 1:24:03 (0.7559 s / it)
[13:50:57.852106] Averaged stats: lr: 0.001278  loss: 2.8588 (2.9708)
[13:51:02.669467] Test:  [   0/2084]  eta: 2:47:05  loss: 0.7318 (0.7318)  acc1: 83.3333 (83.3333)  acc5: 95.8333 (95.8333)  time: 4.8108  data: 3.7393  max mem: 30335
[13:53:11.316519] Test:  [ 500/2084]  eta: 0:07:01  loss: 1.0205 (0.9768)  acc1: 70.8333 (76.0645)  acc5: 100.0000 (93.8207)  time: 0.2577  data: 0.0002  max mem: 30335
[13:55:20.489053] Test:  [1000/2084]  eta: 0:04:44  loss: 0.9613 (1.0074)  acc1: 70.8333 (75.0083)  acc5: 91.6667 (93.6147)  time: 0.2572  data: 0.0002  max mem: 30335
[13:57:30.301362] Test:  [1500/2084]  eta: 0:02:32  loss: 0.9588 (1.1211)  acc1: 75.0000 (72.3906)  acc5: 91.6667 (91.9526)  time: 0.2569  data: 0.0002  max mem: 30335
[13:59:38.640711] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5032 (1.1733)  acc1: 87.5000 (71.2269)  acc5: 100.0000 (91.2023)  time: 0.2577  data: 0.0002  max mem: 30335
[13:59:59.895300] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5952 (1.1746)  acc1: 87.5000 (71.1800)  acc5: 100.0000 (91.2200)  time: 0.2548  data: 0.0001  max mem: 30335
[14:00:00.027495] Test: Total time: 0:09:02 (0.2602 s / it)
[14:00:15.759868] * Acc@1 71.200 Acc@5 91.246 loss 1.174
[14:00:15.760169] Accuracy of the network on the 50000 test images: 71.2%
[14:00:15.760203] Max accuracy: 71.20%
[14:00:16.036502] log_dir: ./output_dir_qkformer
[14:00:19.713133] Epoch: [34]  [   0/6672]  eta: 6:48:42  lr: 0.001278  loss: 3.8683 (3.8683)  time: 3.6754  data: 2.1792  max mem: 30335
[14:25:26.513596] Epoch: [34]  [2000/6672]  eta: 0:58:46  lr: 0.001276  loss: 2.9237 (2.9611)  time: 0.8566  data: 0.0002  max mem: 30335
[14:50:22.242917] Epoch: [34]  [4000/6672]  eta: 0:33:27  lr: 0.001275  loss: 2.8963 (2.9678)  time: 0.7301  data: 0.0003  max mem: 30335
[15:15:19.037553] Epoch: [34]  [6000/6672]  eta: 0:08:24  lr: 0.001273  loss: 3.0781 (2.9629)  time: 0.7295  data: 0.0003  max mem: 30335
[15:23:42.802863] Epoch: [34]  [6671/6672]  eta: 0:00:00  lr: 0.001273  loss: 2.9686 (2.9629)  time: 0.7251  data: 0.0011  max mem: 30335
[15:23:43.691946] Epoch: [34] Total time: 1:23:27 (0.7505 s / it)
[15:23:43.719827] Averaged stats: lr: 0.001273  loss: 2.9686 (2.9637)
[15:23:48.094846] Test:  [   0/2084]  eta: 2:31:47  loss: 0.7727 (0.7727)  acc1: 83.3333 (83.3333)  acc5: 100.0000 (100.0000)  time: 4.3703  data: 3.5415  max mem: 30335
[15:25:57.095435] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.9446 (0.9895)  acc1: 66.6667 (75.0998)  acc5: 95.8333 (93.7458)  time: 0.2567  data: 0.0002  max mem: 30335
[15:28:05.905691] Test:  [1000/2084]  eta: 0:04:43  loss: 1.1368 (1.0382)  acc1: 75.0000 (73.8012)  acc5: 87.5000 (93.2359)  time: 0.2575  data: 0.0002  max mem: 30335
[15:30:15.186373] Test:  [1500/2084]  eta: 0:02:32  loss: 1.4818 (1.1552)  acc1: 62.5000 (71.4080)  acc5: 87.5000 (91.5584)  time: 0.2561  data: 0.0002  max mem: 30335
[15:32:23.646444] Test:  [2000/2084]  eta: 0:00:21  loss: 0.7523 (1.2093)  acc1: 83.3333 (70.3961)  acc5: 95.8333 (90.7671)  time: 0.2565  data: 0.0002  max mem: 30335
[15:32:44.807633] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5569 (1.2107)  acc1: 87.5000 (70.3840)  acc5: 95.8333 (90.7620)  time: 0.2486  data: 0.0001  max mem: 30335
[15:32:44.933296] Test: Total time: 0:09:01 (0.2597 s / it)
[15:33:00.352575] * Acc@1 70.387 Acc@5 90.759 loss 1.211
[15:33:00.352846] Accuracy of the network on the 50000 test images: 70.4%
[15:33:00.352879] Max accuracy: 71.20%
[15:33:00.464341] log_dir: ./output_dir_qkformer
[15:33:12.131971] Epoch: [35]  [   0/6672]  eta: 21:37:13  lr: 0.001273  loss: 3.4704 (3.4704)  time: 11.6657  data: 3.4222  max mem: 30335
[15:58:17.316093] Epoch: [35]  [2000/6672]  eta: 0:59:01  lr: 0.001271  loss: 2.6645 (2.9324)  time: 0.7280  data: 0.0003  max mem: 30335
[16:23:27.346301] Epoch: [35]  [4000/6672]  eta: 0:33:41  lr: 0.001270  loss: 2.8073 (2.9393)  time: 0.7308  data: 0.0002  max mem: 30335
[16:48:24.637012] Epoch: [35]  [6000/6672]  eta: 0:08:26  lr: 0.001268  loss: 3.1641 (2.9489)  time: 0.7318  data: 0.0002  max mem: 30335
[16:56:56.704815] Epoch: [35]  [6671/6672]  eta: 0:00:00  lr: 0.001268  loss: 2.8885 (2.9524)  time: 0.7320  data: 0.0011  max mem: 30335
[16:56:57.628617] Epoch: [35] Total time: 1:23:57 (0.7550 s / it)
[16:56:57.660834] Averaged stats: lr: 0.001268  loss: 2.8885 (2.9585)
[16:57:01.549019] Test:  [   0/2084]  eta: 2:14:30  loss: 0.4898 (0.4898)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 3.8727  data: 3.2517  max mem: 30335
[16:59:11.564413] Test:  [ 500/2084]  eta: 0:07:03  loss: 1.0231 (0.9880)  acc1: 70.8333 (75.3992)  acc5: 95.8333 (93.9538)  time: 0.2567  data: 0.0002  max mem: 30335
[17:01:21.161300] Test:  [1000/2084]  eta: 0:04:45  loss: 1.2016 (1.0245)  acc1: 66.6667 (74.3756)  acc5: 87.5000 (93.4316)  time: 0.3089  data: 0.0002  max mem: 30335
[17:03:30.076060] Test:  [1500/2084]  eta: 0:02:32  loss: 0.9992 (1.1523)  acc1: 70.8333 (71.7605)  acc5: 91.6667 (91.7388)  time: 0.2560  data: 0.0002  max mem: 30335
[17:05:38.349742] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5210 (1.2018)  acc1: 83.3333 (70.7292)  acc5: 95.8333 (90.9566)  time: 0.2563  data: 0.0002  max mem: 30335
[17:05:59.504270] Test:  [2083/2084]  eta: 0:00:00  loss: 0.9487 (1.2059)  acc1: 79.1667 (70.6340)  acc5: 95.8333 (90.9200)  time: 0.2490  data: 0.0001  max mem: 30335
[17:05:59.655742] Test: Total time: 0:09:01 (0.2601 s / it)
[17:06:15.175124] * Acc@1 70.621 Acc@5 90.909 loss 1.206
[17:06:15.175412] Accuracy of the network on the 50000 test images: 70.6%
[17:06:15.175447] Max accuracy: 71.20%
[17:06:15.314637] log_dir: ./output_dir_qkformer
[17:06:21.560610] Epoch: [36]  [   0/6672]  eta: 11:34:21  lr: 0.001268  loss: 2.2785 (2.2785)  time: 6.2442  data: 2.6679  max mem: 30335
[17:31:24.392580] Epoch: [36]  [2000/6672]  eta: 0:58:42  lr: 0.001266  loss: 3.0645 (2.9393)  time: 0.7300  data: 0.0002  max mem: 30335
[17:56:54.137591] Epoch: [36]  [4000/6672]  eta: 0:33:48  lr: 0.001264  loss: 2.7905 (2.9451)  time: 1.0166  data: 0.0002  max mem: 30335
[18:22:09.351406] Epoch: [36]  [6000/6672]  eta: 0:08:29  lr: 0.001263  loss: 3.0150 (2.9529)  time: 0.7892  data: 0.0097  max mem: 30335
[18:30:26.369078] Epoch: [36]  [6671/6672]  eta: 0:00:00  lr: 0.001262  loss: 2.9313 (2.9515)  time: 0.7260  data: 0.0006  max mem: 30335
[18:30:27.123214] Epoch: [36] Total time: 1:24:11 (0.7572 s / it)
[18:30:27.157500] Averaged stats: lr: 0.001262  loss: 2.9313 (2.9504)
[18:30:31.278564] Test:  [   0/2084]  eta: 2:22:48  loss: 0.4169 (0.4169)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.1116  data: 3.3387  max mem: 30335
[18:32:40.128638] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.8712 (0.9224)  acc1: 75.0000 (76.1228)  acc5: 95.8333 (94.4444)  time: 0.2764  data: 0.0090  max mem: 30335
[18:34:48.494462] Test:  [1000/2084]  eta: 0:04:42  loss: 1.1844 (0.9637)  acc1: 75.0000 (75.3164)  acc5: 91.6667 (93.9519)  time: 0.2568  data: 0.0002  max mem: 30335
[18:36:56.870752] Test:  [1500/2084]  eta: 0:02:31  loss: 0.8042 (1.0839)  acc1: 83.3333 (72.7126)  acc5: 91.6667 (92.2080)  time: 0.2561  data: 0.0002  max mem: 30335
[18:39:07.948924] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5986 (1.1498)  acc1: 87.5000 (71.4643)  acc5: 95.8333 (91.2148)  time: 0.2570  data: 0.0002  max mem: 30335
[18:39:29.125106] Test:  [2083/2084]  eta: 0:00:00  loss: 0.7337 (1.1539)  acc1: 87.5000 (71.4040)  acc5: 95.8333 (91.1780)  time: 0.2485  data: 0.0001  max mem: 30335
[18:39:29.239433] Test: Total time: 0:09:02 (0.2601 s / it)
[18:39:44.426667] * Acc@1 71.409 Acc@5 91.166 loss 1.154
[18:39:44.426936] Accuracy of the network on the 50000 test images: 71.4%
[18:39:44.426973] Max accuracy: 71.41%
[18:39:44.526735] log_dir: ./output_dir_qkformer
[18:39:49.569669] Epoch: [37]  [   0/6672]  eta: 9:20:36  lr: 0.001262  loss: 3.0855 (3.0855)  time: 5.0414  data: 2.5817  max mem: 30335
[19:04:56.192132] Epoch: [37]  [2000/6672]  eta: 0:58:48  lr: 0.001261  loss: 2.8961 (2.9357)  time: 0.7329  data: 0.0003  max mem: 30335
[19:30:06.737370] Epoch: [37]  [4000/6672]  eta: 0:33:37  lr: 0.001259  loss: 2.8571 (2.9399)  time: 0.8447  data: 0.0003  max mem: 30335
[19:55:24.386182] Epoch: [37]  [6000/6672]  eta: 0:08:28  lr: 0.001257  loss: 2.9417 (2.9427)  time: 0.7263  data: 0.0002  max mem: 30335
[20:03:50.304443] Epoch: [37]  [6671/6672]  eta: 0:00:00  lr: 0.001257  loss: 2.9337 (2.9436)  time: 0.7263  data: 0.0011  max mem: 30335
[20:03:51.072578] Epoch: [37] Total time: 1:24:06 (0.7564 s / it)
[20:03:51.121841] Averaged stats: lr: 0.001257  loss: 2.9337 (2.9418)
[20:03:55.493804] Test:  [   0/2084]  eta: 2:31:39  loss: 0.4974 (0.4974)  acc1: 83.3333 (83.3333)  acc5: 95.8333 (95.8333)  time: 4.3666  data: 3.6402  max mem: 30335
[20:06:04.238317] Test:  [ 500/2084]  eta: 0:07:00  loss: 1.3461 (0.9985)  acc1: 58.3333 (75.2162)  acc5: 95.8333 (93.7791)  time: 0.2574  data: 0.0002  max mem: 30335
[20:08:14.410274] Test:  [1000/2084]  eta: 0:04:45  loss: 0.8960 (1.0432)  acc1: 75.0000 (74.0468)  acc5: 91.6667 (93.4191)  time: 0.2570  data: 0.0002  max mem: 30335
[20:10:23.253863] Test:  [1500/2084]  eta: 0:02:32  loss: 0.9488 (1.1551)  acc1: 75.0000 (71.6550)  acc5: 91.6667 (91.8665)  time: 0.2570  data: 0.0002  max mem: 30335
[20:12:32.368366] Test:  [2000/2084]  eta: 0:00:21  loss: 0.7775 (1.2050)  acc1: 79.1667 (70.5002)  acc5: 95.8333 (91.0316)  time: 0.2576  data: 0.0002  max mem: 30335
[20:12:54.566653] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6614 (1.2039)  acc1: 87.5000 (70.5360)  acc5: 100.0000 (91.0660)  time: 0.2485  data: 0.0002  max mem: 30335
[20:12:54.699532] Test: Total time: 0:09:03 (0.2608 s / it)
[20:13:10.189019] * Acc@1 70.558 Acc@5 91.082 loss 1.203
[20:13:10.189606] Accuracy of the network on the 50000 test images: 70.6%
[20:13:10.189673] Max accuracy: 71.41%
[20:13:10.388338] log_dir: ./output_dir_qkformer
[20:13:15.818440] Epoch: [38]  [   0/6672]  eta: 10:02:07  lr: 0.001257  loss: 2.5293 (2.5293)  time: 5.4148  data: 2.1754  max mem: 30335
[20:38:19.441532] Epoch: [38]  [2000/6672]  eta: 0:58:42  lr: 0.001255  loss: 3.0405 (2.9229)  time: 0.7535  data: 0.0004  max mem: 30335
[21:03:13.671523] Epoch: [38]  [4000/6672]  eta: 0:33:25  lr: 0.001254  loss: 2.8498 (2.9312)  time: 0.7301  data: 0.0002  max mem: 30335
[21:28:21.480032] Epoch: [38]  [6000/6672]  eta: 0:08:25  lr: 0.001252  loss: 2.9589 (2.9335)  time: 0.7280  data: 0.0002  max mem: 30335
[21:36:51.678081] Epoch: [38]  [6671/6672]  eta: 0:00:00  lr: 0.001251  loss: 2.9205 (2.9352)  time: 0.7252  data: 0.0010  max mem: 30335
[21:36:52.493865] Epoch: [38] Total time: 1:23:42 (0.7527 s / it)
[21:36:52.543131] Averaged stats: lr: 0.001251  loss: 2.9205 (2.9341)
[21:36:57.383509] Test:  [   0/2084]  eta: 2:47:57  loss: 0.4805 (0.4805)  acc1: 87.5000 (87.5000)  acc5: 100.0000 (100.0000)  time: 4.8357  data: 3.9607  max mem: 30335
[21:39:05.903667] Test:  [ 500/2084]  eta: 0:07:01  loss: 1.0412 (0.9011)  acc1: 58.3333 (77.1374)  acc5: 95.8333 (94.8603)  time: 0.2572  data: 0.0002  max mem: 30335
[21:41:14.381608] Test:  [1000/2084]  eta: 0:04:43  loss: 0.9141 (0.9539)  acc1: 79.1667 (76.0281)  acc5: 91.6667 (94.2516)  time: 0.2567  data: 0.0002  max mem: 30335
[21:43:22.924951] Test:  [1500/2084]  eta: 0:02:31  loss: 0.9680 (1.0846)  acc1: 75.0000 (73.2317)  acc5: 91.6667 (92.4384)  time: 0.2574  data: 0.0002  max mem: 30335
[21:45:31.408600] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5865 (1.1514)  acc1: 87.5000 (71.8537)  acc5: 95.8333 (91.4730)  time: 0.2563  data: 0.0002  max mem: 30335
[21:45:52.613222] Test:  [2083/2084]  eta: 0:00:00  loss: 0.7884 (1.1527)  acc1: 83.3333 (71.7600)  acc5: 95.8333 (91.4780)  time: 0.2504  data: 0.0002  max mem: 30335
[21:45:52.720119] Test: Total time: 0:09:00 (0.2592 s / it)
[21:46:08.119749] * Acc@1 71.753 Acc@5 91.469 loss 1.153
[21:46:08.119988] Accuracy of the network on the 50000 test images: 71.8%
[21:46:08.120019] Max accuracy: 71.75%
[21:46:08.256745] log_dir: ./output_dir_qkformer
[21:46:14.191017] Epoch: [39]  [   0/6672]  eta: 10:42:45  lr: 0.001251  loss: 3.3747 (3.3747)  time: 5.7803  data: 2.4806  max mem: 30335
[22:11:34.187031] Epoch: [39]  [2000/6672]  eta: 0:59:21  lr: 0.001250  loss: 2.8156 (2.9122)  time: 0.7282  data: 0.0003  max mem: 30335
[22:36:38.494953] Epoch: [39]  [4000/6672]  eta: 0:33:43  lr: 0.001248  loss: 2.7700 (2.9177)  time: 0.7276  data: 0.0003  max mem: 30335
[23:01:45.941244] Epoch: [39]  [6000/6672]  eta: 0:08:28  lr: 0.001246  loss: 2.9920 (2.9279)  time: 0.7275  data: 0.0003  max mem: 30335
[23:10:05.701029] Epoch: [39]  [6671/6672]  eta: 0:00:00  lr: 0.001246  loss: 2.9041 (2.9272)  time: 0.7264  data: 0.0007  max mem: 30335
[23:10:06.488181] Epoch: [39] Total time: 1:23:58 (0.7551 s / it)
[23:10:06.523300] Averaged stats: lr: 0.001246  loss: 2.9041 (2.9262)
[23:10:11.129020] Test:  [   0/2084]  eta: 2:39:49  loss: 0.7934 (0.7934)  acc1: 87.5000 (87.5000)  acc5: 91.6667 (91.6667)  time: 4.6017  data: 3.6387  max mem: 30335
[23:12:19.609237] Test:  [ 500/2084]  eta: 0:07:00  loss: 1.0164 (0.9265)  acc1: 70.8333 (75.8816)  acc5: 95.8333 (94.3779)  time: 0.2565  data: 0.0002  max mem: 30335
[23:14:28.743569] Test:  [1000/2084]  eta: 0:04:43  loss: 1.1210 (0.9751)  acc1: 79.1667 (75.0125)  acc5: 87.5000 (93.8145)  time: 0.2562  data: 0.0002  max mem: 30335
[23:16:39.271507] Test:  [1500/2084]  eta: 0:02:32  loss: 0.9044 (1.1020)  acc1: 79.1667 (72.3240)  acc5: 95.8333 (92.0609)  time: 0.2575  data: 0.0002  max mem: 30335
[23:18:48.169123] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4625 (1.1479)  acc1: 87.5000 (71.3019)  acc5: 100.0000 (91.3377)  time: 0.2573  data: 0.0002  max mem: 30335
[23:19:09.334207] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4962 (1.1523)  acc1: 87.5000 (71.2580)  acc5: 95.8333 (91.2940)  time: 0.2496  data: 0.0001  max mem: 30335
[23:19:09.440501] Test: Total time: 0:09:02 (0.2605 s / it)
[23:19:25.415118] * Acc@1 71.263 Acc@5 91.279 loss 1.152
[23:19:25.415396] Accuracy of the network on the 50000 test images: 71.3%
[23:19:25.415429] Max accuracy: 71.75%
[23:19:25.648039] log_dir: ./output_dir_qkformer
[23:19:33.474736] Epoch: [40]  [   0/6672]  eta: 14:29:48  lr: 0.001246  loss: 2.6884 (2.6884)  time: 7.8220  data: 2.1212  max mem: 30335
[23:44:40.450885] Epoch: [40]  [2000/6672]  eta: 0:58:56  lr: 0.001244  loss: 3.0133 (2.9207)  time: 0.7771  data: 0.0002  max mem: 30335
[00:09:46.089556] Epoch: [40]  [4000/6672]  eta: 0:33:36  lr: 0.001242  loss: 2.8669 (2.9113)  time: 0.8184  data: 0.0002  max mem: 30335
[00:34:45.379255] Epoch: [40]  [6000/6672]  eta: 0:08:26  lr: 0.001240  loss: 2.9041 (2.9176)  time: 0.7280  data: 0.0005  max mem: 30335
[00:43:11.395969] Epoch: [40]  [6671/6672]  eta: 0:00:00  lr: 0.001240  loss: 3.0061 (2.9215)  time: 0.7338  data: 0.0011  max mem: 30335
[00:43:12.215706] Epoch: [40] Total time: 1:23:46 (0.7534 s / it)
[00:43:12.256553] Averaged stats: lr: 0.001240  loss: 3.0061 (2.9221)
[00:43:16.652224] Test:  [   0/2084]  eta: 2:32:13  loss: 0.4571 (0.4571)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.3825  data: 3.6371  max mem: 30335
[00:45:25.041334] Test:  [ 500/2084]  eta: 0:06:59  loss: 1.1936 (0.9290)  acc1: 62.5000 (76.3556)  acc5: 95.8333 (94.6025)  time: 0.2578  data: 0.0002  max mem: 30335
[00:47:33.961205] Test:  [1000/2084]  eta: 0:04:43  loss: 1.2115 (0.9905)  acc1: 70.8333 (75.2081)  acc5: 91.6667 (93.8561)  time: 0.2562  data: 0.0002  max mem: 30335
[00:49:42.351827] Test:  [1500/2084]  eta: 0:02:31  loss: 0.9856 (1.1087)  acc1: 75.0000 (72.6072)  acc5: 91.6667 (92.2857)  time: 0.2557  data: 0.0002  max mem: 30335
[00:51:51.633384] Test:  [2000/2084]  eta: 0:00:21  loss: 0.6634 (1.1639)  acc1: 83.3333 (71.5684)  acc5: 95.8333 (91.3689)  time: 0.2565  data: 0.0002  max mem: 30335
[00:52:12.800428] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6044 (1.1659)  acc1: 83.3333 (71.5380)  acc5: 95.8333 (91.3340)  time: 0.2489  data: 0.0001  max mem: 30335
[00:52:12.934258] Test: Total time: 0:09:00 (0.2594 s / it)
[00:52:28.234237] * Acc@1 71.531 Acc@5 91.333 loss 1.166
[00:52:28.234641] Accuracy of the network on the 50000 test images: 71.5%
[00:52:28.234673] Max accuracy: 71.75%
[00:52:28.559403] log_dir: ./output_dir_qkformer
[00:52:37.447437] Epoch: [41]  [   0/6672]  eta: 16:27:51  lr: 0.001240  loss: 2.5188 (2.5188)  time: 8.8837  data: 2.8343  max mem: 30335
[01:18:01.244727] Epoch: [41]  [2000/6672]  eta: 0:59:37  lr: 0.001238  loss: 2.7841 (2.8879)  time: 0.7498  data: 0.0005  max mem: 30335
[01:42:58.291433] Epoch: [41]  [4000/6672]  eta: 0:33:42  lr: 0.001236  loss: 2.8585 (2.8938)  time: 0.7266  data: 0.0003  max mem: 30335
[02:07:45.475970] Epoch: [41]  [6000/6672]  eta: 0:08:25  lr: 0.001234  loss: 2.8209 (2.9013)  time: 0.7483  data: 0.0003  max mem: 30335
[02:16:07.593343] Epoch: [41]  [6671/6672]  eta: 0:00:00  lr: 0.001234  loss: 2.7006 (2.9048)  time: 0.7253  data: 0.0012  max mem: 30335
[02:16:08.431134] Epoch: [41] Total time: 1:23:39 (0.7524 s / it)
[02:16:08.461724] Averaged stats: lr: 0.001234  loss: 2.7006 (2.9130)
[02:16:12.387337] Test:  [   0/2084]  eta: 2:16:08  loss: 0.3607 (0.3607)  acc1: 95.8333 (95.8333)  acc5: 100.0000 (100.0000)  time: 3.9197  data: 3.3685  max mem: 30335
[02:18:21.159766] Test:  [ 500/2084]  eta: 0:06:59  loss: 1.1061 (0.9268)  acc1: 70.8333 (76.7964)  acc5: 95.8333 (94.6773)  time: 0.2568  data: 0.0002  max mem: 30335
[02:20:29.753188] Test:  [1000/2084]  eta: 0:04:42  loss: 1.0073 (0.9722)  acc1: 79.1667 (75.7076)  acc5: 87.5000 (94.1309)  time: 0.2569  data: 0.0002  max mem: 30335
[02:22:39.716901] Test:  [1500/2084]  eta: 0:02:32  loss: 0.8692 (1.0780)  acc1: 75.0000 (73.5898)  acc5: 91.6667 (92.5883)  time: 0.2571  data: 0.0002  max mem: 30335
[02:24:48.181268] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5821 (1.1365)  acc1: 87.5000 (72.3097)  acc5: 95.8333 (91.6563)  time: 0.2564  data: 0.0002  max mem: 30335
[02:25:09.319621] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5642 (1.1412)  acc1: 83.3333 (72.1800)  acc5: 95.8333 (91.6080)  time: 0.2487  data: 0.0001  max mem: 30335
[02:25:09.452488] Test: Total time: 0:09:00 (0.2596 s / it)
[02:25:24.918606] * Acc@1 72.162 Acc@5 91.592 loss 1.141
[02:25:24.918880] Accuracy of the network on the 50000 test images: 72.2%
[02:25:24.918935] Max accuracy: 72.16%
[02:25:25.360012] log_dir: ./output_dir_qkformer
[02:25:33.549346] Epoch: [42]  [   0/6672]  eta: 14:48:38  lr: 0.001234  loss: 2.2978 (2.2978)  time: 7.9913  data: 1.9840  max mem: 30335
[02:50:49.965660] Epoch: [42]  [2000/6672]  eta: 0:59:19  lr: 0.001232  loss: 2.9400 (2.8937)  time: 0.7261  data: 0.0002  max mem: 30335
[03:16:13.397120] Epoch: [42]  [4000/6672]  eta: 0:33:55  lr: 0.001230  loss: 2.7564 (2.8972)  time: 0.7272  data: 0.0003  max mem: 30335
[03:41:18.477896] Epoch: [42]  [6000/6672]  eta: 0:08:29  lr: 0.001228  loss: 2.8436 (2.9100)  time: 0.7297  data: 0.0002  max mem: 30335
[03:49:39.213185] Epoch: [42]  [6671/6672]  eta: 0:00:00  lr: 0.001228  loss: 2.5843 (2.9120)  time: 0.7238  data: 0.0010  max mem: 30335
[03:49:39.959256] Epoch: [42] Total time: 1:24:14 (0.7576 s / it)
[03:49:39.975599] Averaged stats: lr: 0.001228  loss: 2.5843 (2.9081)
[03:49:44.111306] Test:  [   0/2084]  eta: 2:22:57  loss: 0.5280 (0.5280)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.1160  data: 3.5560  max mem: 30335
[03:51:53.467783] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.8982 (0.8985)  acc1: 70.8333 (76.4055)  acc5: 95.8333 (94.6856)  time: 0.2573  data: 0.0002  max mem: 30335
[03:54:02.313684] Test:  [1000/2084]  eta: 0:04:44  loss: 0.8156 (0.9660)  acc1: 79.1667 (74.9542)  acc5: 91.6667 (94.0018)  time: 0.2565  data: 0.0002  max mem: 30335
[03:56:11.666036] Test:  [1500/2084]  eta: 0:02:32  loss: 1.0788 (1.0818)  acc1: 70.8333 (72.6599)  acc5: 91.6667 (92.3273)  time: 0.2569  data: 0.0002  max mem: 30335
[03:58:19.921631] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5835 (1.1261)  acc1: 87.5000 (72.0015)  acc5: 95.8333 (91.6479)  time: 0.2567  data: 0.0003  max mem: 30335
[03:58:41.101009] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5224 (1.1275)  acc1: 87.5000 (71.9500)  acc5: 95.8333 (91.6260)  time: 0.2490  data: 0.0002  max mem: 30335
[03:58:41.237874] Test: Total time: 0:09:01 (0.2597 s / it)
[03:58:57.110974] * Acc@1 71.964 Acc@5 91.621 loss 1.128
[03:58:57.111528] Accuracy of the network on the 50000 test images: 72.0%
[03:58:57.111585] Max accuracy: 72.16%
[03:58:57.155793] log_dir: ./output_dir_qkformer
[03:59:03.227410] Epoch: [43]  [   0/6672]  eta: 11:15:02  lr: 0.001227  loss: 3.1108 (3.1108)  time: 6.0706  data: 2.2199  max mem: 30335
[04:24:15.769140] Epoch: [43]  [2000/6672]  eta: 0:59:05  lr: 0.001226  loss: 2.8637 (2.8935)  time: 0.7317  data: 0.0002  max mem: 30335
[04:49:21.804358] Epoch: [43]  [4000/6672]  eta: 0:33:39  lr: 0.001224  loss: 2.8737 (2.8965)  time: 0.7249  data: 0.0002  max mem: 30335
[05:14:29.183527] Epoch: [43]  [6000/6672]  eta: 0:08:27  lr: 0.001222  loss: 2.7056 (2.8928)  time: 0.7662  data: 0.0003  max mem: 30335
[05:22:50.031919] Epoch: [43]  [6671/6672]  eta: 0:00:00  lr: 0.001221  loss: 2.8975 (2.8947)  time: 0.7234  data: 0.0010  max mem: 30335
[05:22:50.746877] Epoch: [43] Total time: 1:23:53 (0.7544 s / it)
[05:22:50.764477] Averaged stats: lr: 0.001221  loss: 2.8975 (2.8979)
[05:22:55.036320] Test:  [   0/2084]  eta: 2:28:13  loss: 0.2880 (0.2880)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.2676  data: 3.5365  max mem: 30335
[05:25:03.289584] Test:  [ 500/2084]  eta: 0:06:58  loss: 1.0513 (0.9246)  acc1: 66.6667 (76.0562)  acc5: 95.8333 (94.5609)  time: 0.2565  data: 0.0002  max mem: 30335
[05:27:12.775922] Test:  [1000/2084]  eta: 0:04:43  loss: 1.1032 (0.9707)  acc1: 75.0000 (75.4246)  acc5: 91.6667 (93.7979)  time: 0.2559  data: 0.0004  max mem: 30335
[05:29:21.043441] Test:  [1500/2084]  eta: 0:02:31  loss: 0.8526 (1.0876)  acc1: 79.1667 (72.7182)  acc5: 91.6667 (92.2191)  time: 0.2557  data: 0.0002  max mem: 30335
[05:31:29.288504] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5186 (1.1442)  acc1: 87.5000 (71.4934)  acc5: 95.8333 (91.3377)  time: 0.2565  data: 0.0002  max mem: 30335
[05:31:50.429248] Test:  [2083/2084]  eta: 0:00:00  loss: 0.7595 (1.1421)  acc1: 83.3333 (71.5340)  acc5: 95.8333 (91.3960)  time: 0.2485  data: 0.0002  max mem: 30335
[05:31:50.579182] Test: Total time: 0:08:59 (0.2590 s / it)
[05:32:06.064521] * Acc@1 71.532 Acc@5 91.412 loss 1.142
[05:32:06.064966] Accuracy of the network on the 50000 test images: 71.5%
[05:32:06.064999] Max accuracy: 72.16%
[05:32:06.257200] log_dir: ./output_dir_qkformer
[05:32:09.248226] Epoch: [44]  [   0/6672]  eta: 5:32:29  lr: 0.001221  loss: 2.8463 (2.8463)  time: 2.9900  data: 1.7765  max mem: 30335
[05:57:28.736612] Epoch: [44]  [2000/6672]  eta: 0:59:14  lr: 0.001219  loss: 2.7850 (2.8668)  time: 0.7733  data: 0.0008  max mem: 30335
[06:22:26.338298] Epoch: [44]  [4000/6672]  eta: 0:33:36  lr: 0.001217  loss: 2.8585 (2.8740)  time: 0.7246  data: 0.0003  max mem: 30335
[06:47:20.400973] Epoch: [44]  [6000/6672]  eta: 0:08:25  lr: 0.001215  loss: 2.9266 (2.8821)  time: 0.7360  data: 0.0003  max mem: 30335
[06:55:48.724223] Epoch: [44]  [6671/6672]  eta: 0:00:00  lr: 0.001215  loss: 2.9690 (2.8825)  time: 0.7284  data: 0.0011  max mem: 30335
[06:55:49.554057] Epoch: [44] Total time: 1:23:43 (0.7529 s / it)
[06:55:49.609157] Averaged stats: lr: 0.001215  loss: 2.9690 (2.8903)
[06:55:56.801097] Test:  [   0/2084]  eta: 4:09:27  loss: 0.7285 (0.7285)  acc1: 83.3333 (83.3333)  acc5: 91.6667 (91.6667)  time: 7.1819  data: 2.9070  max mem: 30335
[06:58:05.089264] Test:  [ 500/2084]  eta: 0:07:08  loss: 1.2275 (0.9337)  acc1: 66.6667 (76.7132)  acc5: 95.8333 (94.5692)  time: 0.2570  data: 0.0002  max mem: 30335
[07:00:13.418184] Test:  [1000/2084]  eta: 0:04:45  loss: 0.9529 (0.9554)  acc1: 79.1667 (76.1031)  acc5: 87.5000 (94.3057)  time: 0.2564  data: 0.0002  max mem: 30335
[07:02:22.854881] Test:  [1500/2084]  eta: 0:02:32  loss: 0.8864 (1.0574)  acc1: 83.3333 (73.9340)  acc5: 95.8333 (92.7909)  time: 0.2572  data: 0.0002  max mem: 30335
[07:04:31.370864] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5529 (1.1179)  acc1: 87.5000 (72.4804)  acc5: 95.8333 (91.9790)  time: 0.2564  data: 0.0002  max mem: 30335
[07:04:52.559863] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5697 (1.1199)  acc1: 87.5000 (72.4920)  acc5: 95.8333 (91.9660)  time: 0.2492  data: 0.0001  max mem: 30335
[07:04:52.699754] Test: Total time: 0:09:03 (0.2606 s / it)
[07:05:08.259999] * Acc@1 72.492 Acc@5 91.962 loss 1.120
[07:05:08.260245] Accuracy of the network on the 50000 test images: 72.5%
[07:05:08.260293] Max accuracy: 72.49%
[07:05:08.282343] log_dir: ./output_dir_qkformer
[07:05:16.196047] Epoch: [45]  [   0/6672]  eta: 14:38:33  lr: 0.001215  loss: 3.1916 (3.1916)  time: 7.9006  data: 2.8350  max mem: 30335
[07:30:16.120728] Epoch: [45]  [2000/6672]  eta: 0:58:39  lr: 0.001213  loss: 2.8843 (2.8740)  time: 0.7265  data: 0.0003  max mem: 30335
[07:55:11.242806] Epoch: [45]  [4000/6672]  eta: 0:33:24  lr: 0.001211  loss: 2.9872 (2.8816)  time: 0.8884  data: 0.0003  max mem: 30335
[08:20:08.133225] Epoch: [45]  [6000/6672]  eta: 0:08:23  lr: 0.001209  loss: 2.8302 (2.8879)  time: 0.9688  data: 0.0002  max mem: 30335
[08:28:24.544283] Epoch: [45]  [6671/6672]  eta: 0:00:00  lr: 0.001208  loss: 2.8605 (2.8892)  time: 0.7223  data: 0.0011  max mem: 30335
[08:28:25.168915] Epoch: [45] Total time: 1:23:16 (0.7489 s / it)
[08:28:25.172572] Averaged stats: lr: 0.001208  loss: 2.8605 (2.8877)
[08:28:29.192114] Test:  [   0/2084]  eta: 2:19:26  loss: 0.5381 (0.5381)  acc1: 87.5000 (87.5000)  acc5: 95.8333 (95.8333)  time: 4.0145  data: 3.1390  max mem: 30335
[08:30:38.681093] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.9657 (0.8669)  acc1: 66.6667 (77.9774)  acc5: 95.8333 (95.0848)  time: 0.2564  data: 0.0002  max mem: 30335
[08:32:47.619563] Test:  [1000/2084]  eta: 0:04:44  loss: 0.8593 (0.9085)  acc1: 83.3333 (76.4694)  acc5: 91.6667 (94.5763)  time: 0.2558  data: 0.0002  max mem: 30335
[08:34:56.127391] Test:  [1500/2084]  eta: 0:02:32  loss: 0.9981 (1.0362)  acc1: 75.0000 (73.7425)  acc5: 91.6667 (92.8270)  time: 0.2563  data: 0.0002  max mem: 30335
[08:37:04.712504] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4312 (1.0988)  acc1: 87.5000 (72.4804)  acc5: 100.0000 (91.8853)  time: 0.2567  data: 0.0002  max mem: 30335
[08:37:25.824764] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4890 (1.1010)  acc1: 83.3333 (72.4700)  acc5: 100.0000 (91.8660)  time: 0.2488  data: 0.0001  max mem: 30335
[08:37:25.948339] Test: Total time: 0:09:00 (0.2595 s / it)
[08:37:41.251113] * Acc@1 72.458 Acc@5 91.868 loss 1.101
[08:37:41.251329] Accuracy of the network on the 50000 test images: 72.5%
[08:37:41.251377] Max accuracy: 72.49%
[08:37:41.400221] log_dir: ./output_dir_qkformer
[08:37:47.426057] Epoch: [46]  [   0/6672]  eta: 11:09:59  lr: 0.001208  loss: 2.6850 (2.6850)  time: 6.0251  data: 1.9522  max mem: 30335
[09:02:47.783771] Epoch: [46]  [2000/6672]  eta: 0:58:36  lr: 0.001206  loss: 2.8953 (2.8604)  time: 0.7396  data: 0.0003  max mem: 30335
[09:28:00.392956] Epoch: [46]  [4000/6672]  eta: 0:33:35  lr: 0.001204  loss: 2.9709 (2.8733)  time: 0.7626  data: 0.0002  max mem: 30335
[09:53:11.878106] Epoch: [46]  [6000/6672]  eta: 0:08:27  lr: 0.001202  loss: 2.7919 (2.8771)  time: 0.7283  data: 0.0002  max mem: 30335
[10:01:39.436036] Epoch: [46]  [6671/6672]  eta: 0:00:00  lr: 0.001201  loss: 3.0996 (2.8790)  time: 0.7245  data: 0.0011  max mem: 30335
[10:01:40.149723] Epoch: [46] Total time: 1:23:58 (0.7552 s / it)
[10:01:40.200085] Averaged stats: lr: 0.001201  loss: 3.0996 (2.8793)
[10:01:44.624132] Test:  [   0/2084]  eta: 2:33:31  loss: 0.7360 (0.7360)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.4199  data: 3.7748  max mem: 30335
[10:03:53.937182] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.8440 (0.8999)  acc1: 75.0000 (78.0772)  acc5: 100.0000 (94.7771)  time: 0.2566  data: 0.0002  max mem: 30335
[10:06:02.591813] Test:  [1000/2084]  eta: 0:04:44  loss: 0.7956 (0.9483)  acc1: 79.1667 (76.5526)  acc5: 91.6667 (94.3556)  time: 0.2566  data: 0.0002  max mem: 30335
[10:08:11.242993] Test:  [1500/2084]  eta: 0:02:32  loss: 1.0667 (1.0637)  acc1: 70.8333 (74.1506)  acc5: 91.6667 (92.8131)  time: 0.2565  data: 0.0002  max mem: 30335
[10:10:19.526486] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4693 (1.1171)  acc1: 87.5000 (72.9427)  acc5: 95.8333 (92.0935)  time: 0.2570  data: 0.0002  max mem: 30335
[10:10:40.686154] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5472 (1.1158)  acc1: 87.5000 (72.9480)  acc5: 100.0000 (92.1080)  time: 0.2497  data: 0.0001  max mem: 30335
[10:10:40.799211] Test: Total time: 0:09:00 (0.2594 s / it)
[10:10:55.854713] * Acc@1 72.956 Acc@5 92.107 loss 1.115
[10:10:55.855203] Accuracy of the network on the 50000 test images: 73.0%
[10:10:55.855275] Max accuracy: 72.96%
[10:10:56.134745] log_dir: ./output_dir_qkformer
[10:11:05.351621] Epoch: [47]  [   0/6672]  eta: 16:42:33  lr: 0.001201  loss: 2.9741 (2.9741)  time: 9.0158  data: 2.5797  max mem: 30335
[10:36:20.159725] Epoch: [47]  [2000/6672]  eta: 0:59:17  lr: 0.001199  loss: 2.9953 (2.8667)  time: 0.7291  data: 0.0003  max mem: 30335
[11:01:41.957852] Epoch: [47]  [4000/6672]  eta: 0:33:53  lr: 0.001197  loss: 2.9607 (2.8774)  time: 0.7268  data: 0.0002  max mem: 30335
[11:27:02.352712] Epoch: [47]  [6000/6672]  eta: 0:08:31  lr: 0.001195  loss: 2.8663 (2.8779)  time: 0.7697  data: 0.0002  max mem: 30335
[11:35:28.394411] Epoch: [47]  [6671/6672]  eta: 0:00:00  lr: 0.001195  loss: 2.8847 (2.8797)  time: 0.7236  data: 0.0006  max mem: 30335
[11:35:29.033417] Epoch: [47] Total time: 1:24:32 (0.7603 s / it)
[11:35:29.110306] Averaged stats: lr: 0.001195  loss: 2.8847 (2.8765)
[11:35:33.957072] Test:  [   0/2084]  eta: 2:43:49  loss: 0.2950 (0.2950)  acc1: 95.8333 (95.8333)  acc5: 100.0000 (100.0000)  time: 4.7165  data: 3.5413  max mem: 30335
[11:37:43.074623] Test:  [ 500/2084]  eta: 0:07:03  loss: 0.9882 (0.9022)  acc1: 75.0000 (77.8443)  acc5: 95.8333 (94.6607)  time: 0.2559  data: 0.0002  max mem: 30335
[11:39:52.018200] Test:  [1000/2084]  eta: 0:04:44  loss: 1.0605 (0.9508)  acc1: 79.1667 (76.6275)  acc5: 87.5000 (94.3348)  time: 0.2557  data: 0.0002  max mem: 30335
[11:42:00.191533] Test:  [1500/2084]  eta: 0:02:32  loss: 0.7725 (1.0735)  acc1: 83.3333 (73.9618)  acc5: 91.6667 (92.6216)  time: 0.2556  data: 0.0002  max mem: 30335
[11:44:09.735905] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5984 (1.1297)  acc1: 87.5000 (72.7053)  acc5: 95.8333 (91.8687)  time: 0.2555  data: 0.0002  max mem: 30335
[11:44:30.843447] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6059 (1.1318)  acc1: 83.3333 (72.6280)  acc5: 95.8333 (91.8360)  time: 0.2480  data: 0.0001  max mem: 30335
[11:44:30.960695] Test: Total time: 0:09:01 (0.2600 s / it)
[11:44:46.379513] * Acc@1 72.628 Acc@5 91.835 loss 1.132
[11:44:46.379737] Accuracy of the network on the 50000 test images: 72.6%
[11:44:46.379784] Max accuracy: 72.96%
[11:44:46.512119] log_dir: ./output_dir_qkformer
[11:44:54.457872] Epoch: [48]  [   0/6672]  eta: 14:43:16  lr: 0.001195  loss: 2.7914 (2.7914)  time: 7.9431  data: 2.3668  max mem: 30335
[12:09:55.367317] Epoch: [48]  [2000/6672]  eta: 0:58:42  lr: 0.001192  loss: 2.8087 (2.8484)  time: 0.7384  data: 0.0002  max mem: 30335
[12:34:57.219293] Epoch: [48]  [4000/6672]  eta: 0:33:30  lr: 0.001190  loss: 2.7788 (2.8583)  time: 0.7268  data: 0.0003  max mem: 30335
[13:00:02.093262] Epoch: [48]  [6000/6672]  eta: 0:08:25  lr: 0.001188  loss: 2.9741 (2.8632)  time: 0.7425  data: 0.0003  max mem: 30335
[13:08:28.050578] Epoch: [48]  [6671/6672]  eta: 0:00:00  lr: 0.001188  loss: 2.9642 (2.8665)  time: 0.7280  data: 0.0006  max mem: 30335
[13:08:28.953383] Epoch: [48] Total time: 1:23:42 (0.7528 s / it)
[13:08:28.967489] Averaged stats: lr: 0.001188  loss: 2.9642 (2.8686)
[13:08:33.133675] Test:  [   0/2084]  eta: 2:24:32  loss: 0.2547 (0.2547)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.1613  data: 3.6723  max mem: 30335
[13:10:42.634089] Test:  [ 500/2084]  eta: 0:07:02  loss: 1.1787 (0.9083)  acc1: 70.8333 (77.4118)  acc5: 95.8333 (94.9767)  time: 0.2564  data: 0.0002  max mem: 30335
[13:12:52.226926] Test:  [1000/2084]  eta: 0:04:45  loss: 1.0825 (0.9414)  acc1: 79.1667 (77.0230)  acc5: 91.6667 (94.5055)  time: 0.2563  data: 0.0002  max mem: 30335
[13:15:00.573266] Test:  [1500/2084]  eta: 0:02:32  loss: 0.8777 (1.0488)  acc1: 75.0000 (74.3782)  acc5: 95.8333 (93.0879)  time: 0.2566  data: 0.0002  max mem: 30335
[13:17:09.209067] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5361 (1.1048)  acc1: 87.5000 (73.1488)  acc5: 95.8333 (92.3184)  time: 0.2562  data: 0.0002  max mem: 30335
[13:17:30.374153] Test:  [2083/2084]  eta: 0:00:00  loss: 0.7086 (1.1063)  acc1: 83.3333 (73.1520)  acc5: 95.8333 (92.3120)  time: 0.2502  data: 0.0001  max mem: 30335
[13:17:30.478418] Test: Total time: 0:09:01 (0.2598 s / it)
[13:17:45.759617] * Acc@1 73.132 Acc@5 92.330 loss 1.106
[13:17:45.760089] Accuracy of the network on the 50000 test images: 73.1%
[13:17:45.760163] Max accuracy: 73.13%
[13:17:45.932333] log_dir: ./output_dir_qkformer
[13:17:58.948851] Epoch: [49]  [   0/6672]  eta: 23:47:43  lr: 0.001188  loss: 2.1789 (2.1789)  time: 12.8392  data: 2.3229  max mem: 30335
[13:42:54.092954] Epoch: [49]  [2000/6672]  eta: 0:58:40  lr: 0.001185  loss: 2.7987 (2.8614)  time: 0.7316  data: 0.0003  max mem: 30335
[14:08:08.989963] Epoch: [49]  [4000/6672]  eta: 0:33:38  lr: 0.001183  loss: 3.1428 (2.8662)  time: 0.7276  data: 0.0002  max mem: 30335
[14:33:05.828627] Epoch: [49]  [6000/6672]  eta: 0:08:26  lr: 0.001181  loss: 2.8635 (2.8720)  time: 0.7294  data: 0.0003  max mem: 30335
[14:41:39.979475] Epoch: [49]  [6671/6672]  eta: 0:00:00  lr: 0.001180  loss: 2.8806 (2.8722)  time: 0.7267  data: 0.0012  max mem: 30335
[14:41:40.702146] Epoch: [49] Total time: 1:23:54 (0.7546 s / it)
[14:41:40.731155] Averaged stats: lr: 0.001180  loss: 2.8806 (2.8632)
[14:41:44.723058] Test:  [   0/2084]  eta: 2:18:29  loss: 0.4569 (0.4569)  acc1: 83.3333 (83.3333)  acc5: 95.8333 (95.8333)  time: 3.9872  data: 3.3840  max mem: 30335
[14:43:53.219834] Test:  [ 500/2084]  eta: 0:06:58  loss: 0.9299 (0.8635)  acc1: 75.0000 (77.7445)  acc5: 95.8333 (95.3094)  time: 0.2573  data: 0.0002  max mem: 30335
[14:46:03.575777] Test:  [1000/2084]  eta: 0:04:44  loss: 0.9884 (0.9289)  acc1: 79.1667 (76.1655)  acc5: 91.6667 (94.5430)  time: 0.2563  data: 0.0002  max mem: 30335
[14:48:12.270272] Test:  [1500/2084]  eta: 0:02:32  loss: 1.0868 (1.0359)  acc1: 79.1667 (74.2283)  acc5: 91.6667 (93.1129)  time: 0.2687  data: 0.0002  max mem: 30335
[14:50:21.845867] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4900 (1.0986)  acc1: 87.5000 (72.9365)  acc5: 95.8333 (92.2039)  time: 0.2572  data: 0.0002  max mem: 30335
[14:50:43.017347] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5068 (1.1021)  acc1: 87.5000 (72.8480)  acc5: 100.0000 (92.1900)  time: 0.2496  data: 0.0002  max mem: 30335
[14:50:43.118200] Test: Total time: 0:09:02 (0.2603 s / it)
[14:50:57.938069] * Acc@1 72.862 Acc@5 92.193 loss 1.102
[14:50:57.938529] Accuracy of the network on the 50000 test images: 72.9%
[14:50:57.938575] Max accuracy: 73.13%
[14:50:58.141445] log_dir: ./output_dir_qkformer
[14:51:05.230922] Epoch: [50]  [   0/6672]  eta: 13:05:27  lr: 0.001180  loss: 3.0895 (3.0895)  time: 7.0634  data: 2.3801  max mem: 30335
[15:16:05.665124] Epoch: [50]  [2000/6672]  eta: 0:58:38  lr: 0.001178  loss: 2.9221 (2.8120)  time: 0.9657  data: 0.0002  max mem: 30335
[15:41:24.428102] Epoch: [50]  [4000/6672]  eta: 0:33:40  lr: 0.001176  loss: 2.7913 (2.8393)  time: 0.7339  data: 0.0002  max mem: 30335
[16:06:51.534585] Epoch: [50]  [6000/6672]  eta: 0:08:29  lr: 0.001174  loss: 2.7737 (2.8488)  time: 0.7308  data: 0.0002  max mem: 30335
[16:15:22.616068] Epoch: [50]  [6671/6672]  eta: 0:00:00  lr: 0.001173  loss: 2.9378 (2.8513)  time: 0.7250  data: 0.0011  max mem: 30335
[16:15:23.458689] Epoch: [50] Total time: 1:24:25 (0.7592 s / it)
[16:15:23.500592] Averaged stats: lr: 0.001173  loss: 2.9378 (2.8552)
[16:15:27.761535] Test:  [   0/2084]  eta: 2:27:47  loss: 0.3400 (0.3400)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.2549  data: 3.5776  max mem: 30335
[16:17:37.170398] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.8900 (0.8944)  acc1: 75.0000 (77.4451)  acc5: 95.8333 (95.1680)  time: 0.2647  data: 0.0002  max mem: 30335
[16:19:46.948830] Test:  [1000/2084]  eta: 0:04:45  loss: 1.0206 (0.9417)  acc1: 70.8333 (76.1863)  acc5: 91.6667 (94.5346)  time: 0.2571  data: 0.0002  max mem: 30335
[16:21:55.517893] Test:  [1500/2084]  eta: 0:02:32  loss: 0.8718 (1.0408)  acc1: 83.3333 (74.1200)  acc5: 91.6667 (93.1934)  time: 0.2573  data: 0.0002  max mem: 30335
[16:24:05.013603] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5142 (1.0999)  acc1: 87.5000 (72.8761)  acc5: 95.8333 (92.3247)  time: 0.2574  data: 0.0002  max mem: 30335
[16:24:26.170497] Test:  [2083/2084]  eta: 0:00:00  loss: 0.7476 (1.1026)  acc1: 83.3333 (72.8120)  acc5: 95.8333 (92.2960)  time: 0.2491  data: 0.0002  max mem: 30335
[16:24:26.328346] Test: Total time: 0:09:02 (0.2605 s / it)
[16:24:41.891042] * Acc@1 72.838 Acc@5 92.289 loss 1.103
[16:24:41.891315] Accuracy of the network on the 50000 test images: 72.8%
[16:24:41.891362] Max accuracy: 73.13%
[16:24:42.027600] log_dir: ./output_dir_qkformer
[16:24:54.076939] Epoch: [51]  [   0/6672]  eta: 21:57:19  lr: 0.001173  loss: 2.9023 (2.9023)  time: 11.8464  data: 2.4345  max mem: 30335
[16:50:01.816279] Epoch: [51]  [2000/6672]  eta: 0:59:07  lr: 0.001171  loss: 2.7341 (2.8481)  time: 0.7288  data: 0.0003  max mem: 30335
[17:15:05.855199] Epoch: [51]  [4000/6672]  eta: 0:33:38  lr: 0.001169  loss: 2.7540 (2.8600)  time: 0.7945  data: 0.0003  max mem: 30335
[17:40:22.250997] Epoch: [51]  [6000/6672]  eta: 0:08:28  lr: 0.001166  loss: 2.8049 (2.8593)  time: 0.8931  data: 0.0002  max mem: 30335
[17:48:44.195186] Epoch: [51]  [6671/6672]  eta: 0:00:00  lr: 0.001166  loss: 2.6151 (2.8590)  time: 0.7253  data: 0.0011  max mem: 30335
[17:48:44.975357] Epoch: [51] Total time: 1:24:02 (0.7558 s / it)
[17:48:45.030526] Averaged stats: lr: 0.001166  loss: 2.6151 (2.8529)
[17:48:49.424950] Test:  [   0/2084]  eta: 2:32:25  loss: 0.3428 (0.3428)  acc1: 87.5000 (87.5000)  acc5: 95.8333 (95.8333)  time: 4.3886  data: 3.8315  max mem: 30335
[17:50:57.810263] Test:  [ 500/2084]  eta: 0:06:59  loss: 0.9353 (0.8720)  acc1: 70.8333 (77.6946)  acc5: 95.8333 (94.8686)  time: 0.2568  data: 0.0002  max mem: 30335
[17:53:06.340837] Test:  [1000/2084]  eta: 0:04:42  loss: 1.0862 (0.9089)  acc1: 75.0000 (76.7358)  acc5: 91.6667 (94.5180)  time: 0.2570  data: 0.0002  max mem: 30335
[17:55:15.426112] Test:  [1500/2084]  eta: 0:02:31  loss: 0.8905 (1.0238)  acc1: 79.1667 (74.4087)  acc5: 91.6667 (92.9075)  time: 0.2569  data: 0.0002  max mem: 30335
[17:57:23.857129] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5620 (1.0779)  acc1: 87.5000 (73.3675)  acc5: 95.8333 (92.1935)  time: 0.2565  data: 0.0002  max mem: 30335
[17:57:44.999036] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5533 (1.0778)  acc1: 87.5000 (73.3480)  acc5: 95.8333 (92.2060)  time: 0.2484  data: 0.0002  max mem: 30335
[17:57:45.106310] Test: Total time: 0:09:00 (0.2592 s / it)
[17:58:00.953967] * Acc@1 73.328 Acc@5 92.200 loss 1.078
[17:58:00.954281] Accuracy of the network on the 50000 test images: 73.3%
[17:58:00.954313] Max accuracy: 73.33%
[17:58:01.120047] log_dir: ./output_dir_qkformer
[17:58:07.482502] Epoch: [52]  [   0/6672]  eta: 11:47:24  lr: 0.001166  loss: 2.5461 (2.5461)  time: 6.3616  data: 2.4112  max mem: 30335
[18:23:23.637951] Epoch: [52]  [2000/6672]  eta: 0:59:14  lr: 0.001163  loss: 2.8602 (2.8269)  time: 0.7474  data: 0.0007  max mem: 30335
[18:48:33.110519] Epoch: [52]  [4000/6672]  eta: 0:33:44  lr: 0.001161  loss: 2.8886 (2.8375)  time: 0.7323  data: 0.0002  max mem: 30335
[19:13:30.291950] Epoch: [52]  [6000/6672]  eta: 0:08:27  lr: 0.001159  loss: 2.8493 (2.8421)  time: 0.7279  data: 0.0002  max mem: 30335
[19:21:52.185403] Epoch: [52]  [6671/6672]  eta: 0:00:00  lr: 0.001158  loss: 2.6375 (2.8446)  time: 0.7260  data: 0.0011  max mem: 30335
[19:21:52.997301] Epoch: [52] Total time: 1:23:51 (0.7542 s / it)
[19:21:53.033319] Averaged stats: lr: 0.001158  loss: 2.6375 (2.8416)
[19:21:57.289509] Test:  [   0/2084]  eta: 2:27:40  loss: 0.4864 (0.4864)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.2515  data: 3.6036  max mem: 30335
[19:24:05.516843] Test:  [ 500/2084]  eta: 0:06:58  loss: 0.8188 (0.8301)  acc1: 75.0000 (79.5160)  acc5: 95.8333 (95.3260)  time: 0.2556  data: 0.0002  max mem: 30335
[19:26:15.656556] Test:  [1000/2084]  eta: 0:04:44  loss: 1.0432 (0.8959)  acc1: 83.3333 (77.7264)  acc5: 91.6667 (94.7344)  time: 0.2699  data: 0.0066  max mem: 30335
[19:28:24.478775] Test:  [1500/2084]  eta: 0:02:32  loss: 0.9743 (1.0164)  acc1: 75.0000 (75.0777)  acc5: 91.6667 (93.0852)  time: 0.2556  data: 0.0002  max mem: 30335
[19:30:33.528732] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5740 (1.0730)  acc1: 87.5000 (73.8485)  acc5: 95.8333 (92.2955)  time: 0.2569  data: 0.0002  max mem: 30335
[19:30:54.703111] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4605 (1.0783)  acc1: 87.5000 (73.7540)  acc5: 100.0000 (92.2060)  time: 0.2484  data: 0.0002  max mem: 30335
[19:30:54.855240] Test: Total time: 0:09:01 (0.2600 s / it)
[19:31:10.550181] * Acc@1 73.722 Acc@5 92.228 loss 1.078
[19:31:10.550650] Accuracy of the network on the 50000 test images: 73.7%
[19:31:10.550696] Max accuracy: 73.72%
[19:31:10.633470] log_dir: ./output_dir_qkformer
[19:31:17.646403] Epoch: [53]  [   0/6672]  eta: 12:59:39  lr: 0.001158  loss: 2.2062 (2.2062)  time: 7.0114  data: 4.6589  max mem: 30335
[19:56:41.091617] Epoch: [53]  [2000/6672]  eta: 0:59:32  lr: 0.001156  loss: 2.7804 (2.8306)  time: 0.7287  data: 0.0002  max mem: 30335
[20:21:43.702665] Epoch: [53]  [4000/6672]  eta: 0:33:45  lr: 0.001154  loss: 2.8891 (2.8302)  time: 0.7442  data: 0.0002  max mem: 30335
[20:47:11.209919] Epoch: [53]  [6000/6672]  eta: 0:08:30  lr: 0.001151  loss: 2.5239 (2.8337)  time: 0.8405  data: 0.0003  max mem: 30335
[20:55:41.560648] Epoch: [53]  [6671/6672]  eta: 0:00:00  lr: 0.001151  loss: 2.8381 (2.8367)  time: 0.7293  data: 0.0010  max mem: 30335
[20:55:42.423868] Epoch: [53] Total time: 1:24:31 (0.7602 s / it)
[20:55:42.499724] Averaged stats: lr: 0.001151  loss: 2.8381 (2.8404)
[20:55:47.385697] Test:  [   0/2084]  eta: 2:49:26  loss: 0.4359 (0.4359)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.8782  data: 4.0041  max mem: 30335
[20:57:56.284791] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.8703 (0.9430)  acc1: 75.0000 (77.3453)  acc5: 95.8333 (94.9019)  time: 0.2559  data: 0.0002  max mem: 30335
[21:00:06.022272] Test:  [1000/2084]  eta: 0:04:45  loss: 0.8806 (0.9745)  acc1: 83.3333 (76.3861)  acc5: 91.6667 (94.4389)  time: 0.2567  data: 0.0002  max mem: 30335
[21:02:16.671513] Test:  [1500/2084]  eta: 0:02:33  loss: 0.8425 (1.0666)  acc1: 75.0000 (74.1367)  acc5: 95.8333 (93.0491)  time: 0.2563  data: 0.0002  max mem: 30335
[21:04:25.549536] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5323 (1.1114)  acc1: 87.5000 (73.2092)  acc5: 100.0000 (92.3372)  time: 0.2557  data: 0.0002  max mem: 30335
[21:04:46.717772] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5126 (1.1115)  acc1: 83.3333 (73.1680)  acc5: 100.0000 (92.3600)  time: 0.2492  data: 0.0001  max mem: 30335
[21:04:46.830425] Test: Total time: 0:09:04 (0.2612 s / it)
[21:05:02.327534] * Acc@1 73.158 Acc@5 92.349 loss 1.111
[21:05:02.327825] Accuracy of the network on the 50000 test images: 73.2%
[21:05:02.327875] Max accuracy: 73.72%
[21:05:02.837729] log_dir: ./output_dir_qkformer
[21:05:17.378716] Epoch: [54]  [   0/6672]  eta: 1 day, 2:52:05  lr: 0.001151  loss: 2.8252 (2.8252)  time: 14.4972  data: 4.2039  max mem: 30335
[21:30:30.322685] Epoch: [54]  [2000/6672]  eta: 0:59:25  lr: 0.001148  loss: 2.6652 (2.8340)  time: 0.7929  data: 0.0003  max mem: 30335
[21:55:43.915513] Epoch: [54]  [4000/6672]  eta: 0:33:50  lr: 0.001146  loss: 2.8939 (2.8336)  time: 0.7276  data: 0.0003  max mem: 30335
[22:21:23.531777] Epoch: [54]  [6000/6672]  eta: 0:08:32  lr: 0.001144  loss: 2.9105 (2.8336)  time: 0.7293  data: 0.0002  max mem: 30335
[22:30:00.614759] Epoch: [54]  [6671/6672]  eta: 0:00:00  lr: 0.001143  loss: 2.9527 (2.8341)  time: 0.7226  data: 0.0011  max mem: 30335
[22:30:01.137867] Epoch: [54] Total time: 1:24:58 (0.7641 s / it)
[22:30:01.299969] Averaged stats: lr: 0.001143  loss: 2.9527 (2.8340)
[22:30:06.020962] Test:  [   0/2084]  eta: 2:42:30  loss: 0.6267 (0.6267)  acc1: 75.0000 (75.0000)  acc5: 95.8333 (95.8333)  time: 4.6789  data: 3.7535  max mem: 30335
[22:32:14.250192] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.8484 (0.8670)  acc1: 70.8333 (78.0938)  acc5: 95.8333 (95.5173)  time: 0.2559  data: 0.0002  max mem: 30335
[22:34:23.777853] Test:  [1000/2084]  eta: 0:04:44  loss: 1.0624 (0.9109)  acc1: 75.0000 (77.1062)  acc5: 91.6667 (94.7719)  time: 0.2554  data: 0.0002  max mem: 30335
[22:36:34.736597] Test:  [1500/2084]  eta: 0:02:33  loss: 0.7580 (1.0088)  acc1: 79.1667 (74.9334)  acc5: 95.8333 (93.3711)  time: 0.3223  data: 0.0002  max mem: 30335
[22:38:42.822108] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4485 (1.0582)  acc1: 87.5000 (73.8485)  acc5: 100.0000 (92.6058)  time: 0.2562  data: 0.0002  max mem: 30335
[22:39:03.899468] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6309 (1.0617)  acc1: 83.3333 (73.7580)  acc5: 100.0000 (92.5880)  time: 0.2480  data: 0.0001  max mem: 30335
[22:39:04.016813] Test: Total time: 0:09:02 (0.2604 s / it)
[22:39:19.802299] * Acc@1 73.753 Acc@5 92.603 loss 1.062
[22:39:19.802573] Accuracy of the network on the 50000 test images: 73.8%
[22:39:19.802616] Max accuracy: 73.75%
[22:39:20.213417] log_dir: ./output_dir_qkformer
[22:39:27.369845] Epoch: [55]  [   0/6672]  eta: 13:14:20  lr: 0.001143  loss: 3.3780 (3.3780)  time: 7.1434  data: 3.1790  max mem: 30335
[23:04:53.843319] Epoch: [55]  [2000/6672]  eta: 0:59:39  lr: 0.001140  loss: 2.7619 (2.8152)  time: 0.8230  data: 0.0002  max mem: 30335
[23:29:57.366083] Epoch: [55]  [4000/6672]  eta: 0:33:47  lr: 0.001138  loss: 2.8158 (2.8228)  time: 0.7273  data: 0.0002  max mem: 30335
[23:55:20.296534] Epoch: [55]  [6000/6672]  eta: 0:08:30  lr: 0.001136  loss: 2.8771 (2.8193)  time: 0.8318  data: 0.0002  max mem: 30335
[00:03:47.137364] Epoch: [55]  [6671/6672]  eta: 0:00:00  lr: 0.001135  loss: 2.8498 (2.8208)  time: 0.7272  data: 0.0006  max mem: 30335
[00:03:48.019591] Epoch: [55] Total time: 1:24:27 (0.7596 s / it)
[00:03:48.060535] Averaged stats: lr: 0.001135  loss: 2.8498 (2.8266)
[00:03:52.648471] Test:  [   0/2084]  eta: 2:39:09  loss: 0.7435 (0.7435)  acc1: 87.5000 (87.5000)  acc5: 91.6667 (91.6667)  time: 4.5821  data: 3.7285  max mem: 30335
[00:06:01.456238] Test:  [ 500/2084]  eta: 0:07:01  loss: 1.1672 (0.8845)  acc1: 62.5000 (77.2122)  acc5: 95.8333 (95.0765)  time: 0.2571  data: 0.0002  max mem: 30335
[00:08:09.866989] Test:  [1000/2084]  eta: 0:04:43  loss: 0.8746 (0.9081)  acc1: 79.1667 (76.6109)  acc5: 91.6667 (94.9009)  time: 0.2564  data: 0.0002  max mem: 30335
[00:10:20.145063] Test:  [1500/2084]  eta: 0:02:32  loss: 0.9905 (1.0145)  acc1: 75.0000 (74.3837)  acc5: 95.8333 (93.4100)  time: 0.2636  data: 0.0003  max mem: 30335
[00:12:28.688153] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5785 (1.0724)  acc1: 87.5000 (73.1822)  acc5: 95.8333 (92.5121)  time: 0.2567  data: 0.0003  max mem: 30335
[00:12:49.844537] Test:  [2083/2084]  eta: 0:00:00  loss: 0.9144 (1.0753)  acc1: 83.3333 (73.1640)  acc5: 91.6667 (92.4760)  time: 0.2490  data: 0.0002  max mem: 30335
[00:12:49.975555] Test: Total time: 0:09:01 (0.2600 s / it)
[00:13:05.612374] * Acc@1 73.161 Acc@5 92.466 loss 1.075
[00:13:05.612918] Accuracy of the network on the 50000 test images: 73.2%
[00:13:05.612957] Max accuracy: 73.75%
[00:13:05.690761] log_dir: ./output_dir_qkformer
[00:13:13.582254] Epoch: [56]  [   0/6672]  eta: 12:40:02  lr: 0.001135  loss: 3.3187 (3.3187)  time: 6.8349  data: 3.9057  max mem: 30335
[00:38:15.915076] Epoch: [56]  [2000/6672]  eta: 0:58:42  lr: 0.001132  loss: 2.7529 (2.8156)  time: 0.7281  data: 0.0003  max mem: 30335
[01:03:33.665438] Epoch: [56]  [4000/6672]  eta: 0:33:40  lr: 0.001130  loss: 2.7378 (2.8114)  time: 0.7295  data: 0.0002  max mem: 30335
[01:29:08.058156] Epoch: [56]  [6000/6672]  eta: 0:08:30  lr: 0.001128  loss: 2.8848 (2.8169)  time: 0.7798  data: 0.0002  max mem: 30335
[01:37:42.059676] Epoch: [56]  [6671/6672]  eta: 0:00:00  lr: 0.001127  loss: 2.7562 (2.8202)  time: 0.7707  data: 0.0006  max mem: 30335
[01:37:42.617369] Epoch: [56] Total time: 1:24:36 (0.7609 s / it)
[01:37:42.969924] Averaged stats: lr: 0.001127  loss: 2.7562 (2.8208)
[01:37:47.476147] Test:  [   0/2084]  eta: 2:36:21  loss: 0.2972 (0.2972)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.5017  data: 3.8087  max mem: 30335
[01:39:55.796365] Test:  [ 500/2084]  eta: 0:06:59  loss: 1.1421 (0.8625)  acc1: 62.5000 (77.9774)  acc5: 95.8333 (95.0266)  time: 0.2559  data: 0.0002  max mem: 30335
[01:42:04.235622] Test:  [1000/2084]  eta: 0:04:42  loss: 0.8430 (0.9035)  acc1: 83.3333 (76.8773)  acc5: 91.6667 (94.7344)  time: 0.2570  data: 0.0002  max mem: 30335
[01:44:12.849973] Test:  [1500/2084]  eta: 0:02:31  loss: 1.0177 (1.0084)  acc1: 75.0000 (74.7446)  acc5: 95.8333 (93.3239)  time: 0.2565  data: 0.0002  max mem: 30335
[01:46:21.547227] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2710 (1.0602)  acc1: 91.6667 (73.5362)  acc5: 100.0000 (92.6620)  time: 0.2569  data: 0.0002  max mem: 30335
[01:46:42.700944] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6523 (1.0606)  acc1: 87.5000 (73.5180)  acc5: 95.8333 (92.6600)  time: 0.2486  data: 0.0002  max mem: 30335
[01:46:42.822180] Test: Total time: 0:08:59 (0.2590 s / it)
[01:46:58.509959] * Acc@1 73.511 Acc@5 92.660 loss 1.061
[01:46:58.510424] Accuracy of the network on the 50000 test images: 73.5%
[01:46:58.510459] Max accuracy: 73.75%
[01:46:58.920442] log_dir: ./output_dir_qkformer
[01:47:06.412515] Epoch: [57]  [   0/6672]  eta: 13:44:43  lr: 0.001127  loss: 3.0767 (3.0767)  time: 7.4165  data: 2.3088  max mem: 30335
[02:12:40.548950] Epoch: [57]  [2000/6672]  eta: 0:59:58  lr: 0.001124  loss: 2.7021 (2.8022)  time: 0.7257  data: 0.0002  max mem: 30335
[02:37:46.286323] Epoch: [57]  [4000/6672]  eta: 0:33:54  lr: 0.001122  loss: 2.7872 (2.8066)  time: 0.9336  data: 0.0002  max mem: 30335
[03:02:54.676322] Epoch: [57]  [6000/6672]  eta: 0:08:30  lr: 0.001120  loss: 2.8406 (2.8121)  time: 0.7302  data: 0.0003  max mem: 30335
[03:11:10.853705] Epoch: [57]  [6671/6672]  eta: 0:00:00  lr: 0.001119  loss: 2.7545 (2.8140)  time: 0.7225  data: 0.0006  max mem: 30335
[03:11:11.671720] Epoch: [57] Total time: 1:24:12 (0.7573 s / it)
[03:11:11.712967] Averaged stats: lr: 0.001119  loss: 2.7545 (2.8136)
[03:11:16.236673] Test:  [   0/2084]  eta: 2:36:55  loss: 0.4012 (0.4012)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.5180  data: 3.7016  max mem: 30335
[03:13:25.427288] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.7717 (0.8553)  acc1: 70.8333 (78.0855)  acc5: 100.0000 (95.2595)  time: 0.2558  data: 0.0002  max mem: 30335
[03:15:35.319072] Test:  [1000/2084]  eta: 0:04:45  loss: 0.9034 (0.9002)  acc1: 79.1667 (76.9314)  acc5: 91.6667 (94.8510)  time: 0.2949  data: 0.0002  max mem: 30335
[03:17:44.357505] Test:  [1500/2084]  eta: 0:02:32  loss: 0.7184 (1.0217)  acc1: 83.3333 (74.5947)  acc5: 95.8333 (93.1351)  time: 0.2561  data: 0.0003  max mem: 30335
[03:19:52.663064] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5808 (1.0746)  acc1: 87.5000 (73.4758)  acc5: 95.8333 (92.4392)  time: 0.2567  data: 0.0002  max mem: 30335
[03:20:13.916804] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6341 (1.0783)  acc1: 83.3333 (73.3900)  acc5: 95.8333 (92.4400)  time: 0.2559  data: 0.0001  max mem: 30335
[03:20:14.058350] Test: Total time: 0:09:02 (0.2602 s / it)
[03:20:28.937804] * Acc@1 73.406 Acc@5 92.446 loss 1.078
[03:20:28.938313] Accuracy of the network on the 50000 test images: 73.4%
[03:20:28.938349] Max accuracy: 73.75%
[03:20:29.244920] log_dir: ./output_dir_qkformer
[03:20:39.500760] Epoch: [58]  [   0/6672]  eta: 18:40:14  lr: 0.001119  loss: 2.7482 (2.7482)  time: 10.0741  data: 2.5162  max mem: 30335
[03:45:44.278060] Epoch: [58]  [2000/6672]  eta: 0:58:56  lr: 0.001116  loss: 3.0656 (2.7977)  time: 0.7499  data: 0.0002  max mem: 30335
[04:10:51.418255] Epoch: [58]  [4000/6672]  eta: 0:33:37  lr: 0.001114  loss: 2.7029 (2.8129)  time: 0.7301  data: 0.0002  max mem: 30335
[04:35:55.498416] Epoch: [58]  [6000/6672]  eta: 0:08:26  lr: 0.001111  loss: 2.6664 (2.8113)  time: 0.7404  data: 0.0002  max mem: 30335
[04:44:23.874658] Epoch: [58]  [6671/6672]  eta: 0:00:00  lr: 0.001110  loss: 2.8205 (2.8123)  time: 0.7253  data: 0.0006  max mem: 30335
[04:44:24.609062] Epoch: [58] Total time: 1:23:55 (0.7547 s / it)
[04:44:24.658115] Averaged stats: lr: 0.001110  loss: 2.8205 (2.8121)
[04:44:28.488434] Test:  [   0/2084]  eta: 2:12:53  loss: 0.3330 (0.3330)  acc1: 91.6667 (91.6667)  acc5: 100.0000 (100.0000)  time: 3.8260  data: 3.2426  max mem: 30335
[04:46:37.283203] Test:  [ 500/2084]  eta: 0:06:59  loss: 0.8642 (0.7982)  acc1: 70.8333 (79.4079)  acc5: 95.8333 (95.5173)  time: 0.2575  data: 0.0002  max mem: 30335
[04:48:47.070356] Test:  [1000/2084]  eta: 0:04:44  loss: 1.0096 (0.8639)  acc1: 79.1667 (77.6931)  acc5: 91.6667 (94.9051)  time: 0.2566  data: 0.0002  max mem: 30335
[04:50:55.738853] Test:  [1500/2084]  eta: 0:02:32  loss: 0.9838 (0.9714)  acc1: 79.1667 (75.5274)  acc5: 91.6667 (93.4211)  time: 0.2562  data: 0.0002  max mem: 30335
[04:53:04.182423] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5175 (1.0204)  acc1: 87.5000 (74.3462)  acc5: 100.0000 (92.7953)  time: 0.2575  data: 0.0002  max mem: 30335
[04:53:25.364220] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5648 (1.0245)  acc1: 87.5000 (74.2040)  acc5: 95.8333 (92.7880)  time: 0.2488  data: 0.0002  max mem: 30335
[04:53:25.495667] Test: Total time: 0:09:00 (0.2595 s / it)
[04:53:41.189979] * Acc@1 74.206 Acc@5 92.810 loss 1.025
[04:53:41.190494] Accuracy of the network on the 50000 test images: 74.2%
[04:53:41.190533] Max accuracy: 74.21%
[04:53:41.295058] log_dir: ./output_dir_qkformer
[04:53:51.493817] Epoch: [59]  [   0/6672]  eta: 18:53:48  lr: 0.001110  loss: 3.2971 (3.2971)  time: 10.1961  data: 2.8886  max mem: 30335
[05:18:56.184549] Epoch: [59]  [2000/6672]  eta: 0:58:56  lr: 0.001108  loss: 2.9341 (2.7886)  time: 0.8206  data: 0.0033  max mem: 30335
[05:44:20.812682] Epoch: [59]  [4000/6672]  eta: 0:33:49  lr: 0.001105  loss: 2.7271 (2.7962)  time: 0.7952  data: 0.0003  max mem: 30335
[06:09:34.599833] Epoch: [59]  [6000/6672]  eta: 0:08:29  lr: 0.001103  loss: 2.6891 (2.8012)  time: 0.7292  data: 0.0003  max mem: 30335
[06:17:59.464977] Epoch: [59]  [6671/6672]  eta: 0:00:00  lr: 0.001102  loss: 2.5825 (2.8021)  time: 0.7268  data: 0.0007  max mem: 30335
[06:18:00.119778] Epoch: [59] Total time: 1:24:18 (0.7582 s / it)
[06:18:00.132335] Averaged stats: lr: 0.001102  loss: 2.5825 (2.8054)
[06:18:03.795631] Test:  [   0/2084]  eta: 2:06:55  loss: 0.6129 (0.6129)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 3.6544  data: 2.8961  max mem: 30335
[06:20:12.533602] Test:  [ 500/2084]  eta: 0:06:58  loss: 0.8003 (0.8513)  acc1: 75.0000 (78.6095)  acc5: 95.8333 (95.2345)  time: 0.2571  data: 0.0002  max mem: 30335
[06:22:21.037030] Test:  [1000/2084]  eta: 0:04:42  loss: 1.1063 (0.9008)  acc1: 70.8333 (77.4143)  acc5: 91.6667 (94.7761)  time: 0.2572  data: 0.0002  max mem: 30335
[06:24:32.592483] Test:  [1500/2084]  eta: 0:02:32  loss: 0.9159 (1.0019)  acc1: 79.1667 (75.3636)  acc5: 95.8333 (93.4627)  time: 0.2564  data: 0.0002  max mem: 30335
[06:26:41.445893] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4940 (1.0612)  acc1: 87.5000 (74.0088)  acc5: 100.0000 (92.6537)  time: 0.2578  data: 0.0002  max mem: 30335
[06:27:02.629949] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6362 (1.0599)  acc1: 87.5000 (74.0360)  acc5: 95.8333 (92.6840)  time: 0.2497  data: 0.0001  max mem: 30335
[06:27:02.763331] Test: Total time: 0:09:02 (0.2604 s / it)
[06:27:18.220979] * Acc@1 74.039 Acc@5 92.682 loss 1.060
[06:27:18.221262] Accuracy of the network on the 50000 test images: 74.0%
[06:27:18.221307] Max accuracy: 74.21%
[06:27:18.349494] log_dir: ./output_dir_qkformer
[06:27:25.668705] Epoch: [60]  [   0/6672]  eta: 13:23:24  lr: 0.001102  loss: 2.0095 (2.0095)  time: 7.2249  data: 2.7859  max mem: 30335
[06:52:37.306090] Epoch: [60]  [2000/6672]  eta: 0:59:05  lr: 0.001100  loss: 2.8388 (2.7893)  time: 0.7285  data: 0.0003  max mem: 30335
[07:17:54.136531] Epoch: [60]  [4000/6672]  eta: 0:33:46  lr: 0.001097  loss: 2.6976 (2.8016)  time: 0.7276  data: 0.0003  max mem: 30335
[07:43:00.971221] Epoch: [60]  [6000/6672]  eta: 0:08:28  lr: 0.001094  loss: 2.8288 (2.8065)  time: 1.0798  data: 0.0003  max mem: 30335
[07:51:26.771487] Epoch: [60]  [6671/6672]  eta: 0:00:00  lr: 0.001094  loss: 2.8079 (2.8073)  time: 0.7281  data: 0.0011  max mem: 30335
[07:51:27.603717] Epoch: [60] Total time: 1:24:09 (0.7568 s / it)
[07:51:27.672977] Averaged stats: lr: 0.001094  loss: 2.8079 (2.8005)
[07:51:31.897092] Test:  [   0/2084]  eta: 2:26:32  loss: 0.3625 (0.3625)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.2189  data: 3.6447  max mem: 30335
[07:53:40.815593] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.9655 (0.8789)  acc1: 66.6667 (77.6364)  acc5: 95.8333 (94.8603)  time: 0.2569  data: 0.0002  max mem: 30335
[07:55:49.260294] Test:  [1000/2084]  eta: 0:04:43  loss: 0.9065 (0.9223)  acc1: 79.1667 (76.6317)  acc5: 91.6667 (94.5138)  time: 0.2569  data: 0.0002  max mem: 30335
[07:57:57.870672] Test:  [1500/2084]  eta: 0:02:31  loss: 0.9197 (1.0171)  acc1: 75.0000 (74.4698)  acc5: 91.6667 (93.2129)  time: 0.2568  data: 0.0002  max mem: 30335
[08:00:06.290571] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4703 (1.0655)  acc1: 87.5000 (73.3737)  acc5: 100.0000 (92.5329)  time: 0.2575  data: 0.0002  max mem: 30335
[08:00:27.395848] Test:  [2083/2084]  eta: 0:00:00  loss: 1.1098 (1.0676)  acc1: 75.0000 (73.3180)  acc5: 95.8333 (92.5300)  time: 0.2480  data: 0.0001  max mem: 30335
[08:00:27.506110] Test: Total time: 0:08:59 (0.2590 s / it)
[08:00:43.029876] * Acc@1 73.317 Acc@5 92.531 loss 1.068
[08:00:43.030285] Accuracy of the network on the 50000 test images: 73.3%
[08:00:43.030329] Max accuracy: 74.21%
[08:00:43.129569] log_dir: ./output_dir_qkformer
[08:00:47.107385] Epoch: [61]  [   0/6672]  eta: 7:21:25  lr: 0.001094  loss: 3.1483 (3.1483)  time: 3.9696  data: 2.7584  max mem: 30335
[08:25:52.800296] Epoch: [61]  [2000/6672]  eta: 0:58:44  lr: 0.001091  loss: 2.8305 (2.7928)  time: 0.7284  data: 0.0003  max mem: 30335
[08:51:02.084208] Epoch: [61]  [4000/6672]  eta: 0:33:35  lr: 0.001088  loss: 2.5903 (2.7966)  time: 0.7259  data: 0.0002  max mem: 30335
[09:16:00.210737] Epoch: [61]  [6000/6672]  eta: 0:08:25  lr: 0.001086  loss: 2.5800 (2.8002)  time: 0.7289  data: 0.0003  max mem: 30335
[09:24:17.162855] Epoch: [61]  [6671/6672]  eta: 0:00:00  lr: 0.001085  loss: 2.7371 (2.7993)  time: 0.7236  data: 0.0011  max mem: 30335
[09:24:17.926660] Epoch: [61] Total time: 1:23:34 (0.7516 s / it)
[09:24:17.958615] Averaged stats: lr: 0.001085  loss: 2.7371 (2.7957)
[09:24:21.467187] Test:  [   0/2084]  eta: 2:01:42  loss: 0.5480 (0.5480)  acc1: 83.3333 (83.3333)  acc5: 91.6667 (91.6667)  time: 3.5043  data: 2.7621  max mem: 30335
[09:26:30.031162] Test:  [ 500/2084]  eta: 0:06:57  loss: 0.9238 (0.8167)  acc1: 70.8333 (79.2582)  acc5: 95.8333 (95.3011)  time: 0.2630  data: 0.0006  max mem: 30335
[09:28:38.450703] Test:  [1000/2084]  eta: 0:04:42  loss: 1.1360 (0.8673)  acc1: 70.8333 (77.5932)  acc5: 91.6667 (94.9634)  time: 0.2565  data: 0.0002  max mem: 30335
[09:30:47.982277] Test:  [1500/2084]  eta: 0:02:31  loss: 0.8957 (0.9801)  acc1: 79.1667 (75.1499)  acc5: 95.8333 (93.4599)  time: 0.2569  data: 0.0002  max mem: 30335
[09:32:57.548115] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4718 (1.0332)  acc1: 87.5000 (74.0984)  acc5: 95.8333 (92.7599)  time: 0.2569  data: 0.0002  max mem: 30335
[09:33:18.687675] Test:  [2083/2084]  eta: 0:00:00  loss: 0.7998 (1.0375)  acc1: 83.3333 (73.9820)  acc5: 95.8333 (92.6760)  time: 0.2487  data: 0.0001  max mem: 30335
[09:33:18.811094] Test: Total time: 0:09:00 (0.2595 s / it)
[09:33:34.441361] * Acc@1 73.984 Acc@5 92.669 loss 1.037
[09:33:34.441655] Accuracy of the network on the 50000 test images: 74.0%
[09:33:34.441684] Max accuracy: 74.21%
[09:33:34.632243] log_dir: ./output_dir_qkformer
[09:33:41.134584] Epoch: [62]  [   0/6672]  eta: 12:02:54  lr: 0.001085  loss: 2.7834 (2.7834)  time: 6.5010  data: 2.7729  max mem: 30335
[09:58:51.591065] Epoch: [62]  [2000/6672]  eta: 0:59:01  lr: 0.001082  loss: 2.7278 (2.7884)  time: 0.7278  data: 0.0003  max mem: 30335
[10:23:52.285316] Epoch: [62]  [4000/6672]  eta: 0:33:34  lr: 0.001080  loss: 2.5951 (2.7876)  time: 0.7411  data: 0.0002  max mem: 30335
[10:48:52.395855] Epoch: [62]  [6000/6672]  eta: 0:08:25  lr: 0.001077  loss: 2.8673 (2.7890)  time: 0.7328  data: 0.0002  max mem: 30335
[10:57:11.395224] Epoch: [62]  [6671/6672]  eta: 0:00:00  lr: 0.001076  loss: 2.9672 (2.7909)  time: 0.7250  data: 0.0006  max mem: 30335
[10:57:12.121690] Epoch: [62] Total time: 1:23:37 (0.7520 s / it)
[10:57:12.256467] Averaged stats: lr: 0.001076  loss: 2.9672 (2.7873)
[10:57:16.560513] Test:  [   0/2084]  eta: 2:29:18  loss: 0.3223 (0.3223)  acc1: 91.6667 (91.6667)  acc5: 100.0000 (100.0000)  time: 4.2987  data: 3.4984  max mem: 30335
[10:59:25.456242] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.9070 (0.8596)  acc1: 75.0000 (78.5096)  acc5: 95.8333 (95.1098)  time: 0.2564  data: 0.0002  max mem: 30335
[11:01:33.883689] Test:  [1000/2084]  eta: 0:04:43  loss: 0.8586 (0.8866)  acc1: 79.1667 (77.5766)  acc5: 91.6667 (94.8884)  time: 0.2570  data: 0.0002  max mem: 30335
[11:03:43.159367] Test:  [1500/2084]  eta: 0:02:32  loss: 0.5894 (0.9913)  acc1: 83.3333 (75.3276)  acc5: 95.8333 (93.5182)  time: 0.2574  data: 0.0002  max mem: 30335
[11:05:52.120427] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3918 (1.0418)  acc1: 87.5000 (74.3274)  acc5: 95.8333 (92.7266)  time: 0.2568  data: 0.0002  max mem: 30335
[11:06:13.358668] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6012 (1.0428)  acc1: 87.5000 (74.3100)  acc5: 95.8333 (92.7280)  time: 0.2483  data: 0.0002  max mem: 30335
[11:06:13.474626] Test: Total time: 0:09:01 (0.2597 s / it)
[11:06:28.710716] * Acc@1 74.322 Acc@5 92.743 loss 1.043
[11:06:28.711014] Accuracy of the network on the 50000 test images: 74.3%
[11:06:28.711051] Max accuracy: 74.32%
[11:06:29.000894] log_dir: ./output_dir_qkformer
[11:06:36.820428] Epoch: [63]  [   0/6672]  eta: 14:17:57  lr: 0.001076  loss: 2.2934 (2.2934)  time: 7.7155  data: 3.0574  max mem: 30335
[11:31:42.342779] Epoch: [63]  [2000/6672]  eta: 0:58:52  lr: 0.001074  loss: 2.7908 (2.7691)  time: 0.9127  data: 0.0002  max mem: 30335
[11:56:44.802771] Epoch: [63]  [4000/6672]  eta: 0:33:33  lr: 0.001071  loss: 2.7023 (2.7735)  time: 0.7282  data: 0.0003  max mem: 30335
[12:21:44.844689] Epoch: [63]  [6000/6672]  eta: 0:08:25  lr: 0.001068  loss: 2.6854 (2.7798)  time: 0.7291  data: 0.0002  max mem: 30335
[12:30:15.191844] Epoch: [63]  [6671/6672]  eta: 0:00:00  lr: 0.001068  loss: 2.7724 (2.7844)  time: 0.7226  data: 0.0010  max mem: 30335
[12:30:16.041688] Epoch: [63] Total time: 1:23:47 (0.7535 s / it)
[12:30:16.088458] Averaged stats: lr: 0.001068  loss: 2.7724 (2.7827)
[12:30:20.875686] Test:  [   0/2084]  eta: 2:46:06  loss: 0.4772 (0.4772)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.7822  data: 3.5118  max mem: 30335
[12:32:29.771920] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.9954 (0.8531)  acc1: 70.8333 (79.2582)  acc5: 95.8333 (95.4924)  time: 0.2572  data: 0.0002  max mem: 30335
[12:34:38.341788] Test:  [1000/2084]  eta: 0:04:43  loss: 1.0645 (0.8859)  acc1: 75.0000 (78.3175)  acc5: 87.5000 (95.0508)  time: 0.2575  data: 0.0002  max mem: 30335
[12:36:47.309650] Test:  [1500/2084]  eta: 0:02:32  loss: 0.8094 (0.9831)  acc1: 79.1667 (75.8744)  acc5: 95.8333 (93.7209)  time: 0.2565  data: 0.0002  max mem: 30335
[12:38:56.026874] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4413 (1.0341)  acc1: 91.6667 (74.7376)  acc5: 100.0000 (92.9556)  time: 0.2561  data: 0.0002  max mem: 30335
[12:39:17.205641] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5298 (1.0334)  acc1: 91.6667 (74.7460)  acc5: 100.0000 (93.0100)  time: 0.2504  data: 0.0002  max mem: 30335
[12:39:17.305328] Test: Total time: 0:09:01 (0.2597 s / it)
[12:39:33.153193] * Acc@1 74.739 Acc@5 93.009 loss 1.033
[12:39:33.153562] Accuracy of the network on the 50000 test images: 74.7%
[12:39:33.153593] Max accuracy: 74.74%
[12:39:33.444320] log_dir: ./output_dir_qkformer
[12:39:46.157899] Epoch: [64]  [   0/6672]  eta: 23:30:04  lr: 0.001068  loss: 2.3788 (2.3788)  time: 12.6805  data: 2.5459  max mem: 30335
[13:04:40.279809] Epoch: [64]  [2000/6672]  eta: 0:58:37  lr: 0.001065  loss: 2.6541 (2.7802)  time: 0.7312  data: 0.0002  max mem: 30335
[13:29:31.482750] Epoch: [64]  [4000/6672]  eta: 0:33:21  lr: 0.001062  loss: 2.8295 (2.7823)  time: 0.7261  data: 0.0002  max mem: 30335
[13:54:26.859206] Epoch: [64]  [6000/6672]  eta: 0:08:23  lr: 0.001060  loss: 2.7122 (2.7830)  time: 0.7311  data: 0.0002  max mem: 30335
[14:02:57.668375] Epoch: [64]  [6671/6672]  eta: 0:00:00  lr: 0.001059  loss: 2.6699 (2.7829)  time: 0.7279  data: 0.0006  max mem: 30335
[14:02:58.529896] Epoch: [64] Total time: 1:23:25 (0.7502 s / it)
[14:02:58.571089] Averaged stats: lr: 0.001059  loss: 2.6699 (2.7782)
[14:03:02.807517] Test:  [   0/2084]  eta: 2:26:57  loss: 0.5826 (0.5826)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.2312  data: 3.5205  max mem: 30335
[14:05:11.164976] Test:  [ 500/2084]  eta: 0:06:59  loss: 0.8813 (0.8583)  acc1: 75.0000 (78.0522)  acc5: 95.8333 (95.2595)  time: 0.2570  data: 0.0002  max mem: 30335
[14:07:19.502051] Test:  [1000/2084]  eta: 0:04:42  loss: 1.2126 (0.8899)  acc1: 75.0000 (77.3393)  acc5: 87.5000 (94.8052)  time: 0.2565  data: 0.0002  max mem: 30335
[14:09:27.889020] Test:  [1500/2084]  eta: 0:02:31  loss: 0.7379 (0.9948)  acc1: 83.3333 (75.2249)  acc5: 91.6667 (93.3766)  time: 0.2562  data: 0.0002  max mem: 30335
[14:11:38.307392] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5363 (1.0411)  acc1: 87.5000 (74.3587)  acc5: 100.0000 (92.6953)  time: 0.2568  data: 0.0002  max mem: 30335
[14:11:59.449039] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4609 (1.0397)  acc1: 83.3333 (74.3620)  acc5: 100.0000 (92.7380)  time: 0.2477  data: 0.0001  max mem: 30335
[14:11:59.578616] Test: Total time: 0:09:01 (0.2596 s / it)
[14:12:15.366292] * Acc@1 74.376 Acc@5 92.751 loss 1.040
[14:12:15.366625] Accuracy of the network on the 50000 test images: 74.4%
[14:12:15.366677] Max accuracy: 74.74%
[14:12:15.549979] log_dir: ./output_dir_qkformer
[14:12:19.256407] Epoch: [65]  [   0/6672]  eta: 6:52:04  lr: 0.001059  loss: 2.4380 (2.4380)  time: 3.7057  data: 2.3880  max mem: 30335
[14:37:18.190326] Epoch: [65]  [2000/6672]  eta: 0:58:28  lr: 0.001056  loss: 2.6424 (2.7661)  time: 0.7321  data: 0.0002  max mem: 30335
[15:02:07.292809] Epoch: [65]  [4000/6672]  eta: 0:33:17  lr: 0.001053  loss: 2.5816 (2.7695)  time: 0.7322  data: 0.0010  max mem: 30335
[15:27:08.254047] Epoch: [65]  [6000/6672]  eta: 0:08:23  lr: 0.001051  loss: 2.7132 (2.7750)  time: 0.8417  data: 0.0002  max mem: 30335
[15:35:25.334196] Epoch: [65]  [6671/6672]  eta: 0:00:00  lr: 0.001050  loss: 2.8728 (2.7745)  time: 0.7288  data: 0.0006  max mem: 30335
[15:35:26.116593] Epoch: [65] Total time: 1:23:10 (0.7480 s / it)
[15:35:26.170783] Averaged stats: lr: 0.001050  loss: 2.8728 (2.7727)
[15:35:30.495333] Test:  [   0/2084]  eta: 2:30:00  loss: 0.2496 (0.2496)  acc1: 91.6667 (91.6667)  acc5: 100.0000 (100.0000)  time: 4.3188  data: 3.5082  max mem: 30335
[15:37:39.205378] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.8828 (0.8793)  acc1: 75.0000 (78.2768)  acc5: 95.8333 (95.2512)  time: 0.2564  data: 0.0002  max mem: 30335
[15:39:47.894012] Test:  [1000/2084]  eta: 0:04:43  loss: 1.0507 (0.8932)  acc1: 70.8333 (77.9138)  acc5: 91.6667 (95.0258)  time: 0.2566  data: 0.0002  max mem: 30335
[15:41:56.297425] Test:  [1500/2084]  eta: 0:02:31  loss: 0.9412 (0.9866)  acc1: 79.1667 (75.7134)  acc5: 95.8333 (93.6986)  time: 0.2567  data: 0.0002  max mem: 30335
[15:44:09.950299] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4206 (1.0336)  acc1: 91.6667 (74.5898)  acc5: 100.0000 (92.9806)  time: 0.2566  data: 0.0002  max mem: 30335
[15:44:31.163725] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4560 (1.0339)  acc1: 87.5000 (74.6000)  acc5: 100.0000 (92.9980)  time: 0.2516  data: 0.0001  max mem: 30335
[15:44:31.285386] Test: Total time: 0:09:05 (0.2616 s / it)
[15:44:46.155188] * Acc@1 74.587 Acc@5 92.989 loss 1.034
[15:44:46.155564] Accuracy of the network on the 50000 test images: 74.6%
[15:44:46.155609] Max accuracy: 74.74%
[15:44:46.271599] log_dir: ./output_dir_qkformer
[15:44:53.889458] Epoch: [66]  [   0/6672]  eta: 13:45:46  lr: 0.001050  loss: 2.3000 (2.3000)  time: 7.4260  data: 1.9132  max mem: 30335
[16:09:49.793519] Epoch: [66]  [2000/6672]  eta: 0:58:29  lr: 0.001047  loss: 2.7165 (2.7607)  time: 0.8748  data: 0.0003  max mem: 30335
[16:34:44.535580] Epoch: [66]  [4000/6672]  eta: 0:33:21  lr: 0.001044  loss: 2.8168 (2.7625)  time: 0.7228  data: 0.0002  max mem: 30335
[16:59:36.428878] Epoch: [66]  [6000/6672]  eta: 0:08:22  lr: 0.001041  loss: 2.5956 (2.7660)  time: 0.7506  data: 0.0002  max mem: 30335
[17:08:05.038699] Epoch: [66]  [6671/6672]  eta: 0:00:00  lr: 0.001041  loss: 2.7059 (2.7666)  time: 0.7242  data: 0.0006  max mem: 30335
[17:08:05.617617] Epoch: [66] Total time: 1:23:19 (0.7493 s / it)
[17:08:05.822577] Averaged stats: lr: 0.001041  loss: 2.7059 (2.7650)
[17:08:10.848286] Test:  [   0/2084]  eta: 2:54:23  loss: 0.2524 (0.2524)  acc1: 95.8333 (95.8333)  acc5: 100.0000 (100.0000)  time: 5.0208  data: 4.3593  max mem: 30335
[17:10:19.456071] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.7852 (0.8362)  acc1: 79.1667 (79.3912)  acc5: 95.8333 (95.3593)  time: 0.2564  data: 0.0002  max mem: 30335
[17:12:27.649431] Test:  [1000/2084]  eta: 0:04:43  loss: 1.0125 (0.8668)  acc1: 75.0000 (78.2176)  acc5: 87.5000 (95.1174)  time: 0.2555  data: 0.0002  max mem: 30335
[17:14:36.131913] Test:  [1500/2084]  eta: 0:02:31  loss: 0.8586 (0.9513)  acc1: 75.0000 (76.1465)  acc5: 95.8333 (94.0540)  time: 0.2566  data: 0.0002  max mem: 30335
[17:16:44.861877] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3668 (1.0049)  acc1: 91.6667 (74.9355)  acc5: 100.0000 (93.3471)  time: 0.2568  data: 0.0002  max mem: 30335
[17:17:05.987163] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6250 (1.0096)  acc1: 87.5000 (74.8360)  acc5: 95.8333 (93.2940)  time: 0.2488  data: 0.0001  max mem: 30335
[17:17:06.141398] Test: Total time: 0:09:00 (0.2593 s / it)
[17:17:21.826610] * Acc@1 74.840 Acc@5 93.300 loss 1.009
[17:17:21.826879] Accuracy of the network on the 50000 test images: 74.8%
[17:17:21.826913] Max accuracy: 74.84%
[17:17:22.107703] log_dir: ./output_dir_qkformer
[17:17:27.731489] Epoch: [67]  [   0/6672]  eta: 10:24:12  lr: 0.001041  loss: 2.4464 (2.4464)  time: 5.6133  data: 2.2752  max mem: 30335
[17:42:35.732976] Epoch: [67]  [2000/6672]  eta: 0:58:53  lr: 0.001038  loss: 2.8640 (2.7483)  time: 0.7289  data: 0.0002  max mem: 30335
[18:07:38.921571] Epoch: [67]  [4000/6672]  eta: 0:33:34  lr: 0.001035  loss: 2.9369 (2.7512)  time: 0.7267  data: 0.0002  max mem: 30335
[18:32:48.214848] Epoch: [67]  [6000/6672]  eta: 0:08:26  lr: 0.001032  loss: 2.8691 (2.7626)  time: 0.7494  data: 0.0003  max mem: 30335
[18:41:10.027362] Epoch: [67]  [6671/6672]  eta: 0:00:00  lr: 0.001031  loss: 2.5425 (2.7648)  time: 0.7701  data: 0.0010  max mem: 30335
[18:41:10.758562] Epoch: [67] Total time: 1:23:48 (0.7537 s / it)
[18:41:10.767046] Averaged stats: lr: 0.001031  loss: 2.5425 (2.7619)
[18:41:14.385058] Test:  [   0/2084]  eta: 2:05:31  loss: 0.3434 (0.3434)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 3.6141  data: 3.2950  max mem: 30335
[18:43:22.809856] Test:  [ 500/2084]  eta: 0:06:57  loss: 0.9004 (0.8314)  acc1: 70.8333 (78.7508)  acc5: 95.8333 (95.4840)  time: 0.2566  data: 0.0002  max mem: 30335
[18:45:31.204673] Test:  [1000/2084]  eta: 0:04:42  loss: 0.7588 (0.8798)  acc1: 83.3333 (77.4184)  acc5: 91.6667 (94.8843)  time: 0.2558  data: 0.0002  max mem: 30335
[18:47:39.747960] Test:  [1500/2084]  eta: 0:02:31  loss: 0.9025 (0.9841)  acc1: 79.1667 (75.1277)  acc5: 91.6667 (93.4849)  time: 0.2571  data: 0.0002  max mem: 30335
[18:49:48.077682] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4979 (1.0286)  acc1: 91.6667 (74.3566)  acc5: 95.8333 (92.7286)  time: 0.2562  data: 0.0002  max mem: 30335
[18:50:09.245151] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4636 (1.0286)  acc1: 87.5000 (74.3600)  acc5: 95.8333 (92.7300)  time: 0.2499  data: 0.0001  max mem: 30335
[18:50:09.358943] Test: Total time: 0:08:58 (0.2584 s / it)
[18:50:25.353508] * Acc@1 74.351 Acc@5 92.725 loss 1.029
[18:50:25.353813] Accuracy of the network on the 50000 test images: 74.4%
[18:50:25.353873] Max accuracy: 74.84%
[18:50:25.499926] log_dir: ./output_dir_qkformer
[18:50:35.825620] Epoch: [68]  [   0/6672]  eta: 19:08:06  lr: 0.001031  loss: 2.2473 (2.2473)  time: 10.3247  data: 2.4249  max mem: 30335
[19:15:26.666377] Epoch: [68]  [2000/6672]  eta: 0:58:24  lr: 0.001029  loss: 2.6927 (2.7307)  time: 0.7355  data: 0.0005  max mem: 30335
[19:40:18.690804] Epoch: [68]  [4000/6672]  eta: 0:33:18  lr: 0.001026  loss: 2.7327 (2.7450)  time: 0.7250  data: 0.0003  max mem: 30335
[20:05:04.920092] Epoch: [68]  [6000/6672]  eta: 0:08:21  lr: 0.001023  loss: 2.7100 (2.7550)  time: 0.7294  data: 0.0002  max mem: 30335
[20:13:35.450516] Epoch: [68]  [6671/6672]  eta: 0:00:00  lr: 0.001022  loss: 2.5715 (2.7566)  time: 0.7277  data: 0.0011  max mem: 30335
[20:13:36.169163] Epoch: [68] Total time: 1:23:10 (0.7480 s / it)
[20:13:36.213016] Averaged stats: lr: 0.001022  loss: 2.5715 (2.7594)
[20:13:41.841378] Test:  [   0/2084]  eta: 3:15:20  loss: 0.4872 (0.4872)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 5.6239  data: 5.0339  max mem: 30335
[20:15:50.267050] Test:  [ 500/2084]  eta: 0:07:03  loss: 0.9070 (0.8310)  acc1: 75.0000 (79.1833)  acc5: 95.8333 (95.5922)  time: 0.2565  data: 0.0002  max mem: 30335
[20:17:58.951715] Test:  [1000/2084]  eta: 0:04:44  loss: 0.9331 (0.8914)  acc1: 75.0000 (77.7556)  acc5: 91.6667 (94.9093)  time: 0.2566  data: 0.0003  max mem: 30335
[20:20:07.323914] Test:  [1500/2084]  eta: 0:02:32  loss: 0.8542 (0.9933)  acc1: 79.1667 (75.4830)  acc5: 95.8333 (93.6098)  time: 0.2567  data: 0.0002  max mem: 30335
[20:22:15.805681] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4951 (1.0392)  acc1: 87.5000 (74.6148)  acc5: 95.8333 (92.9244)  time: 0.2569  data: 0.0002  max mem: 30335
[20:22:36.949844] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5793 (1.0405)  acc1: 87.5000 (74.5980)  acc5: 95.8333 (92.9200)  time: 0.2482  data: 0.0001  max mem: 30335
[20:22:37.075190] Test: Total time: 0:09:00 (0.2595 s / it)
[20:22:52.568733] * Acc@1 74.627 Acc@5 92.920 loss 1.040
[20:22:52.569019] Accuracy of the network on the 50000 test images: 74.6%
[20:22:52.569050] Max accuracy: 74.84%
[20:22:52.707314] log_dir: ./output_dir_qkformer
[20:22:57.418710] Epoch: [69]  [   0/6672]  eta: 8:43:44  lr: 0.001022  loss: 2.3981 (2.3981)  time: 4.7100  data: 3.8284  max mem: 30335
[20:48:12.177656] Epoch: [69]  [2000/6672]  eta: 0:59:07  lr: 0.001019  loss: 2.7753 (2.7425)  time: 0.7281  data: 0.0003  max mem: 30335
[21:12:50.527282] Epoch: [69]  [4000/6672]  eta: 0:33:21  lr: 0.001017  loss: 2.8551 (2.7479)  time: 0.7251  data: 0.0003  max mem: 30335
[21:37:45.484603] Epoch: [69]  [6000/6672]  eta: 0:08:23  lr: 0.001014  loss: 2.6767 (2.7536)  time: 0.7329  data: 0.0003  max mem: 30335
[21:46:10.085733] Epoch: [69]  [6671/6672]  eta: 0:00:00  lr: 0.001013  loss: 2.6694 (2.7555)  time: 0.7288  data: 0.0010  max mem: 30335
[21:46:11.016015] Epoch: [69] Total time: 1:23:18 (0.7491 s / it)
[21:46:11.073756] Averaged stats: lr: 0.001013  loss: 2.6694 (2.7508)
[21:46:16.218690] Test:  [   0/2084]  eta: 2:58:27  loss: 0.2836 (0.2836)  acc1: 95.8333 (95.8333)  acc5: 100.0000 (100.0000)  time: 5.1381  data: 4.4591  max mem: 30335
[21:48:24.591488] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.9389 (0.8085)  acc1: 75.0000 (79.9318)  acc5: 95.8333 (95.6421)  time: 0.2563  data: 0.0002  max mem: 30335
[21:50:34.431765] Test:  [1000/2084]  eta: 0:04:45  loss: 1.0228 (0.8536)  acc1: 79.1667 (78.6588)  acc5: 95.8333 (95.3213)  time: 0.2573  data: 0.0002  max mem: 30335
[21:52:43.117280] Test:  [1500/2084]  eta: 0:02:32  loss: 0.7244 (0.9521)  acc1: 79.1667 (76.4546)  acc5: 95.8333 (94.0345)  time: 0.2573  data: 0.0005  max mem: 30335
[21:54:51.667690] Test:  [2000/2084]  eta: 0:00:21  loss: 0.7793 (1.0048)  acc1: 87.5000 (75.1437)  acc5: 95.8333 (93.3408)  time: 0.2567  data: 0.0002  max mem: 30335
[21:55:12.908223] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6053 (1.0074)  acc1: 87.5000 (75.1220)  acc5: 95.8333 (93.3440)  time: 0.2522  data: 0.0001  max mem: 30335
[21:55:13.056898] Test: Total time: 0:09:01 (0.2601 s / it)
[21:55:28.610494] * Acc@1 75.124 Acc@5 93.329 loss 1.007
[21:55:28.610860] Accuracy of the network on the 50000 test images: 75.1%
[21:55:28.610896] Max accuracy: 75.12%
[21:55:28.875368] log_dir: ./output_dir_qkformer
[21:55:37.769401] Epoch: [70]  [   0/6672]  eta: 16:11:49  lr: 0.001013  loss: 2.3442 (2.3442)  time: 8.7395  data: 3.4366  max mem: 30335
[22:20:54.678104] Epoch: [70]  [2000/6672]  eta: 0:59:21  lr: 0.001010  loss: 2.4002 (2.7367)  time: 0.7269  data: 0.0002  max mem: 30335
[22:45:51.575745] Epoch: [70]  [4000/6672]  eta: 0:33:38  lr: 0.001007  loss: 2.6361 (2.7361)  time: 0.7471  data: 0.0003  max mem: 30335
[23:11:09.664265] Epoch: [70]  [6000/6672]  eta: 0:08:28  lr: 0.001004  loss: 2.8299 (2.7381)  time: 0.7247  data: 0.0002  max mem: 30335
[23:19:41.681671] Epoch: [70]  [6671/6672]  eta: 0:00:00  lr: 0.001003  loss: 2.6538 (2.7408)  time: 0.7290  data: 0.0011  max mem: 30335
[23:19:42.491215] Epoch: [70] Total time: 1:24:13 (0.7574 s / it)
[23:19:42.542144] Averaged stats: lr: 0.001003  loss: 2.6538 (2.7437)
[23:19:47.533759] Test:  [   0/2084]  eta: 2:53:13  loss: 0.6918 (0.6918)  acc1: 87.5000 (87.5000)  acc5: 95.8333 (95.8333)  time: 4.9873  data: 4.4171  max mem: 30335
[23:21:55.631955] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.9836 (0.7900)  acc1: 70.8333 (79.8985)  acc5: 95.8333 (96.0080)  time: 0.2556  data: 0.0002  max mem: 30335
[23:24:05.417030] Test:  [1000/2084]  eta: 0:04:44  loss: 0.7591 (0.8426)  acc1: 83.3333 (78.5423)  acc5: 91.6667 (95.3713)  time: 0.2569  data: 0.0002  max mem: 30335
[23:26:16.175912] Test:  [1500/2084]  eta: 0:02:33  loss: 0.8475 (0.9378)  acc1: 79.1667 (76.5101)  acc5: 95.8333 (94.1400)  time: 0.2568  data: 0.0002  max mem: 30335
[23:28:26.092731] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4602 (0.9867)  acc1: 91.6667 (75.5768)  acc5: 100.0000 (93.4970)  time: 0.2568  data: 0.0002  max mem: 30335
[23:28:47.243589] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6809 (0.9900)  acc1: 83.3333 (75.4640)  acc5: 95.8333 (93.4800)  time: 0.2485  data: 0.0002  max mem: 30335
[23:28:47.380291] Test: Total time: 0:09:04 (0.2614 s / it)
[23:29:02.788323] * Acc@1 75.465 Acc@5 93.483 loss 0.990
[23:29:02.788561] Accuracy of the network on the 50000 test images: 75.5%
[23:29:02.788600] Max accuracy: 75.47%
[23:29:03.107624] log_dir: ./output_dir_qkformer
[23:29:11.604263] Epoch: [71]  [   0/6672]  eta: 15:29:54  lr: 0.001003  loss: 2.9198 (2.9198)  time: 8.3625  data: 2.4150  max mem: 30335
[23:54:13.408060] Epoch: [71]  [2000/6672]  eta: 0:58:45  lr: 0.001000  loss: 2.6338 (2.7114)  time: 0.8905  data: 0.0025  max mem: 30335
[00:19:22.368238] Epoch: [71]  [4000/6672]  eta: 0:33:36  lr: 0.000998  loss: 2.6774 (2.7272)  time: 0.7282  data: 0.0003  max mem: 30335
[00:44:57.308096] Epoch: [71]  [6000/6672]  eta: 0:08:29  lr: 0.000995  loss: 2.8026 (2.7328)  time: 0.8231  data: 0.0002  max mem: 30335
[00:53:22.610778] Epoch: [71]  [6671/6672]  eta: 0:00:00  lr: 0.000994  loss: 2.7541 (2.7365)  time: 0.7243  data: 0.0011  max mem: 30335
[00:53:23.524994] Epoch: [71] Total time: 1:24:20 (0.7585 s / it)
[00:53:23.602552] Averaged stats: lr: 0.000994  loss: 2.7541 (2.7414)
[00:53:28.410131] Test:  [   0/2084]  eta: 2:46:49  loss: 0.2873 (0.2873)  acc1: 95.8333 (95.8333)  acc5: 100.0000 (100.0000)  time: 4.8029  data: 3.9928  max mem: 30335
[00:55:36.756779] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.5992 (0.7797)  acc1: 75.0000 (80.2229)  acc5: 100.0000 (95.9165)  time: 0.2568  data: 0.0002  max mem: 30335
[00:57:46.121771] Test:  [1000/2084]  eta: 0:04:44  loss: 0.9632 (0.8331)  acc1: 79.1667 (78.7629)  acc5: 91.6667 (95.3671)  time: 0.2569  data: 0.0002  max mem: 30335
[00:59:56.688300] Test:  [1500/2084]  eta: 0:02:32  loss: 0.8786 (0.9514)  acc1: 75.0000 (76.1104)  acc5: 91.6667 (93.9179)  time: 0.2562  data: 0.0002  max mem: 30335
[01:02:13.073333] Test:  [2000/2084]  eta: 0:00:22  loss: 0.5645 (1.0050)  acc1: 87.5000 (75.0750)  acc5: 95.8333 (93.2305)  time: 0.2570  data: 0.0002  max mem: 30335
[01:02:34.239529] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4669 (1.0017)  acc1: 87.5000 (75.1120)  acc5: 100.0000 (93.2840)  time: 0.2496  data: 0.0001  max mem: 30335
[01:02:34.384493] Test: Total time: 0:09:10 (0.2643 s / it)
[01:02:50.142786] * Acc@1 75.127 Acc@5 93.300 loss 1.002
[01:02:50.143045] Accuracy of the network on the 50000 test images: 75.1%
[01:02:50.143079] Max accuracy: 75.47%
[01:02:50.343985] log_dir: ./output_dir_qkformer
[01:03:00.565175] Epoch: [72]  [   0/6672]  eta: 18:55:37  lr: 0.000994  loss: 2.8841 (2.8841)  time: 10.2124  data: 3.0789  max mem: 30335
[01:28:01.529833] Epoch: [72]  [2000/6672]  eta: 0:58:47  lr: 0.000991  loss: 2.6210 (2.7004)  time: 0.7539  data: 0.0004  max mem: 30335
[01:53:04.959694] Epoch: [72]  [4000/6672]  eta: 0:33:32  lr: 0.000988  loss: 2.6493 (2.7150)  time: 0.8189  data: 0.0003  max mem: 30335
[02:18:08.072601] Epoch: [72]  [6000/6672]  eta: 0:08:25  lr: 0.000985  loss: 2.6337 (2.7269)  time: 0.7284  data: 0.0003  max mem: 30335
[02:26:40.999704] Epoch: [72]  [6671/6672]  eta: 0:00:00  lr: 0.000984  loss: 2.6910 (2.7283)  time: 0.7209  data: 0.0006  max mem: 30335
[02:26:41.662193] Epoch: [72] Total time: 1:23:51 (0.7541 s / it)
[02:26:41.740364] Averaged stats: lr: 0.000984  loss: 2.6910 (2.7354)
[02:26:45.041210] Test:  [   0/2084]  eta: 1:54:29  loss: 0.4584 (0.4584)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 3.2961  data: 2.5989  max mem: 30335
[02:28:55.645125] Test:  [ 500/2084]  eta: 0:07:03  loss: 0.9629 (0.7893)  acc1: 75.0000 (80.4724)  acc5: 95.8333 (95.8001)  time: 0.2562  data: 0.0002  max mem: 30335
[02:31:03.700906] Test:  [1000/2084]  eta: 0:04:43  loss: 0.9189 (0.8362)  acc1: 79.1667 (79.0251)  acc5: 91.6667 (95.4504)  time: 0.2552  data: 0.0002  max mem: 30335
[02:33:12.565487] Test:  [1500/2084]  eta: 0:02:32  loss: 0.6183 (0.9401)  acc1: 83.3333 (76.5490)  acc5: 95.8333 (94.2288)  time: 0.2563  data: 0.0002  max mem: 30335
[02:35:21.182779] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3939 (0.9901)  acc1: 91.6667 (75.5476)  acc5: 100.0000 (93.4991)  time: 0.2564  data: 0.0002  max mem: 30335
[02:35:42.278835] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5011 (0.9906)  acc1: 87.5000 (75.5040)  acc5: 100.0000 (93.4880)  time: 0.2486  data: 0.0001  max mem: 30335
[02:35:42.415599] Test: Total time: 0:09:00 (0.2594 s / it)
[02:35:57.490831] * Acc@1 75.516 Acc@5 93.491 loss 0.990
[02:35:57.491299] Accuracy of the network on the 50000 test images: 75.5%
[02:35:57.491333] Max accuracy: 75.52%
[02:35:57.699695] log_dir: ./output_dir_qkformer
[02:36:09.061180] Epoch: [73]  [   0/6672]  eta: 20:53:17  lr: 0.000984  loss: 1.8460 (1.8460)  time: 11.2707  data: 2.3200  max mem: 30335
[03:01:04.295927] Epoch: [73]  [2000/6672]  eta: 0:58:36  lr: 0.000981  loss: 2.6452 (2.7132)  time: 0.7553  data: 0.0002  max mem: 30335
[03:25:54.389267] Epoch: [73]  [4000/6672]  eta: 0:33:20  lr: 0.000978  loss: 2.6495 (2.7241)  time: 0.7274  data: 0.0002  max mem: 30335
[03:51:04.626096] Epoch: [73]  [6000/6672]  eta: 0:08:24  lr: 0.000975  loss: 2.6645 (2.7305)  time: 0.7274  data: 0.0002  max mem: 30335
[03:59:39.199078] Epoch: [73]  [6671/6672]  eta: 0:00:00  lr: 0.000974  loss: 2.6202 (2.7313)  time: 0.7244  data: 0.0010  max mem: 30335
[03:59:40.050008] Epoch: [73] Total time: 1:23:42 (0.7528 s / it)
[03:59:40.081316] Averaged stats: lr: 0.000974  loss: 2.6202 (2.7315)
[03:59:44.537766] Test:  [   0/2084]  eta: 2:34:38  loss: 0.6138 (0.6138)  acc1: 87.5000 (87.5000)  acc5: 95.8333 (95.8333)  time: 4.4523  data: 3.8249  max mem: 30335
[04:01:52.839250] Test:  [ 500/2084]  eta: 0:06:59  loss: 0.9837 (0.8079)  acc1: 75.0000 (79.8985)  acc5: 95.8333 (95.6504)  time: 0.2563  data: 0.0002  max mem: 30335
[04:04:02.045151] Test:  [1000/2084]  eta: 0:04:43  loss: 0.8495 (0.8604)  acc1: 70.8333 (78.4715)  acc5: 91.6667 (95.1132)  time: 0.2559  data: 0.0002  max mem: 30335
[04:06:10.717461] Test:  [1500/2084]  eta: 0:02:31  loss: 0.9300 (0.9461)  acc1: 79.1667 (76.5406)  acc5: 95.8333 (94.0429)  time: 0.2570  data: 0.0002  max mem: 30335
[04:08:19.092984] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5069 (0.9904)  acc1: 87.5000 (75.4935)  acc5: 95.8333 (93.4533)  time: 0.2574  data: 0.0002  max mem: 30335
[04:08:40.258890] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5293 (0.9943)  acc1: 87.5000 (75.3980)  acc5: 100.0000 (93.4000)  time: 0.2493  data: 0.0001  max mem: 30335
[04:08:40.382868] Test: Total time: 0:09:00 (0.2593 s / it)
[04:08:56.130124] * Acc@1 75.407 Acc@5 93.379 loss 0.994
[04:08:56.130452] Accuracy of the network on the 50000 test images: 75.4%
[04:08:56.130501] Max accuracy: 75.52%
[04:08:56.220186] log_dir: ./output_dir_qkformer
[04:09:01.770284] Epoch: [74]  [   0/6672]  eta: 10:00:12  lr: 0.000974  loss: 2.5179 (2.5179)  time: 5.3976  data: 3.0749  max mem: 30335
[04:34:04.889101] Epoch: [74]  [2000/6672]  eta: 0:58:41  lr: 0.000972  loss: 2.7827 (2.7267)  time: 0.7295  data: 0.0002  max mem: 30335
[04:59:10.333627] Epoch: [74]  [4000/6672]  eta: 0:33:32  lr: 0.000969  loss: 2.7125 (2.7261)  time: 0.7413  data: 0.0003  max mem: 30335
[05:24:15.860183] Epoch: [74]  [6000/6672]  eta: 0:08:26  lr: 0.000966  loss: 2.7794 (2.7284)  time: 0.7382  data: 0.0003  max mem: 30335
[05:32:40.638723] Epoch: [74]  [6671/6672]  eta: 0:00:00  lr: 0.000965  loss: 2.6289 (2.7272)  time: 0.7265  data: 0.0006  max mem: 30335
[05:32:41.436502] Epoch: [74] Total time: 1:23:45 (0.7532 s / it)
[05:32:41.451471] Averaged stats: lr: 0.000965  loss: 2.6289 (2.7229)
[05:32:45.905864] Test:  [   0/2084]  eta: 2:34:28  loss: 0.4264 (0.4264)  acc1: 91.6667 (91.6667)  acc5: 100.0000 (100.0000)  time: 4.4473  data: 3.8009  max mem: 30335
[05:34:54.326888] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.8564 (0.8043)  acc1: 75.0000 (79.5908)  acc5: 100.0000 (95.8500)  time: 0.2571  data: 0.0002  max mem: 30335
[05:37:02.664902] Test:  [1000/2084]  eta: 0:04:42  loss: 0.8891 (0.8486)  acc1: 75.0000 (78.3258)  acc5: 91.6667 (95.4004)  time: 0.2570  data: 0.0002  max mem: 30335
[05:39:12.473358] Test:  [1500/2084]  eta: 0:02:32  loss: 0.7706 (0.9465)  acc1: 79.1667 (76.0687)  acc5: 95.8333 (94.0595)  time: 0.2567  data: 0.0002  max mem: 30335
[05:41:20.749770] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4013 (0.9932)  acc1: 87.5000 (75.0229)  acc5: 100.0000 (93.3971)  time: 0.2560  data: 0.0002  max mem: 30335
[05:41:41.906582] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3643 (0.9933)  acc1: 91.6667 (74.9920)  acc5: 100.0000 (93.3920)  time: 0.2492  data: 0.0001  max mem: 30335
[05:41:42.041733] Test: Total time: 0:09:00 (0.2594 s / it)
[05:41:57.222396] * Acc@1 74.994 Acc@5 93.388 loss 0.993
[05:41:57.222618] Accuracy of the network on the 50000 test images: 75.0%
[05:41:57.222651] Max accuracy: 75.52%
[05:41:57.323293] log_dir: ./output_dir_qkformer
[05:42:05.673238] Epoch: [75]  [   0/6672]  eta: 15:19:32  lr: 0.000965  loss: 3.3670 (3.3670)  time: 8.2692  data: 2.1264  max mem: 30335
[06:07:05.116326] Epoch: [75]  [2000/6672]  eta: 0:58:39  lr: 0.000962  loss: 2.7005 (2.7019)  time: 0.7287  data: 0.0002  max mem: 30335
[06:31:48.859248] Epoch: [75]  [4000/6672]  eta: 0:33:17  lr: 0.000959  loss: 2.6162 (2.7065)  time: 0.7654  data: 0.0003  max mem: 30335
[06:57:04.399150] Epoch: [75]  [6000/6672]  eta: 0:08:24  lr: 0.000956  loss: 2.5888 (2.7168)  time: 0.7280  data: 0.0002  max mem: 30335
[07:05:25.634916] Epoch: [75]  [6671/6672]  eta: 0:00:00  lr: 0.000955  loss: 2.5215 (2.7149)  time: 0.7224  data: 0.0010  max mem: 30335
[07:05:26.330300] Epoch: [75] Total time: 1:23:29 (0.7508 s / it)
[07:05:26.475684] Averaged stats: lr: 0.000955  loss: 2.5215 (2.7179)
[07:05:30.454213] Test:  [   0/2084]  eta: 2:18:01  loss: 0.4863 (0.4863)  acc1: 87.5000 (87.5000)  acc5: 100.0000 (100.0000)  time: 3.9739  data: 3.2416  max mem: 30335
[07:07:38.515254] Test:  [ 500/2084]  eta: 0:06:57  loss: 1.0983 (0.8203)  acc1: 70.8333 (79.9069)  acc5: 95.8333 (95.7169)  time: 0.2567  data: 0.0002  max mem: 30335
[07:09:46.703226] Test:  [1000/2084]  eta: 0:04:41  loss: 0.8838 (0.8614)  acc1: 75.0000 (78.4799)  acc5: 91.6667 (95.1673)  time: 0.2559  data: 0.0002  max mem: 30335
[07:11:54.925721] Test:  [1500/2084]  eta: 0:02:31  loss: 0.7933 (0.9418)  acc1: 83.3333 (76.6378)  acc5: 95.8333 (93.9679)  time: 0.2650  data: 0.0002  max mem: 30335
[07:14:03.950596] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5236 (0.9891)  acc1: 83.3333 (75.4602)  acc5: 95.8333 (93.3179)  time: 0.2563  data: 0.0002  max mem: 30335
[07:14:25.043635] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3256 (0.9888)  acc1: 91.6667 (75.4740)  acc5: 100.0000 (93.3580)  time: 0.2487  data: 0.0002  max mem: 30335
[07:14:25.177585] Test: Total time: 0:08:58 (0.2585 s / it)
[07:14:41.043171] * Acc@1 75.470 Acc@5 93.353 loss 0.989
[07:14:41.043480] Accuracy of the network on the 50000 test images: 75.5%
[07:14:41.043521] Max accuracy: 75.52%
[07:14:41.227658] log_dir: ./output_dir_qkformer
[07:14:45.491882] Epoch: [76]  [   0/6672]  eta: 7:52:54  lr: 0.000955  loss: 2.4367 (2.4367)  time: 4.2528  data: 3.0436  max mem: 30335
[07:39:50.886186] Epoch: [76]  [2000/6672]  eta: 0:58:44  lr: 0.000952  loss: 2.6746 (2.7089)  time: 0.7278  data: 0.0003  max mem: 30335
[08:04:48.823625] Epoch: [76]  [4000/6672]  eta: 0:33:28  lr: 0.000949  loss: 2.6012 (2.7115)  time: 0.7253  data: 0.0002  max mem: 30335
[08:29:49.469297] Epoch: [76]  [6000/6672]  eta: 0:08:24  lr: 0.000946  loss: 2.6577 (2.7194)  time: 0.7437  data: 0.0003  max mem: 30335
[08:38:17.081942] Epoch: [76]  [6671/6672]  eta: 0:00:00  lr: 0.000945  loss: 2.6680 (2.7167)  time: 0.7216  data: 0.0011  max mem: 30335
[08:38:17.968689] Epoch: [76] Total time: 1:23:36 (0.7519 s / it)
[08:38:18.020592] Averaged stats: lr: 0.000945  loss: 2.6680 (2.7109)
[08:38:22.909321] Test:  [   0/2084]  eta: 2:49:23  loss: 0.2557 (0.2557)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.8768  data: 4.0876  max mem: 30335
[08:40:31.048223] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.8317 (0.7611)  acc1: 75.0000 (80.2395)  acc5: 100.0000 (95.8666)  time: 0.2556  data: 0.0002  max mem: 30335
[08:42:39.711424] Test:  [1000/2084]  eta: 0:04:43  loss: 1.1097 (0.8279)  acc1: 75.0000 (78.4257)  acc5: 87.5000 (95.2214)  time: 0.2567  data: 0.0002  max mem: 30335
[08:44:47.970917] Test:  [1500/2084]  eta: 0:02:31  loss: 0.8886 (0.9106)  acc1: 79.1667 (76.8710)  acc5: 95.8333 (94.2622)  time: 0.2566  data: 0.0002  max mem: 30335
[08:46:56.134034] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4315 (0.9548)  acc1: 91.6667 (75.8558)  acc5: 100.0000 (93.6636)  time: 0.2565  data: 0.0002  max mem: 30335
[08:47:17.270590] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5603 (0.9566)  acc1: 87.5000 (75.8280)  acc5: 100.0000 (93.6640)  time: 0.2498  data: 0.0002  max mem: 30335
[08:47:17.414736] Test: Total time: 0:08:59 (0.2588 s / it)
[08:47:33.112855] * Acc@1 75.853 Acc@5 93.684 loss 0.956
[08:47:33.113332] Accuracy of the network on the 50000 test images: 75.9%
[08:47:33.113371] Max accuracy: 75.85%
[08:47:33.236039] log_dir: ./output_dir_qkformer
[08:47:38.483426] Epoch: [77]  [   0/6672]  eta: 9:43:20  lr: 0.000945  loss: 2.3257 (2.3257)  time: 5.2458  data: 2.8169  max mem: 30335
[09:12:51.761075] Epoch: [77]  [2000/6672]  eta: 0:59:04  lr: 0.000942  loss: 2.9050 (2.7075)  time: 0.7463  data: 0.0003  max mem: 30335
[09:37:49.683502] Epoch: [77]  [4000/6672]  eta: 0:33:34  lr: 0.000939  loss: 2.6247 (2.7076)  time: 0.7257  data: 0.0003  max mem: 30335
[10:02:50.867734] Epoch: [77]  [6000/6672]  eta: 0:08:25  lr: 0.000936  loss: 2.6725 (2.7112)  time: 0.7540  data: 0.0003  max mem: 30335
[10:11:10.801635] Epoch: [77]  [6671/6672]  eta: 0:00:00  lr: 0.000935  loss: 2.6072 (2.7121)  time: 0.7265  data: 0.0010  max mem: 30335
[10:11:11.577305] Epoch: [77] Total time: 1:23:38 (0.7521 s / it)
[10:11:11.616757] Averaged stats: lr: 0.000935  loss: 2.6072 (2.7096)
[10:11:16.245891] Test:  [   0/2084]  eta: 2:40:37  loss: 0.2524 (0.2524)  acc1: 95.8333 (95.8333)  acc5: 100.0000 (100.0000)  time: 4.6245  data: 3.9044  max mem: 30335
[10:13:24.435967] Test:  [ 500/2084]  eta: 0:06:59  loss: 0.8357 (0.8249)  acc1: 79.1667 (79.4494)  acc5: 95.8333 (95.7834)  time: 0.2564  data: 0.0002  max mem: 30335
[10:15:33.135902] Test:  [1000/2084]  eta: 0:04:43  loss: 0.8636 (0.8549)  acc1: 79.1667 (78.6672)  acc5: 91.6667 (95.4629)  time: 0.2799  data: 0.0002  max mem: 30335
[10:17:42.152665] Test:  [1500/2084]  eta: 0:02:31  loss: 0.7521 (0.9359)  acc1: 79.1667 (76.7766)  acc5: 95.8333 (94.3066)  time: 0.2555  data: 0.0002  max mem: 30335
[10:19:50.756845] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4459 (0.9771)  acc1: 91.6667 (76.0162)  acc5: 95.8333 (93.6907)  time: 0.2557  data: 0.0002  max mem: 30335
[10:20:11.821458] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5706 (0.9812)  acc1: 87.5000 (75.9440)  acc5: 95.8333 (93.6620)  time: 0.2480  data: 0.0001  max mem: 30335
[10:20:11.950342] Test: Total time: 0:09:00 (0.2593 s / it)
[10:20:27.381369] * Acc@1 75.927 Acc@5 93.676 loss 0.981
[10:20:27.381819] Accuracy of the network on the 50000 test images: 75.9%
[10:20:27.381849] Max accuracy: 75.93%
[10:20:27.828777] log_dir: ./output_dir_qkformer
[10:20:31.936107] Epoch: [78]  [   0/6672]  eta: 7:36:37  lr: 0.000935  loss: 2.7256 (2.7256)  time: 4.1064  data: 2.0000  max mem: 30335
[10:45:31.799956] Epoch: [78]  [2000/6672]  eta: 0:58:30  lr: 0.000932  loss: 2.8321 (2.6931)  time: 0.7269  data: 0.0002  max mem: 30335
[11:10:44.578856] Epoch: [78]  [4000/6672]  eta: 0:33:34  lr: 0.000929  loss: 2.5423 (2.6999)  time: 0.7249  data: 0.0002  max mem: 30335
[11:35:56.707460] Epoch: [78]  [6000/6672]  eta: 0:08:27  lr: 0.000926  loss: 2.5287 (2.7013)  time: 0.8044  data: 0.0003  max mem: 30335
[11:44:23.332611] Epoch: [78]  [6671/6672]  eta: 0:00:00  lr: 0.000925  loss: 2.7287 (2.7021)  time: 0.7220  data: 0.0010  max mem: 30335
[11:44:24.261153] Epoch: [78] Total time: 1:23:56 (0.7549 s / it)
[11:44:24.273593] Averaged stats: lr: 0.000925  loss: 2.7287 (2.7035)
[11:44:28.602455] Test:  [   0/2084]  eta: 2:29:11  loss: 0.2915 (0.2915)  acc1: 95.8333 (95.8333)  acc5: 100.0000 (100.0000)  time: 4.2952  data: 3.4383  max mem: 30335
[11:46:37.618922] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.8246 (0.7764)  acc1: 79.1667 (80.8466)  acc5: 95.8333 (95.9914)  time: 0.2556  data: 0.0002  max mem: 30335
[11:48:46.369984] Test:  [1000/2084]  eta: 0:04:43  loss: 0.7319 (0.8337)  acc1: 83.3333 (79.1708)  acc5: 95.8333 (95.2714)  time: 0.2554  data: 0.0002  max mem: 30335
[11:50:55.966809] Test:  [1500/2084]  eta: 0:02:32  loss: 0.8019 (0.9297)  acc1: 79.1667 (76.8321)  acc5: 91.6667 (94.0956)  time: 0.3038  data: 0.0002  max mem: 30335
[11:53:04.385007] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3807 (0.9672)  acc1: 91.6667 (76.0682)  acc5: 100.0000 (93.5553)  time: 0.2560  data: 0.0002  max mem: 30335
[11:53:25.504716] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5745 (0.9704)  acc1: 87.5000 (75.9720)  acc5: 100.0000 (93.5160)  time: 0.2483  data: 0.0001  max mem: 30335
[11:53:25.623918] Test: Total time: 0:09:01 (0.2598 s / it)
[11:53:41.106900] * Acc@1 75.973 Acc@5 93.520 loss 0.971
[11:53:41.107432] Accuracy of the network on the 50000 test images: 76.0%
[11:53:41.107495] Max accuracy: 75.97%
[11:53:41.194550] log_dir: ./output_dir_qkformer
[11:53:49.968940] Epoch: [79]  [   0/6672]  eta: 16:14:02  lr: 0.000925  loss: 2.0676 (2.0676)  time: 8.7594  data: 3.7702  max mem: 30335
[12:18:59.519425] Epoch: [79]  [2000/6672]  eta: 0:59:04  lr: 0.000922  loss: 2.6247 (2.6776)  time: 0.7512  data: 0.0003  max mem: 30335
[12:44:02.501973] Epoch: [79]  [4000/6672]  eta: 0:33:37  lr: 0.000919  loss: 2.7015 (2.6953)  time: 0.7408  data: 0.0002  max mem: 30335
[13:09:20.653160] Epoch: [79]  [6000/6672]  eta: 0:08:28  lr: 0.000916  loss: 2.8011 (2.6971)  time: 0.7277  data: 0.0002  max mem: 30335
[13:17:43.912773] Epoch: [79]  [6671/6672]  eta: 0:00:00  lr: 0.000915  loss: 2.7047 (2.6977)  time: 0.7241  data: 0.0010  max mem: 30335
[13:17:44.843141] Epoch: [79] Total time: 1:24:03 (0.7559 s / it)
[13:17:44.914743] Averaged stats: lr: 0.000915  loss: 2.7047 (2.6985)
[13:17:48.925364] Test:  [   0/2084]  eta: 2:19:03  loss: 0.4964 (0.4964)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.0038  data: 3.3830  max mem: 30335
[13:19:57.200884] Test:  [ 500/2084]  eta: 0:06:58  loss: 0.7698 (0.7833)  acc1: 75.0000 (79.9983)  acc5: 95.8333 (95.8250)  time: 0.2573  data: 0.0002  max mem: 30335
[13:22:05.551651] Test:  [1000/2084]  eta: 0:04:42  loss: 0.8875 (0.8278)  acc1: 79.1667 (78.8586)  acc5: 91.6667 (95.4212)  time: 0.2566  data: 0.0002  max mem: 30335
[13:24:15.364779] Test:  [1500/2084]  eta: 0:02:31  loss: 0.8791 (0.9318)  acc1: 79.1667 (76.6683)  acc5: 95.8333 (94.1372)  time: 0.2560  data: 0.0002  max mem: 30335
[13:26:24.703492] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4416 (0.9770)  acc1: 87.5000 (75.6309)  acc5: 100.0000 (93.4533)  time: 0.2568  data: 0.0002  max mem: 30335
[13:26:46.453412] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5377 (0.9782)  acc1: 83.3333 (75.6060)  acc5: 100.0000 (93.4560)  time: 0.2503  data: 0.0002  max mem: 30335
[13:26:46.559854] Test: Total time: 0:09:01 (0.2599 s / it)
[13:27:01.947966] * Acc@1 75.607 Acc@5 93.446 loss 0.978
[13:27:01.948423] Accuracy of the network on the 50000 test images: 75.6%
[13:27:01.948511] Max accuracy: 75.97%
[13:27:02.183603] log_dir: ./output_dir_qkformer
[13:27:11.771386] Epoch: [80]  [   0/6672]  eta: 17:43:54  lr: 0.000915  loss: 2.3697 (2.3697)  time: 9.5676  data: 2.9714  max mem: 30335
[13:52:19.729521] Epoch: [80]  [2000/6672]  eta: 0:59:02  lr: 0.000912  loss: 2.6099 (2.6590)  time: 0.7324  data: 0.0002  max mem: 30335
[14:17:23.971428] Epoch: [80]  [4000/6672]  eta: 0:33:37  lr: 0.000909  loss: 2.7298 (2.6725)  time: 0.7265  data: 0.0002  max mem: 30335
[14:42:45.792312] Epoch: [80]  [6000/6672]  eta: 0:08:28  lr: 0.000906  loss: 2.6494 (2.6791)  time: 0.7294  data: 0.0002  max mem: 30335
[14:51:10.564253] Epoch: [80]  [6671/6672]  eta: 0:00:00  lr: 0.000904  loss: 2.3974 (2.6772)  time: 0.7241  data: 0.0005  max mem: 30335
[14:51:11.323109] Epoch: [80] Total time: 1:24:09 (0.7568 s / it)
[14:51:11.399174] Averaged stats: lr: 0.000904  loss: 2.3974 (2.6909)
[14:51:15.675890] Test:  [   0/2084]  eta: 2:28:19  loss: 0.3355 (0.3355)  acc1: 91.6667 (91.6667)  acc5: 100.0000 (100.0000)  time: 4.2705  data: 3.6170  max mem: 30335
[14:53:24.637812] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.7939 (0.7357)  acc1: 83.3333 (81.0795)  acc5: 95.8333 (96.2325)  time: 0.2560  data: 0.0002  max mem: 30335
[14:55:33.143057] Test:  [1000/2084]  eta: 0:04:43  loss: 0.7122 (0.7761)  acc1: 83.3333 (79.9867)  acc5: 91.6667 (95.7542)  time: 0.2571  data: 0.0002  max mem: 30335
[14:57:41.547411] Test:  [1500/2084]  eta: 0:02:31  loss: 0.6827 (0.8689)  acc1: 83.3333 (77.9758)  acc5: 95.8333 (94.6092)  time: 0.2562  data: 0.0002  max mem: 30335
[14:59:50.595812] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4246 (0.9202)  acc1: 91.6667 (76.7575)  acc5: 100.0000 (93.9114)  time: 0.2567  data: 0.0002  max mem: 30335
[15:00:11.736870] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5180 (0.9253)  acc1: 87.5000 (76.6840)  acc5: 95.8333 (93.8640)  time: 0.2488  data: 0.0002  max mem: 30335
[15:00:11.873878] Test: Total time: 0:09:00 (0.2593 s / it)
[15:00:27.179289] * Acc@1 76.694 Acc@5 93.880 loss 0.925
[15:00:27.179578] Accuracy of the network on the 50000 test images: 76.7%
[15:00:27.179612] Max accuracy: 76.69%
[15:00:27.286575] log_dir: ./output_dir_qkformer
[15:00:36.217796] Epoch: [81]  [   0/6672]  eta: 16:18:12  lr: 0.000904  loss: 2.5760 (2.5760)  time: 8.7969  data: 2.1772  max mem: 30335
[15:25:45.697003] Epoch: [81]  [2000/6672]  eta: 0:59:04  lr: 0.000901  loss: 2.6835 (2.6809)  time: 0.7291  data: 0.0002  max mem: 30335
[15:50:49.996796] Epoch: [81]  [4000/6672]  eta: 0:33:38  lr: 0.000898  loss: 2.5938 (2.6887)  time: 0.7265  data: 0.0003  max mem: 30335
[16:15:55.303066] Epoch: [81]  [6000/6672]  eta: 0:08:26  lr: 0.000895  loss: 2.7495 (2.6908)  time: 0.7584  data: 0.0003  max mem: 30335
[16:24:30.549553] Epoch: [81]  [6671/6672]  eta: 0:00:00  lr: 0.000894  loss: 2.5600 (2.6944)  time: 0.7330  data: 0.0006  max mem: 30335
[16:24:31.262364] Epoch: [81] Total time: 1:24:03 (0.7560 s / it)
[16:24:31.309381] Averaged stats: lr: 0.000894  loss: 2.5600 (2.6874)
[16:24:35.299802] Test:  [   0/2084]  eta: 2:18:26  loss: 0.6319 (0.6319)  acc1: 83.3333 (83.3333)  acc5: 95.8333 (95.8333)  time: 3.9860  data: 3.4387  max mem: 30335
[16:26:43.714154] Test:  [ 500/2084]  eta: 0:06:58  loss: 0.8279 (0.7525)  acc1: 79.1667 (80.7136)  acc5: 95.8333 (95.9830)  time: 0.2565  data: 0.0002  max mem: 30335
[16:28:52.587281] Test:  [1000/2084]  eta: 0:04:42  loss: 0.8795 (0.8009)  acc1: 79.1667 (79.5496)  acc5: 91.6667 (95.6335)  time: 0.2566  data: 0.0002  max mem: 30335
[16:31:01.006762] Test:  [1500/2084]  eta: 0:02:31  loss: 0.8910 (0.8941)  acc1: 79.1667 (77.4761)  acc5: 95.8333 (94.5342)  time: 0.2572  data: 0.0002  max mem: 30335
[16:33:09.396791] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3314 (0.9433)  acc1: 91.6667 (76.4701)  acc5: 95.8333 (93.8510)  time: 0.2573  data: 0.0002  max mem: 30335
[16:33:30.571023] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5026 (0.9488)  acc1: 87.5000 (76.3560)  acc5: 95.8333 (93.8320)  time: 0.2491  data: 0.0001  max mem: 30335
[16:33:30.680245] Test: Total time: 0:08:59 (0.2588 s / it)
[16:33:46.086522] * Acc@1 76.338 Acc@5 93.845 loss 0.949
[16:33:46.086809] Accuracy of the network on the 50000 test images: 76.3%
[16:33:46.086850] Max accuracy: 76.69%
[16:33:46.184857] log_dir: ./output_dir_qkformer
[16:33:49.696232] Epoch: [82]  [   0/6672]  eta: 6:22:50  lr: 0.000894  loss: 3.0884 (3.0884)  time: 3.4428  data: 1.7888  max mem: 30335
[16:58:29.093054] Epoch: [82]  [2000/6672]  eta: 0:57:41  lr: 0.000891  loss: 2.4460 (2.6450)  time: 0.7323  data: 0.0002  max mem: 30335
[17:23:05.982700] Epoch: [82]  [4000/6672]  eta: 0:32:56  lr: 0.000888  loss: 2.7504 (2.6602)  time: 0.7268  data: 0.0003  max mem: 30335
[17:47:46.571908] Epoch: [82]  [6000/6672]  eta: 0:08:17  lr: 0.000885  loss: 2.6108 (2.6692)  time: 0.7352  data: 0.0003  max mem: 30335
[17:56:05.064607] Epoch: [82]  [6671/6672]  eta: 0:00:00  lr: 0.000884  loss: 2.8000 (2.6726)  time: 0.7233  data: 0.0006  max mem: 30335
[17:56:05.886820] Epoch: [82] Total time: 1:22:19 (0.7404 s / it)
[17:56:05.961173] Averaged stats: lr: 0.000884  loss: 2.8000 (2.6811)
[17:56:10.759443] Test:  [   0/2084]  eta: 2:46:30  loss: 0.5517 (0.5517)  acc1: 87.5000 (87.5000)  acc5: 95.8333 (95.8333)  time: 4.7940  data: 4.2902  max mem: 30335
[17:58:19.293386] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.7190 (0.7486)  acc1: 75.0000 (80.8466)  acc5: 95.8333 (96.1660)  time: 0.2694  data: 0.0002  max mem: 30335
[18:00:27.697039] Test:  [1000/2084]  eta: 0:04:43  loss: 0.8574 (0.7937)  acc1: 83.3333 (79.5663)  acc5: 91.6667 (95.7085)  time: 0.2579  data: 0.0002  max mem: 30335
[18:02:36.094456] Test:  [1500/2084]  eta: 0:02:31  loss: 0.6577 (0.8869)  acc1: 83.3333 (77.5233)  acc5: 95.8333 (94.5814)  time: 0.2561  data: 0.0002  max mem: 30335
[18:04:47.662831] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3537 (0.9321)  acc1: 91.6667 (76.4972)  acc5: 100.0000 (93.9926)  time: 0.2566  data: 0.0002  max mem: 30335
[18:05:09.042156] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6156 (0.9318)  acc1: 87.5000 (76.4980)  acc5: 100.0000 (94.0300)  time: 0.2599  data: 0.0001  max mem: 30335
[18:05:09.152005] Test: Total time: 0:09:03 (0.2606 s / it)
[18:05:24.437092] * Acc@1 76.505 Acc@5 94.041 loss 0.932
[18:05:24.437370] Accuracy of the network on the 50000 test images: 76.5%
[18:05:24.437404] Max accuracy: 76.69%
[18:05:24.576522] log_dir: ./output_dir_qkformer
[18:05:32.859191] Epoch: [83]  [   0/6672]  eta: 15:13:55  lr: 0.000884  loss: 2.6137 (2.6137)  time: 8.2188  data: 2.5436  max mem: 30335
[18:30:16.694418] Epoch: [83]  [2000/6672]  eta: 0:58:03  lr: 0.000881  loss: 2.7649 (2.6581)  time: 0.7308  data: 0.0002  max mem: 30335
[18:55:22.684153] Epoch: [83]  [4000/6672]  eta: 0:33:21  lr: 0.000878  loss: 2.6675 (2.6667)  time: 0.7298  data: 0.0003  max mem: 30335
[19:20:19.617901] Epoch: [83]  [6000/6672]  eta: 0:08:23  lr: 0.000875  loss: 2.6433 (2.6743)  time: 0.7315  data: 0.0003  max mem: 30335
[19:28:44.936141] Epoch: [83]  [6671/6672]  eta: 0:00:00  lr: 0.000874  loss: 2.5792 (2.6741)  time: 0.7237  data: 0.0011  max mem: 30335
[19:28:45.751085] Epoch: [83] Total time: 1:23:21 (0.7496 s / it)
[19:28:45.807691] Averaged stats: lr: 0.000874  loss: 2.5792 (2.6709)
[19:28:50.466065] Test:  [   0/2084]  eta: 2:41:17  loss: 0.3812 (0.3812)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.6437  data: 3.8867  max mem: 30335
[19:31:01.164993] Test:  [ 500/2084]  eta: 0:07:07  loss: 0.6961 (0.7524)  acc1: 83.3333 (81.2126)  acc5: 100.0000 (96.1494)  time: 0.2566  data: 0.0002  max mem: 30335
[19:33:09.450977] Test:  [1000/2084]  eta: 0:04:45  loss: 0.9774 (0.7843)  acc1: 75.0000 (79.9700)  acc5: 91.6667 (95.8666)  time: 0.2560  data: 0.0002  max mem: 30335
[19:35:18.989873] Test:  [1500/2084]  eta: 0:02:32  loss: 0.7430 (0.8798)  acc1: 83.3333 (77.7898)  acc5: 95.8333 (94.7119)  time: 0.2568  data: 0.0003  max mem: 30335
[19:37:27.796954] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3417 (0.9286)  acc1: 91.6667 (76.6533)  acc5: 100.0000 (93.9509)  time: 0.2565  data: 0.0002  max mem: 30335
[19:37:49.317984] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4122 (0.9305)  acc1: 87.5000 (76.6020)  acc5: 100.0000 (93.9620)  time: 0.2497  data: 0.0001  max mem: 30335
[19:37:49.430516] Test: Total time: 0:09:03 (0.2609 s / it)
[19:38:05.307640] * Acc@1 76.591 Acc@5 93.963 loss 0.931
[19:38:05.308123] Accuracy of the network on the 50000 test images: 76.6%
[19:38:05.308173] Max accuracy: 76.69%
[19:38:05.603423] log_dir: ./output_dir_qkformer
[19:38:10.467423] Epoch: [84]  [   0/6672]  eta: 8:59:09  lr: 0.000874  loss: 3.6027 (3.6027)  time: 4.8485  data: 2.9554  max mem: 30335
[20:03:08.345271] Epoch: [84]  [2000/6672]  eta: 0:58:27  lr: 0.000870  loss: 2.6314 (2.6566)  time: 0.7350  data: 0.0004  max mem: 30335
[20:27:55.878120] Epoch: [84]  [4000/6672]  eta: 0:33:16  lr: 0.000867  loss: 2.6778 (2.6613)  time: 0.7264  data: 0.0003  max mem: 30335
[20:52:38.226402] Epoch: [84]  [6000/6672]  eta: 0:08:20  lr: 0.000864  loss: 2.7824 (2.6648)  time: 0.7288  data: 0.0002  max mem: 30335
[21:01:04.420247] Epoch: [84]  [6671/6672]  eta: 0:00:00  lr: 0.000863  loss: 2.5440 (2.6643)  time: 0.7267  data: 0.0010  max mem: 30335
[21:01:05.123060] Epoch: [84] Total time: 1:22:59 (0.7463 s / it)
[21:01:05.155971] Averaged stats: lr: 0.000863  loss: 2.5440 (2.6673)
[21:01:10.534596] Test:  [   0/2084]  eta: 3:06:37  loss: 0.4837 (0.4837)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 5.3730  data: 4.7765  max mem: 30335
[21:03:19.501567] Test:  [ 500/2084]  eta: 0:07:04  loss: 0.9524 (0.7244)  acc1: 75.0000 (81.7199)  acc5: 100.0000 (96.4737)  time: 0.2567  data: 0.0002  max mem: 30335
[21:05:28.994765] Test:  [1000/2084]  eta: 0:04:45  loss: 0.8750 (0.7745)  acc1: 79.1667 (80.0824)  acc5: 91.6667 (95.8500)  time: 0.2568  data: 0.0002  max mem: 30335
[21:07:41.634197] Test:  [1500/2084]  eta: 0:02:34  loss: 0.6134 (0.8728)  acc1: 83.3333 (77.6649)  acc5: 95.8333 (94.6064)  time: 0.2560  data: 0.0002  max mem: 30335
[21:09:50.192158] Test:  [2000/2084]  eta: 0:00:22  loss: 0.3418 (0.9185)  acc1: 91.6667 (76.5909)  acc5: 95.8333 (93.9926)  time: 0.2575  data: 0.0002  max mem: 30335
[21:10:11.340725] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6132 (0.9204)  acc1: 87.5000 (76.5560)  acc5: 95.8333 (93.9780)  time: 0.2481  data: 0.0001  max mem: 30335
[21:10:11.476687] Test: Total time: 0:09:06 (0.2621 s / it)
[21:10:26.929235] * Acc@1 76.560 Acc@5 93.964 loss 0.920
[21:10:26.929568] Accuracy of the network on the 50000 test images: 76.6%
[21:10:26.929600] Max accuracy: 76.69%
[21:10:27.203961] log_dir: ./output_dir_qkformer
[21:10:32.779287] Epoch: [85]  [   0/6672]  eta: 9:56:38  lr: 0.000863  loss: 2.1937 (2.1937)  time: 5.3655  data: 2.9681  max mem: 30335
[21:35:43.145654] Epoch: [85]  [2000/6672]  eta: 0:58:58  lr: 0.000860  loss: 2.5248 (2.6466)  time: 0.7307  data: 0.0003  max mem: 30335
[22:00:55.229708] Epoch: [85]  [4000/6672]  eta: 0:33:41  lr: 0.000857  loss: 2.5406 (2.6515)  time: 0.7282  data: 0.0003  max mem: 30335
[22:26:06.355184] Epoch: [85]  [6000/6672]  eta: 0:08:28  lr: 0.000854  loss: 2.6270 (2.6604)  time: 0.7351  data: 0.0002  max mem: 30335
[22:34:24.648891] Epoch: [85]  [6671/6672]  eta: 0:00:00  lr: 0.000853  loss: 2.5395 (2.6608)  time: 0.7289  data: 0.0006  max mem: 30335
[22:34:25.436162] Epoch: [85] Total time: 1:23:58 (0.7551 s / it)
[22:34:25.496986] Averaged stats: lr: 0.000853  loss: 2.5395 (2.6636)
[22:34:30.096926] Test:  [   0/2084]  eta: 2:39:31  loss: 0.2243 (0.2243)  acc1: 95.8333 (95.8333)  acc5: 100.0000 (100.0000)  time: 4.5929  data: 3.8570  max mem: 30335
[22:36:38.506818] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.9713 (0.7873)  acc1: 70.8333 (80.4142)  acc5: 95.8333 (96.0496)  time: 0.2564  data: 0.0002  max mem: 30335
[22:38:47.034201] Test:  [1000/2084]  eta: 0:04:43  loss: 0.7881 (0.8068)  acc1: 87.5000 (80.0033)  acc5: 95.8333 (95.6752)  time: 0.2563  data: 0.0002  max mem: 30335
[22:40:55.952080] Test:  [1500/2084]  eta: 0:02:31  loss: 0.6321 (0.8955)  acc1: 83.3333 (77.7676)  acc5: 95.8333 (94.5453)  time: 0.2575  data: 0.0002  max mem: 30335
[22:43:04.378667] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4843 (0.9362)  acc1: 87.5000 (76.7304)  acc5: 100.0000 (94.0009)  time: 0.2571  data: 0.0002  max mem: 30335
[22:43:25.746721] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4848 (0.9388)  acc1: 87.5000 (76.6600)  acc5: 95.8333 (93.9740)  time: 0.2582  data: 0.0002  max mem: 30335
[22:43:25.867915] Test: Total time: 0:09:00 (0.2593 s / it)
[22:43:41.240497] * Acc@1 76.677 Acc@5 93.977 loss 0.939
[22:43:41.240995] Accuracy of the network on the 50000 test images: 76.7%
[22:43:41.241039] Max accuracy: 76.69%
[22:43:41.427481] log_dir: ./output_dir_qkformer
[22:43:48.949541] Epoch: [86]  [   0/6672]  eta: 13:56:09  lr: 0.000853  loss: 2.1875 (2.1875)  time: 7.5194  data: 3.2041  max mem: 30335
[23:09:00.732756] Epoch: [86]  [2000/6672]  eta: 0:59:06  lr: 0.000850  loss: 2.7452 (2.6493)  time: 0.7295  data: 0.0002  max mem: 30335
[23:34:13.912546] Epoch: [86]  [4000/6672]  eta: 0:33:44  lr: 0.000846  loss: 2.4948 (2.6569)  time: 0.7266  data: 0.0002  max mem: 30335
[23:59:42.207423] Epoch: [86]  [6000/6672]  eta: 0:08:30  lr: 0.000843  loss: 2.6424 (2.6642)  time: 0.7258  data: 0.0002  max mem: 30335
[00:08:19.828277] Epoch: [86]  [6671/6672]  eta: 0:00:00  lr: 0.000842  loss: 2.6261 (2.6649)  time: 0.7234  data: 0.0006  max mem: 30335
[00:08:20.790620] Epoch: [86] Total time: 1:24:39 (0.7613 s / it)
[00:08:20.810994] Averaged stats: lr: 0.000842  loss: 2.6261 (2.6564)
[00:08:25.546468] Test:  [   0/2084]  eta: 2:44:17  loss: 0.3394 (0.3394)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.7301  data: 4.0314  max mem: 30335
[00:10:34.148327] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.7042 (0.7510)  acc1: 79.1667 (80.5057)  acc5: 100.0000 (96.3240)  time: 0.2578  data: 0.0002  max mem: 30335
[00:12:42.953210] Test:  [1000/2084]  eta: 0:04:43  loss: 0.9704 (0.7939)  acc1: 79.1667 (79.6620)  acc5: 91.6667 (95.7542)  time: 0.2571  data: 0.0003  max mem: 30335
[00:14:51.413546] Test:  [1500/2084]  eta: 0:02:31  loss: 0.6165 (0.8844)  acc1: 83.3333 (77.5899)  acc5: 95.8333 (94.6730)  time: 0.2566  data: 0.0002  max mem: 30335
[00:17:00.049541] Test:  [2000/2084]  eta: 0:00:21  loss: 0.5056 (0.9409)  acc1: 91.6667 (76.1932)  acc5: 95.8333 (93.9676)  time: 0.2568  data: 0.0002  max mem: 30335
[00:17:21.221247] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4266 (0.9430)  acc1: 87.5000 (76.1560)  acc5: 95.8333 (93.9660)  time: 0.2497  data: 0.0001  max mem: 30335
[00:17:21.349557] Test: Total time: 0:09:00 (0.2594 s / it)
[00:17:36.772896] * Acc@1 76.170 Acc@5 93.965 loss 0.943
[00:17:36.773205] Accuracy of the network on the 50000 test images: 76.2%
[00:17:36.773254] Max accuracy: 76.69%
[00:17:36.879341] log_dir: ./output_dir_qkformer
[00:17:40.902802] Epoch: [87]  [   0/6672]  eta: 7:22:48  lr: 0.000842  loss: 1.9576 (1.9576)  time: 3.9821  data: 3.0328  max mem: 30335
[00:42:56.483424] Epoch: [87]  [2000/6672]  eta: 0:59:07  lr: 0.000839  loss: 2.7114 (2.6158)  time: 0.7291  data: 0.0002  max mem: 30335
[01:08:31.666465] Epoch: [87]  [4000/6672]  eta: 0:33:59  lr: 0.000836  loss: 2.6235 (2.6333)  time: 0.8693  data: 0.0020  max mem: 30335
[01:33:58.524348] Epoch: [87]  [6000/6672]  eta: 0:08:32  lr: 0.000833  loss: 2.5843 (2.6349)  time: 1.2425  data: 0.0002  max mem: 30335
[01:42:26.842125] Epoch: [87]  [6671/6672]  eta: 0:00:00  lr: 0.000832  loss: 2.6384 (2.6374)  time: 0.7235  data: 0.0011  max mem: 30335
[01:42:27.645329] Epoch: [87] Total time: 1:24:50 (0.7630 s / it)
[01:42:27.707770] Averaged stats: lr: 0.000832  loss: 2.6384 (2.6492)
[01:42:33.640544] Test:  [   0/2084]  eta: 3:25:48  loss: 0.2327 (0.2327)  acc1: 95.8333 (95.8333)  acc5: 100.0000 (100.0000)  time: 5.9254  data: 5.0968  max mem: 30335
[01:44:42.059464] Test:  [ 500/2084]  eta: 0:07:04  loss: 0.7443 (0.7692)  acc1: 79.1667 (80.1314)  acc5: 100.0000 (96.2076)  time: 0.2573  data: 0.0002  max mem: 30335
[01:46:51.086617] Test:  [1000/2084]  eta: 0:04:45  loss: 0.8348 (0.8061)  acc1: 83.3333 (79.2458)  acc5: 91.6667 (95.6294)  time: 0.2625  data: 0.0002  max mem: 30335
[01:49:00.234618] Test:  [1500/2084]  eta: 0:02:32  loss: 0.7542 (0.8900)  acc1: 83.3333 (77.4345)  acc5: 95.8333 (94.6897)  time: 0.2573  data: 0.0002  max mem: 30335
[01:51:08.887416] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2931 (0.9284)  acc1: 95.8333 (76.5638)  acc5: 100.0000 (94.1259)  time: 0.2565  data: 0.0002  max mem: 30335
[01:51:30.088790] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6707 (0.9292)  acc1: 87.5000 (76.5540)  acc5: 95.8333 (94.1500)  time: 0.2502  data: 0.0001  max mem: 30335
[01:51:30.205227] Test: Total time: 0:09:02 (0.2603 s / it)
[01:51:46.188332] * Acc@1 76.555 Acc@5 94.168 loss 0.929
[01:51:46.188617] Accuracy of the network on the 50000 test images: 76.6%
[01:51:46.188655] Max accuracy: 76.69%
[01:51:46.670538] log_dir: ./output_dir_qkformer
[01:51:59.703616] Epoch: [88]  [   0/6672]  eta: 23:19:16  lr: 0.000832  loss: 2.2733 (2.2733)  time: 12.5835  data: 3.1700  max mem: 30335
[02:17:04.292046] Epoch: [88]  [2000/6672]  eta: 0:59:01  lr: 0.000828  loss: 2.4772 (2.6436)  time: 0.7339  data: 0.0003  max mem: 30335
[02:42:32.560646] Epoch: [88]  [4000/6672]  eta: 0:33:53  lr: 0.000825  loss: 2.6236 (2.6481)  time: 0.7320  data: 0.0003  max mem: 30335
[03:07:45.899346] Epoch: [88]  [6000/6672]  eta: 0:08:30  lr: 0.000822  loss: 2.6487 (2.6509)  time: 0.7347  data: 0.0004  max mem: 30335
[03:16:03.288709] Epoch: [88]  [6671/6672]  eta: 0:00:00  lr: 0.000821  loss: 2.5175 (2.6489)  time: 0.7225  data: 0.0010  max mem: 30335
[03:16:04.143024] Epoch: [88] Total time: 1:24:17 (0.7580 s / it)
[03:16:04.239467] Averaged stats: lr: 0.000821  loss: 2.5175 (2.6440)
[03:16:08.532220] Test:  [   0/2084]  eta: 2:28:53  loss: 0.3406 (0.3406)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.2869  data: 3.6566  max mem: 30335
[03:18:17.522933] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.6624 (0.7358)  acc1: 75.0000 (81.5369)  acc5: 100.0000 (96.2991)  time: 0.2563  data: 0.0002  max mem: 30335
[03:20:25.574768] Test:  [1000/2084]  eta: 0:04:42  loss: 0.7822 (0.7711)  acc1: 83.3333 (80.4071)  acc5: 91.6667 (95.9207)  time: 0.2557  data: 0.0002  max mem: 30335
[03:22:35.194651] Test:  [1500/2084]  eta: 0:02:32  loss: 0.7721 (0.8577)  acc1: 79.1667 (78.2395)  acc5: 95.8333 (94.9201)  time: 0.2562  data: 0.0002  max mem: 30335
[03:24:43.643339] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3471 (0.9001)  acc1: 91.6667 (77.2385)  acc5: 100.0000 (94.3258)  time: 0.2606  data: 0.0002  max mem: 30335
[03:25:04.805333] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6148 (0.9052)  acc1: 83.3333 (77.1240)  acc5: 95.8333 (94.2900)  time: 0.2491  data: 0.0001  max mem: 30335
[03:25:04.921980] Test: Total time: 0:09:00 (0.2594 s / it)
[03:25:20.446900] * Acc@1 77.105 Acc@5 94.286 loss 0.906
[03:25:20.447347] Accuracy of the network on the 50000 test images: 77.1%
[03:25:20.447397] Max accuracy: 77.10%
[03:25:20.576150] log_dir: ./output_dir_qkformer
[03:25:28.597926] Epoch: [89]  [   0/6672]  eta: 14:36:45  lr: 0.000821  loss: 2.1696 (2.1696)  time: 7.8845  data: 1.9748  max mem: 30335
[03:50:42.885004] Epoch: [89]  [2000/6672]  eta: 0:59:13  lr: 0.000818  loss: 2.5035 (2.6235)  time: 0.7327  data: 0.0002  max mem: 30335
[04:15:39.043430] Epoch: [89]  [4000/6672]  eta: 0:33:35  lr: 0.000815  loss: 2.4492 (2.6288)  time: 0.7302  data: 0.0003  max mem: 30335
[04:40:52.860511] Epoch: [89]  [6000/6672]  eta: 0:08:27  lr: 0.000811  loss: 2.5662 (2.6390)  time: 0.9939  data: 0.0003  max mem: 30335
[04:49:21.569524] Epoch: [89]  [6671/6672]  eta: 0:00:00  lr: 0.000810  loss: 2.3362 (2.6388)  time: 0.7240  data: 0.0006  max mem: 30335
[04:49:22.557063] Epoch: [89] Total time: 1:24:01 (0.7557 s / it)
[04:49:22.607154] Averaged stats: lr: 0.000810  loss: 2.3362 (2.6368)
[04:49:27.073301] Test:  [   0/2084]  eta: 2:34:56  loss: 0.2811 (0.2811)  acc1: 91.6667 (91.6667)  acc5: 100.0000 (100.0000)  time: 4.4607  data: 3.8230  max mem: 30335
[04:51:35.412272] Test:  [ 500/2084]  eta: 0:06:59  loss: 0.8999 (0.7200)  acc1: 70.8333 (81.6617)  acc5: 95.8333 (96.5070)  time: 0.2564  data: 0.0002  max mem: 30335
[04:53:44.175203] Test:  [1000/2084]  eta: 0:04:43  loss: 0.6897 (0.7773)  acc1: 83.3333 (80.4487)  acc5: 91.6667 (95.8583)  time: 0.2563  data: 0.0002  max mem: 30335
[04:55:52.603248] Test:  [1500/2084]  eta: 0:02:31  loss: 0.6818 (0.8666)  acc1: 87.5000 (78.3283)  acc5: 95.8333 (94.8284)  time: 0.2568  data: 0.0002  max mem: 30335
[04:58:01.953929] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3838 (0.9107)  acc1: 91.6667 (77.2405)  acc5: 95.8333 (94.1613)  time: 0.2565  data: 0.0002  max mem: 30335
[04:58:23.430470] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6387 (0.9115)  acc1: 83.3333 (77.2060)  acc5: 100.0000 (94.1880)  time: 0.2495  data: 0.0002  max mem: 30335
[04:58:23.547634] Test: Total time: 0:09:00 (0.2596 s / it)
[04:58:39.514064] * Acc@1 77.213 Acc@5 94.206 loss 0.911
[04:58:39.514413] Accuracy of the network on the 50000 test images: 77.2%
[04:58:39.514461] Max accuracy: 77.21%
[04:58:39.705909] log_dir: ./output_dir_qkformer
[04:58:46.487549] Epoch: [90]  [   0/6672]  eta: 12:34:00  lr: 0.000810  loss: 2.3426 (2.3426)  time: 6.7806  data: 2.8295  max mem: 30335
[05:23:54.153733] Epoch: [90]  [2000/6672]  eta: 0:58:55  lr: 0.000807  loss: 2.6393 (2.6245)  time: 0.7428  data: 0.0003  max mem: 30335
[05:49:00.621670] Epoch: [90]  [4000/6672]  eta: 0:33:36  lr: 0.000804  loss: 2.5973 (2.6258)  time: 0.7271  data: 0.0002  max mem: 30335
[06:14:37.246175] Epoch: [90]  [6000/6672]  eta: 0:08:30  lr: 0.000801  loss: 2.6435 (2.6315)  time: 0.7281  data: 0.0003  max mem: 30335
[06:23:19.952224] Epoch: [90]  [6671/6672]  eta: 0:00:00  lr: 0.000800  loss: 2.7227 (2.6345)  time: 0.7252  data: 0.0007  max mem: 30335
[06:23:20.841657] Epoch: [90] Total time: 1:24:41 (0.7616 s / it)
[06:23:20.901661] Averaged stats: lr: 0.000800  loss: 2.7227 (2.6322)
[06:23:27.485697] Test:  [   0/2084]  eta: 3:48:29  loss: 0.4703 (0.4703)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 6.5786  data: 5.5366  max mem: 30335
[06:25:39.275866] Test:  [ 500/2084]  eta: 0:07:17  loss: 0.7137 (0.6903)  acc1: 75.0000 (81.6949)  acc5: 95.8333 (96.6650)  time: 0.2564  data: 0.0002  max mem: 30335
[06:27:47.488749] Test:  [1000/2084]  eta: 0:04:48  loss: 0.5825 (0.7395)  acc1: 83.3333 (80.5361)  acc5: 95.8333 (96.1622)  time: 0.2568  data: 0.0002  max mem: 30335
[06:29:55.846166] Test:  [1500/2084]  eta: 0:02:33  loss: 0.8023 (0.8313)  acc1: 79.1667 (78.5004)  acc5: 95.8333 (94.9950)  time: 0.2560  data: 0.0002  max mem: 30335
[06:32:04.806448] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3207 (0.8861)  acc1: 91.6667 (77.2989)  acc5: 100.0000 (94.2612)  time: 0.2571  data: 0.0002  max mem: 30335
[06:32:25.892411] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5373 (0.8893)  acc1: 87.5000 (77.2040)  acc5: 100.0000 (94.2700)  time: 0.2478  data: 0.0001  max mem: 30335
[06:32:25.992537] Test: Total time: 0:09:05 (0.2616 s / it)
[06:32:41.584801] * Acc@1 77.201 Acc@5 94.280 loss 0.889
[06:32:41.585112] Accuracy of the network on the 50000 test images: 77.2%
[06:32:41.585159] Max accuracy: 77.21%
[06:32:41.807330] log_dir: ./output_dir_qkformer
[06:32:51.913298] Epoch: [91]  [   0/6672]  eta: 18:33:09  lr: 0.000800  loss: 2.6585 (2.6585)  time: 10.0104  data: 3.6169  max mem: 30335
[06:58:18.512516] Epoch: [91]  [2000/6672]  eta: 0:59:47  lr: 0.000797  loss: 2.7065 (2.6081)  time: 0.7312  data: 0.0003  max mem: 30335
[07:23:28.105934] Epoch: [91]  [4000/6672]  eta: 0:33:54  lr: 0.000793  loss: 2.5028 (2.6199)  time: 0.7301  data: 0.0002  max mem: 30335
[07:49:05.092401] Epoch: [91]  [6000/6672]  eta: 0:08:33  lr: 0.000790  loss: 2.6987 (2.6185)  time: 1.0634  data: 0.0003  max mem: 30335
[07:57:27.163357] Epoch: [91]  [6671/6672]  eta: 0:00:00  lr: 0.000789  loss: 2.5454 (2.6199)  time: 0.7459  data: 0.0006  max mem: 30335
[07:57:28.011438] Epoch: [91] Total time: 1:24:46 (0.7623 s / it)
[07:57:28.064587] Averaged stats: lr: 0.000789  loss: 2.5454 (2.6219)
[07:57:33.218724] Test:  [   0/2084]  eta: 2:58:49  loss: 0.3369 (0.3369)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 5.1484  data: 4.3447  max mem: 30335
[07:59:43.025937] Test:  [ 500/2084]  eta: 0:07:06  loss: 0.8325 (0.7090)  acc1: 79.1667 (81.3706)  acc5: 95.8333 (96.4654)  time: 0.2566  data: 0.0002  max mem: 30335
[08:01:52.144067] Test:  [1000/2084]  eta: 0:04:45  loss: 0.9175 (0.7507)  acc1: 75.0000 (80.0907)  acc5: 91.6667 (96.0997)  time: 0.2564  data: 0.0002  max mem: 30335
[08:04:01.609946] Test:  [1500/2084]  eta: 0:02:33  loss: 0.4721 (0.8352)  acc1: 87.5000 (78.4033)  acc5: 95.8333 (94.9756)  time: 0.2566  data: 0.0002  max mem: 30335
[08:06:10.668924] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4168 (0.8865)  acc1: 87.5000 (77.3759)  acc5: 100.0000 (94.2341)  time: 0.2583  data: 0.0002  max mem: 30335
[08:06:31.933292] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4716 (0.8877)  acc1: 87.5000 (77.3380)  acc5: 100.0000 (94.2500)  time: 0.2538  data: 0.0001  max mem: 30335
[08:06:32.060616] Test: Total time: 0:09:03 (0.2610 s / it)
[08:06:47.689460] * Acc@1 77.338 Acc@5 94.262 loss 0.888
[08:06:47.689766] Accuracy of the network on the 50000 test images: 77.3%
[08:06:47.689821] Max accuracy: 77.34%
[08:06:48.007174] log_dir: ./output_dir_qkformer
[08:06:55.807114] Epoch: [92]  [   0/6672]  eta: 14:26:56  lr: 0.000789  loss: 1.8987 (1.8987)  time: 7.7962  data: 2.5419  max mem: 30335
[08:32:24.456285] Epoch: [92]  [2000/6672]  eta: 0:59:46  lr: 0.000786  loss: 2.8045 (2.6040)  time: 0.9127  data: 0.0056  max mem: 30335
[08:57:45.469459] Epoch: [92]  [4000/6672]  eta: 0:34:01  lr: 0.000783  loss: 2.7531 (2.6180)  time: 0.7356  data: 0.0003  max mem: 30335
[09:23:10.256177] Epoch: [92]  [6000/6672]  eta: 0:08:33  lr: 0.000779  loss: 2.4301 (2.6204)  time: 0.7639  data: 0.0002  max mem: 30335
[09:31:41.640493] Epoch: [92]  [6671/6672]  eta: 0:00:00  lr: 0.000778  loss: 2.6274 (2.6236)  time: 0.7247  data: 0.0011  max mem: 30335
[09:31:42.317898] Epoch: [92] Total time: 1:24:54 (0.7635 s / it)
[09:31:42.351222] Averaged stats: lr: 0.000778  loss: 2.6274 (2.6209)
[09:31:47.750479] Test:  [   0/2084]  eta: 3:07:22  loss: 0.3080 (0.3080)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.3948  data: 4.6481  max mem: 30335
[09:33:56.375626] Test:  [ 500/2084]  eta: 0:07:03  loss: 0.7761 (0.7050)  acc1: 75.0000 (81.6949)  acc5: 95.8333 (96.6484)  time: 0.2571  data: 0.0002  max mem: 30335
[09:36:05.630972] Test:  [1000/2084]  eta: 0:04:45  loss: 0.7854 (0.7482)  acc1: 79.1667 (80.7110)  acc5: 91.6667 (96.2038)  time: 0.2577  data: 0.0002  max mem: 30335
[09:38:14.261641] Test:  [1500/2084]  eta: 0:02:32  loss: 0.6421 (0.8375)  acc1: 87.5000 (78.5865)  acc5: 95.8333 (95.1199)  time: 0.2564  data: 0.0002  max mem: 30335
[09:40:24.117622] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3843 (0.8801)  acc1: 91.6667 (77.5737)  acc5: 95.8333 (94.5361)  time: 0.2577  data: 0.0002  max mem: 30335
[09:40:45.316779] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5561 (0.8846)  acc1: 83.3333 (77.5200)  acc5: 100.0000 (94.5020)  time: 0.2504  data: 0.0001  max mem: 30335
[09:40:45.447711] Test: Total time: 0:09:03 (0.2606 s / it)
[09:41:01.159496] * Acc@1 77.510 Acc@5 94.492 loss 0.885
[09:41:01.159779] Accuracy of the network on the 50000 test images: 77.5%
[09:41:01.159811] Max accuracy: 77.51%
[09:41:01.491352] log_dir: ./output_dir_qkformer
[09:41:10.515119] Epoch: [93]  [   0/6672]  eta: 16:43:16  lr: 0.000778  loss: 2.4092 (2.4092)  time: 9.0223  data: 2.1446  max mem: 30335
[10:06:29.759259] Epoch: [93]  [2000/6672]  eta: 0:59:27  lr: 0.000775  loss: 2.7888 (2.6092)  time: 0.7339  data: 0.0003  max mem: 30335
[10:31:52.352747] Epoch: [93]  [4000/6672]  eta: 0:33:56  lr: 0.000772  loss: 2.7329 (2.6055)  time: 0.8336  data: 0.0003  max mem: 30335
[10:57:19.476417] Epoch: [93]  [6000/6672]  eta: 0:08:32  lr: 0.000769  loss: 2.6970 (2.6077)  time: 0.7556  data: 0.0003  max mem: 30335
[11:05:48.503018] Epoch: [93]  [6671/6672]  eta: 0:00:00  lr: 0.000768  loss: 2.5202 (2.6090)  time: 0.7244  data: 0.0011  max mem: 30335
[11:05:49.301605] Epoch: [93] Total time: 1:24:47 (0.7626 s / it)
[11:05:49.340219] Averaged stats: lr: 0.000768  loss: 2.5202 (2.6128)
[11:05:56.245761] Test:  [   0/2084]  eta: 3:59:41  loss: 0.3216 (0.3216)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 6.9008  data: 6.0841  max mem: 30335
[11:08:06.866696] Test:  [ 500/2084]  eta: 0:07:14  loss: 0.6793 (0.7047)  acc1: 79.1667 (82.3436)  acc5: 95.8333 (96.5818)  time: 0.2576  data: 0.0002  max mem: 30335
[11:10:16.301484] Test:  [1000/2084]  eta: 0:04:49  loss: 0.5748 (0.7599)  acc1: 83.3333 (80.5819)  acc5: 91.6667 (95.9665)  time: 0.3013  data: 0.0002  max mem: 30335
[11:12:26.873479] Test:  [1500/2084]  eta: 0:02:34  loss: 0.6750 (0.8494)  acc1: 83.3333 (78.4533)  acc5: 95.8333 (94.8146)  time: 0.2620  data: 0.0002  max mem: 30335
[11:14:37.452143] Test:  [2000/2084]  eta: 0:00:22  loss: 0.4082 (0.8940)  acc1: 83.3333 (77.4738)  acc5: 100.0000 (94.1488)  time: 0.2576  data: 0.0002  max mem: 30335
[11:14:58.613730] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5560 (0.8990)  acc1: 87.5000 (77.3320)  acc5: 100.0000 (94.1320)  time: 0.2488  data: 0.0001  max mem: 30335
[11:14:58.746966] Test: Total time: 0:09:09 (0.2636 s / it)
[11:15:13.876969] * Acc@1 77.322 Acc@5 94.142 loss 0.899
[11:15:13.877277] Accuracy of the network on the 50000 test images: 77.3%
[11:15:13.877313] Max accuracy: 77.51%
[11:15:14.726160] log_dir: ./output_dir_qkformer
[11:15:22.391365] Epoch: [94]  [   0/6672]  eta: 14:10:46  lr: 0.000768  loss: 2.7417 (2.7417)  time: 7.6509  data: 2.6060  max mem: 30335
[11:40:55.910855] Epoch: [94]  [2000/6672]  eta: 0:59:57  lr: 0.000764  loss: 2.5651 (2.5916)  time: 0.7564  data: 0.0003  max mem: 30335
[12:06:21.207466] Epoch: [94]  [4000/6672]  eta: 0:34:07  lr: 0.000761  loss: 2.6412 (2.5991)  time: 0.7327  data: 0.0003  max mem: 30335
[12:31:44.843352] Epoch: [94]  [6000/6672]  eta: 0:08:33  lr: 0.000758  loss: 2.6581 (2.6071)  time: 0.7289  data: 0.0003  max mem: 30335
[12:40:25.246356] Epoch: [94]  [6671/6672]  eta: 0:00:00  lr: 0.000757  loss: 2.7334 (2.6096)  time: 0.7265  data: 0.0011  max mem: 30335
[12:40:26.093104] Epoch: [94] Total time: 1:25:11 (0.7661 s / it)
[12:40:26.152150] Averaged stats: lr: 0.000757  loss: 2.7334 (2.6093)
[12:40:31.315678] Test:  [   0/2084]  eta: 2:59:04  loss: 0.3394 (0.3394)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.1557  data: 4.3802  max mem: 30335
[12:42:40.225621] Test:  [ 500/2084]  eta: 0:07:03  loss: 0.9413 (0.7346)  acc1: 75.0000 (81.4371)  acc5: 95.8333 (96.2242)  time: 0.2574  data: 0.0002  max mem: 30335
[12:44:49.329754] Test:  [1000/2084]  eta: 0:04:44  loss: 0.6991 (0.7673)  acc1: 83.3333 (80.3655)  acc5: 91.6667 (95.9374)  time: 0.2574  data: 0.0002  max mem: 30335
[12:46:57.720398] Test:  [1500/2084]  eta: 0:02:32  loss: 0.6246 (0.8561)  acc1: 87.5000 (78.2256)  acc5: 95.8333 (94.8701)  time: 0.2572  data: 0.0002  max mem: 30335
[12:49:08.040914] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3643 (0.8987)  acc1: 91.6667 (77.3384)  acc5: 100.0000 (94.3403)  time: 0.2564  data: 0.0002  max mem: 30335
[12:49:29.248238] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4548 (0.9026)  acc1: 87.5000 (77.2820)  acc5: 100.0000 (94.3280)  time: 0.2501  data: 0.0001  max mem: 30335
[12:49:29.353231] Test: Total time: 0:09:03 (0.2606 s / it)
[12:49:44.633036] * Acc@1 77.297 Acc@5 94.320 loss 0.903
[12:49:44.633326] Accuracy of the network on the 50000 test images: 77.3%
[12:49:44.633388] Max accuracy: 77.51%
[12:49:45.149653] log_dir: ./output_dir_qkformer
[12:49:53.572087] Epoch: [95]  [   0/6672]  eta: 15:36:23  lr: 0.000757  loss: 2.2731 (2.2731)  time: 8.4208  data: 2.7927  max mem: 30335
[13:15:08.681942] Epoch: [95]  [2000/6672]  eta: 0:59:15  lr: 0.000754  loss: 2.6452 (2.5866)  time: 0.7318  data: 0.0003  max mem: 30335
[13:40:27.601478] Epoch: [95]  [4000/6672]  eta: 0:33:51  lr: 0.000750  loss: 2.6005 (2.5901)  time: 0.7260  data: 0.0003  max mem: 30335
[14:05:48.891919] Epoch: [95]  [6000/6672]  eta: 0:08:30  lr: 0.000747  loss: 2.6858 (2.6022)  time: 1.0015  data: 0.0004  max mem: 30335
[14:14:15.169505] Epoch: [95]  [6671/6672]  eta: 0:00:00  lr: 0.000746  loss: 2.5274 (2.6028)  time: 0.7326  data: 0.0012  max mem: 30335
[14:14:15.980514] Epoch: [95] Total time: 1:24:30 (0.7600 s / it)
[14:14:16.015913] Averaged stats: lr: 0.000746  loss: 2.5274 (2.6025)
[14:14:21.480880] Test:  [   0/2084]  eta: 3:09:39  loss: 0.3320 (0.3320)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 5.4606  data: 4.6921  max mem: 30335
[14:16:30.925824] Test:  [ 500/2084]  eta: 0:07:06  loss: 0.8097 (0.6914)  acc1: 75.0000 (82.2355)  acc5: 95.8333 (96.5070)  time: 0.2779  data: 0.0002  max mem: 30335
[14:18:40.063258] Test:  [1000/2084]  eta: 0:04:45  loss: 0.7663 (0.7416)  acc1: 87.5000 (80.9232)  acc5: 95.8333 (96.2329)  time: 0.2571  data: 0.0002  max mem: 30335
[14:20:48.553510] Test:  [1500/2084]  eta: 0:02:32  loss: 0.7746 (0.8401)  acc1: 79.1667 (78.5532)  acc5: 95.8333 (95.0505)  time: 0.2564  data: 0.0002  max mem: 30335
[14:22:58.292835] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2817 (0.8769)  acc1: 91.6667 (77.6633)  acc5: 100.0000 (94.5298)  time: 0.2572  data: 0.0002  max mem: 30335
[14:23:19.477615] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5057 (0.8825)  acc1: 87.5000 (77.5200)  acc5: 100.0000 (94.4940)  time: 0.2500  data: 0.0002  max mem: 30335
[14:23:19.646954] Test: Total time: 0:09:03 (0.2609 s / it)
[14:23:35.067466] * Acc@1 77.514 Acc@5 94.500 loss 0.883
[14:23:35.067809] Accuracy of the network on the 50000 test images: 77.5%
[14:23:35.067870] Max accuracy: 77.51%
[14:23:35.206209] log_dir: ./output_dir_qkformer
[14:23:40.544411] Epoch: [96]  [   0/6672]  eta: 9:43:08  lr: 0.000746  loss: 2.7349 (2.7349)  time: 5.2441  data: 2.8863  max mem: 30335
[14:48:47.711219] Epoch: [96]  [2000/6672]  eta: 0:58:50  lr: 0.000743  loss: 2.4650 (2.5742)  time: 0.7323  data: 0.0003  max mem: 30335
[15:14:01.025254] Epoch: [96]  [4000/6672]  eta: 0:33:40  lr: 0.000740  loss: 2.4717 (2.5895)  time: 0.7636  data: 0.0003  max mem: 30335
[15:39:17.563748] Epoch: [96]  [6000/6672]  eta: 0:08:28  lr: 0.000736  loss: 2.6762 (2.5936)  time: 0.8211  data: 0.0004  max mem: 30335
[15:47:41.657283] Epoch: [96]  [6671/6672]  eta: 0:00:00  lr: 0.000735  loss: 2.6647 (2.5942)  time: 0.7249  data: 0.0011  max mem: 30335
[15:47:42.429438] Epoch: [96] Total time: 1:24:07 (0.7565 s / it)
[15:47:42.473535] Averaged stats: lr: 0.000735  loss: 2.6647 (2.5928)
[15:47:46.617207] Test:  [   0/2084]  eta: 2:23:40  loss: 0.6426 (0.6426)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.1366  data: 3.5368  max mem: 30335
[15:49:55.739961] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.9078 (0.7461)  acc1: 70.8333 (81.3373)  acc5: 95.8333 (96.2325)  time: 0.2578  data: 0.0002  max mem: 30335
[15:52:04.837446] Test:  [1000/2084]  eta: 0:04:44  loss: 1.0188 (0.7758)  acc1: 75.0000 (80.3072)  acc5: 95.8333 (95.9998)  time: 0.2563  data: 0.0002  max mem: 30335
[15:54:14.852482] Test:  [1500/2084]  eta: 0:02:32  loss: 0.7427 (0.8570)  acc1: 83.3333 (78.3394)  acc5: 95.8333 (95.0450)  time: 0.2563  data: 0.0002  max mem: 30335
[15:56:24.232654] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4324 (0.8880)  acc1: 91.6667 (77.6216)  acc5: 100.0000 (94.5735)  time: 0.2573  data: 0.0002  max mem: 30335
[15:56:45.448294] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4853 (0.8913)  acc1: 87.5000 (77.5460)  acc5: 100.0000 (94.5740)  time: 0.2498  data: 0.0001  max mem: 30335
[15:56:45.576017] Test: Total time: 0:09:03 (0.2606 s / it)
[15:57:00.396342] * Acc@1 77.531 Acc@5 94.591 loss 0.891
[15:57:00.396917] Accuracy of the network on the 50000 test images: 77.5%
[15:57:00.396972] Max accuracy: 77.53%
[15:57:00.546103] log_dir: ./output_dir_qkformer
[15:57:07.792063] Epoch: [97]  [   0/6672]  eta: 13:25:38  lr: 0.000735  loss: 2.7054 (2.7054)  time: 7.2450  data: 3.3841  max mem: 30335
[16:22:09.125842] Epoch: [97]  [2000/6672]  eta: 0:58:41  lr: 0.000732  loss: 2.4984 (2.5729)  time: 0.7307  data: 0.0002  max mem: 30335
[16:47:25.465922] Epoch: [97]  [4000/6672]  eta: 0:33:39  lr: 0.000729  loss: 2.4218 (2.5814)  time: 0.7604  data: 0.0004  max mem: 30335
[17:12:50.629436] Epoch: [97]  [6000/6672]  eta: 0:08:29  lr: 0.000725  loss: 2.4774 (2.5864)  time: 0.7811  data: 0.0003  max mem: 30335
[17:21:09.482638] Epoch: [97]  [6671/6672]  eta: 0:00:00  lr: 0.000724  loss: 2.4649 (2.5908)  time: 0.7281  data: 0.0010  max mem: 30335
[17:21:10.260740] Epoch: [97] Total time: 1:24:09 (0.7569 s / it)
[17:21:10.291136] Averaged stats: lr: 0.000724  loss: 2.4649 (2.5899)
[17:21:14.781279] Test:  [   0/2084]  eta: 2:35:45  loss: 0.3864 (0.3864)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.4842  data: 3.7094  max mem: 30335
[17:23:23.788807] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.5334 (0.6936)  acc1: 87.5000 (82.4767)  acc5: 100.0000 (96.5486)  time: 0.2825  data: 0.0002  max mem: 30335
[17:25:32.578821] Test:  [1000/2084]  eta: 0:04:44  loss: 0.7396 (0.7380)  acc1: 83.3333 (81.2604)  acc5: 91.6667 (96.1830)  time: 0.2574  data: 0.0002  max mem: 30335
[17:27:40.971439] Test:  [1500/2084]  eta: 0:02:31  loss: 0.7081 (0.8314)  acc1: 83.3333 (79.0473)  acc5: 95.8333 (95.0533)  time: 0.2561  data: 0.0002  max mem: 30335
[17:29:50.914954] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2747 (0.8702)  acc1: 91.6667 (78.2400)  acc5: 100.0000 (94.5652)  time: 0.2903  data: 0.0002  max mem: 30335
[17:30:12.083976] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4621 (0.8727)  acc1: 87.5000 (78.1600)  acc5: 100.0000 (94.5800)  time: 0.2483  data: 0.0001  max mem: 30335
[17:30:12.200354] Test: Total time: 0:09:01 (0.2600 s / it)
[17:30:27.584334] * Acc@1 78.129 Acc@5 94.582 loss 0.873
[17:30:27.584637] Accuracy of the network on the 50000 test images: 78.1%
[17:30:27.584670] Max accuracy: 78.13%
[17:30:27.941959] log_dir: ./output_dir_qkformer
[17:30:36.352209] Epoch: [98]  [   0/6672]  eta: 15:34:00  lr: 0.000724  loss: 1.8967 (1.8967)  time: 8.3993  data: 2.0035  max mem: 30335
[17:55:47.050584] Epoch: [98]  [2000/6672]  eta: 0:59:05  lr: 0.000721  loss: 2.5118 (2.5653)  time: 0.7288  data: 0.0002  max mem: 30335
[18:21:10.163996] Epoch: [98]  [4000/6672]  eta: 0:33:51  lr: 0.000718  loss: 2.7309 (2.5758)  time: 0.8230  data: 0.0003  max mem: 30335
[18:46:17.352127] Epoch: [98]  [6000/6672]  eta: 0:08:29  lr: 0.000715  loss: 2.5173 (2.5784)  time: 0.7301  data: 0.0002  max mem: 30335
[18:54:42.622684] Epoch: [98]  [6671/6672]  eta: 0:00:00  lr: 0.000714  loss: 2.4769 (2.5821)  time: 0.7251  data: 0.0006  max mem: 30335
[18:54:43.226344] Epoch: [98] Total time: 1:24:15 (0.7577 s / it)
[18:54:43.319737] Averaged stats: lr: 0.000714  loss: 2.4769 (2.5856)
[18:54:47.398038] Test:  [   0/2084]  eta: 2:21:27  loss: 0.3196 (0.3196)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.0728  data: 3.4844  max mem: 30335
[18:56:55.679774] Test:  [ 500/2084]  eta: 0:06:58  loss: 0.6517 (0.7185)  acc1: 79.1667 (81.5868)  acc5: 95.8333 (96.2824)  time: 0.2573  data: 0.0002  max mem: 30335
[18:59:04.243894] Test:  [1000/2084]  eta: 0:04:42  loss: 0.6917 (0.7481)  acc1: 83.3333 (80.9274)  acc5: 91.6667 (96.1122)  time: 0.2571  data: 0.0002  max mem: 30335
[19:01:12.769175] Test:  [1500/2084]  eta: 0:02:31  loss: 0.6532 (0.8272)  acc1: 83.3333 (78.9557)  acc5: 95.8333 (95.1033)  time: 0.2567  data: 0.0002  max mem: 30335
[19:03:22.080778] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3404 (0.8730)  acc1: 91.6667 (77.6966)  acc5: 100.0000 (94.5215)  time: 0.2574  data: 0.0002  max mem: 30335
[19:03:43.231538] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4934 (0.8747)  acc1: 91.6667 (77.6580)  acc5: 100.0000 (94.5440)  time: 0.2492  data: 0.0001  max mem: 30335
[19:03:43.351426] Test: Total time: 0:09:00 (0.2591 s / it)
[19:03:58.988639] * Acc@1 77.681 Acc@5 94.544 loss 0.874
[19:03:58.988991] Accuracy of the network on the 50000 test images: 77.7%
[19:03:58.989022] Max accuracy: 78.13%
[19:03:59.123889] log_dir: ./output_dir_qkformer
[19:04:08.074676] Epoch: [99]  [   0/6672]  eta: 16:34:29  lr: 0.000714  loss: 1.8221 (1.8221)  time: 8.9432  data: 2.4856  max mem: 30335
[19:29:14.409066] Epoch: [99]  [2000/6672]  eta: 0:58:57  lr: 0.000710  loss: 2.4594 (2.5571)  time: 0.7301  data: 0.0003  max mem: 30335
[19:54:37.297345] Epoch: [99]  [4000/6672]  eta: 0:33:48  lr: 0.000707  loss: 2.4732 (2.5687)  time: 1.0540  data: 0.0004  max mem: 30335
[20:19:57.811168] Epoch: [99]  [6000/6672]  eta: 0:08:30  lr: 0.000704  loss: 2.6807 (2.5743)  time: 0.7280  data: 0.0003  max mem: 30335
[20:28:30.304056] Epoch: [99]  [6671/6672]  eta: 0:00:00  lr: 0.000703  loss: 2.5914 (2.5751)  time: 0.7253  data: 0.0006  max mem: 30335
[20:28:30.961063] Epoch: [99] Total time: 1:24:31 (0.7602 s / it)
[20:28:31.036102] Averaged stats: lr: 0.000703  loss: 2.5914 (2.5778)
[20:28:35.392446] Test:  [   0/2084]  eta: 2:31:09  loss: 0.2568 (0.2568)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.3520  data: 3.7573  max mem: 30335
[20:30:43.970127] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.6456 (0.6958)  acc1: 79.1667 (82.2522)  acc5: 100.0000 (96.5902)  time: 0.2566  data: 0.0002  max mem: 30335
[20:32:52.379352] Test:  [1000/2084]  eta: 0:04:42  loss: 0.7641 (0.7193)  acc1: 83.3333 (81.6184)  acc5: 91.6667 (96.3370)  time: 0.2567  data: 0.0002  max mem: 30335
[20:35:02.027776] Test:  [1500/2084]  eta: 0:02:32  loss: 0.6177 (0.7970)  acc1: 87.5000 (79.8412)  acc5: 95.8333 (95.3947)  time: 0.2569  data: 0.0002  max mem: 30335
[20:37:12.350242] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3315 (0.8501)  acc1: 91.6667 (78.6357)  acc5: 95.8333 (94.7547)  time: 0.2568  data: 0.0002  max mem: 30335
[20:37:33.494522] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3322 (0.8525)  acc1: 91.6667 (78.5440)  acc5: 95.8333 (94.7460)  time: 0.2484  data: 0.0001  max mem: 30335
[20:37:33.636076] Test: Total time: 0:09:02 (0.2604 s / it)
[20:37:48.906703] * Acc@1 78.551 Acc@5 94.752 loss 0.852
[20:37:48.906983] Accuracy of the network on the 50000 test images: 78.6%
[20:37:48.907033] Max accuracy: 78.55%
[20:37:49.053233] log_dir: ./output_dir_qkformer
[20:37:54.165830] Epoch: [100]  [   0/6672]  eta: 9:28:23  lr: 0.000703  loss: 2.3609 (2.3609)  time: 5.1114  data: 2.0274  max mem: 30335
[21:03:26.191226] Epoch: [100]  [2000/6672]  eta: 0:59:48  lr: 0.000699  loss: 2.4128 (2.5567)  time: 0.7325  data: 0.0002  max mem: 30335
[21:28:47.891584] Epoch: [100]  [4000/6672]  eta: 0:34:02  lr: 0.000696  loss: 2.6907 (2.5638)  time: 0.7265  data: 0.0002  max mem: 30335
[21:54:03.046454] Epoch: [100]  [6000/6672]  eta: 0:08:32  lr: 0.000693  loss: 2.6174 (2.5742)  time: 0.7279  data: 0.0003  max mem: 30335
[22:02:24.481628] Epoch: [100]  [6671/6672]  eta: 0:00:00  lr: 0.000692  loss: 2.6876 (2.5745)  time: 0.7242  data: 0.0011  max mem: 30335
[22:02:25.234632] Epoch: [100] Total time: 1:24:36 (0.7608 s / it)
[22:02:25.284243] Averaged stats: lr: 0.000692  loss: 2.6876 (2.5700)
[22:02:29.960900] Test:  [   0/2084]  eta: 2:42:13  loss: 0.2624 (0.2624)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.6705  data: 4.0891  max mem: 30335
[22:04:38.969026] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.8324 (0.6625)  acc1: 79.1667 (82.8094)  acc5: 95.8333 (96.9561)  time: 0.2574  data: 0.0002  max mem: 30335
[22:06:47.875703] Test:  [1000/2084]  eta: 0:04:44  loss: 0.7471 (0.7144)  acc1: 83.3333 (81.6392)  acc5: 91.6667 (96.3911)  time: 0.2571  data: 0.0002  max mem: 30335
[22:08:57.915134] Test:  [1500/2084]  eta: 0:02:32  loss: 0.7675 (0.8204)  acc1: 79.1667 (79.1722)  acc5: 95.8333 (95.2060)  time: 0.2556  data: 0.0002  max mem: 30335
[22:11:06.325013] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4106 (0.8688)  acc1: 91.6667 (78.0776)  acc5: 100.0000 (94.6443)  time: 0.2563  data: 0.0002  max mem: 30335
[22:11:27.471590] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4726 (0.8693)  acc1: 87.5000 (78.0660)  acc5: 100.0000 (94.6780)  time: 0.2494  data: 0.0002  max mem: 30335
[22:11:27.586740] Test: Total time: 0:09:02 (0.2602 s / it)
[22:11:42.891615] * Acc@1 78.060 Acc@5 94.679 loss 0.869
[22:11:42.891965] Accuracy of the network on the 50000 test images: 78.1%
[22:11:42.892027] Max accuracy: 78.55%
[22:11:43.017649] log_dir: ./output_dir_qkformer
[22:11:51.088129] Epoch: [101]  [   0/6672]  eta: 14:44:48  lr: 0.000692  loss: 2.5651 (2.5651)  time: 7.9570  data: 2.1372  max mem: 30335
[22:37:09.589538] Epoch: [101]  [2000/6672]  eta: 0:59:23  lr: 0.000689  loss: 2.6589 (2.5390)  time: 0.7288  data: 0.0002  max mem: 30335
[23:02:23.318896] Epoch: [101]  [4000/6672]  eta: 0:33:49  lr: 0.000685  loss: 2.6716 (2.5493)  time: 0.7265  data: 0.0003  max mem: 30335
[23:28:03.555190] Epoch: [101]  [6000/6672]  eta: 0:08:32  lr: 0.000682  loss: 2.7509 (2.5555)  time: 1.0499  data: 0.0003  max mem: 30335
[23:36:24.782424] Epoch: [101]  [6671/6672]  eta: 0:00:00  lr: 0.000681  loss: 2.6698 (2.5579)  time: 0.7242  data: 0.0010  max mem: 30335
[23:36:25.589456] Epoch: [101] Total time: 1:24:42 (0.7618 s / it)
[23:36:25.648433] Averaged stats: lr: 0.000681  loss: 2.6698 (2.5656)
[23:36:31.868130] Test:  [   0/2084]  eta: 3:35:52  loss: 0.3748 (0.3748)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 6.2152  data: 4.8567  max mem: 30335
[23:38:40.384608] Test:  [ 500/2084]  eta: 0:07:05  loss: 0.6724 (0.6823)  acc1: 83.3333 (82.6264)  acc5: 100.0000 (96.7814)  time: 0.2576  data: 0.0002  max mem: 30335
[23:40:50.491203] Test:  [1000/2084]  eta: 0:04:46  loss: 0.8267 (0.7230)  acc1: 83.3333 (81.5102)  acc5: 95.8333 (96.4036)  time: 0.2569  data: 0.0002  max mem: 30335
[23:42:58.944896] Test:  [1500/2084]  eta: 0:02:33  loss: 0.5943 (0.8160)  acc1: 87.5000 (79.4221)  acc5: 95.8333 (95.3781)  time: 0.2584  data: 0.0002  max mem: 30335
[23:45:07.440234] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3464 (0.8502)  acc1: 91.6667 (78.5420)  acc5: 100.0000 (94.9296)  time: 0.2572  data: 0.0002  max mem: 30335
[23:45:28.634277] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3648 (0.8515)  acc1: 87.5000 (78.4980)  acc5: 100.0000 (94.9280)  time: 0.2498  data: 0.0001  max mem: 30335
[23:45:28.746177] Test: Total time: 0:09:03 (0.2606 s / it)
[23:45:44.156923] * Acc@1 78.503 Acc@5 94.917 loss 0.851
[23:45:44.157332] Accuracy of the network on the 50000 test images: 78.5%
[23:45:44.157374] Max accuracy: 78.55%
[23:45:44.496492] log_dir: ./output_dir_qkformer
[23:45:51.678737] Epoch: [102]  [   0/6672]  eta: 13:18:34  lr: 0.000681  loss: 2.0941 (2.0941)  time: 7.1814  data: 2.1078  max mem: 30335
[00:11:17.808345] Epoch: [102]  [2000/6672]  eta: 0:59:39  lr: 0.000678  loss: 2.5999 (2.5272)  time: 0.7288  data: 0.0003  max mem: 30335
[00:36:42.974527] Epoch: [102]  [4000/6672]  eta: 0:34:02  lr: 0.000674  loss: 2.6207 (2.5407)  time: 0.7258  data: 0.0005  max mem: 30335
[01:02:32.881903] Epoch: [102]  [6000/6672]  eta: 0:08:35  lr: 0.000671  loss: 2.4936 (2.5486)  time: 1.1599  data: 0.0003  max mem: 30335
[01:11:02.174713] Epoch: [102]  [6671/6672]  eta: 0:00:00  lr: 0.000670  loss: 2.4296 (2.5484)  time: 0.7247  data: 0.0011  max mem: 30335
[01:11:02.868554] Epoch: [102] Total time: 1:25:18 (0.7671 s / it)
[01:11:02.894565] Averaged stats: lr: 0.000670  loss: 2.4296 (2.5580)
[01:11:07.664905] Test:  [   0/2084]  eta: 2:45:23  loss: 0.4493 (0.4493)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.7619  data: 4.0542  max mem: 30335
[01:13:17.940140] Test:  [ 500/2084]  eta: 0:07:06  loss: 0.7528 (0.7113)  acc1: 75.0000 (81.8530)  acc5: 95.8333 (96.3906)  time: 0.3216  data: 0.0002  max mem: 30335
[01:15:26.397209] Test:  [1000/2084]  eta: 0:04:45  loss: 0.8206 (0.7416)  acc1: 83.3333 (81.0231)  acc5: 91.6667 (96.0331)  time: 0.2567  data: 0.0002  max mem: 30335
[01:17:35.883158] Test:  [1500/2084]  eta: 0:02:32  loss: 0.8159 (0.8152)  acc1: 83.3333 (79.3193)  acc5: 95.8333 (95.1477)  time: 0.2562  data: 0.0002  max mem: 30335
[01:19:45.153017] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3376 (0.8547)  acc1: 91.6667 (78.3004)  acc5: 100.0000 (94.6693)  time: 0.2560  data: 0.0002  max mem: 30335
[01:20:06.324036] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5789 (0.8570)  acc1: 87.5000 (78.2080)  acc5: 100.0000 (94.6960)  time: 0.2497  data: 0.0001  max mem: 30335
[01:20:06.440810] Test: Total time: 0:09:03 (0.2608 s / it)
[01:20:21.918716] * Acc@1 78.220 Acc@5 94.711 loss 0.857
[01:20:21.919001] Accuracy of the network on the 50000 test images: 78.2%
[01:20:21.919035] Max accuracy: 78.55%
[01:20:22.093078] log_dir: ./output_dir_qkformer
[01:20:26.673885] Epoch: [103]  [   0/6672]  eta: 8:27:48  lr: 0.000670  loss: 1.9449 (1.9449)  time: 4.5665  data: 2.8551  max mem: 30335
[01:45:47.110853] Epoch: [103]  [2000/6672]  eta: 0:59:19  lr: 0.000667  loss: 2.5293 (2.5484)  time: 0.7405  data: 0.0006  max mem: 30335
[02:11:18.318091] Epoch: [103]  [4000/6672]  eta: 0:34:00  lr: 0.000664  loss: 2.5015 (2.5506)  time: 0.7273  data: 0.0003  max mem: 30335
[02:36:20.445073] Epoch: [103]  [6000/6672]  eta: 0:08:30  lr: 0.000660  loss: 2.5173 (2.5538)  time: 0.7299  data: 0.0004  max mem: 30335
[02:44:58.152556] Epoch: [103]  [6671/6672]  eta: 0:00:00  lr: 0.000659  loss: 2.5386 (2.5533)  time: 0.7255  data: 0.0011  max mem: 30335
[02:44:58.897891] Epoch: [103] Total time: 1:24:36 (0.7609 s / it)
[02:44:59.086769] Averaged stats: lr: 0.000659  loss: 2.5386 (2.5498)
[02:45:03.627744] Test:  [   0/2084]  eta: 2:37:32  loss: 0.3005 (0.3005)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.5359  data: 3.8165  max mem: 30335
[02:47:12.849829] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.8270 (0.6871)  acc1: 75.0000 (82.7179)  acc5: 100.0000 (96.5985)  time: 0.2565  data: 0.0002  max mem: 30335
[02:49:21.520557] Test:  [1000/2084]  eta: 0:04:44  loss: 0.6994 (0.7315)  acc1: 79.1667 (81.3187)  acc5: 95.8333 (96.2371)  time: 0.2573  data: 0.0002  max mem: 30335
[02:51:31.524060] Test:  [1500/2084]  eta: 0:02:32  loss: 0.6581 (0.8138)  acc1: 83.3333 (79.3832)  acc5: 95.8333 (95.2448)  time: 0.2570  data: 0.0002  max mem: 30335
[02:53:40.292426] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4925 (0.8570)  acc1: 91.6667 (78.3941)  acc5: 100.0000 (94.7297)  time: 0.2569  data: 0.0002  max mem: 30335
[02:54:01.634116] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4171 (0.8602)  acc1: 87.5000 (78.3320)  acc5: 100.0000 (94.7160)  time: 0.2584  data: 0.0002  max mem: 30335
[02:54:01.765078] Test: Total time: 0:09:02 (0.2604 s / it)
[02:54:17.340713] * Acc@1 78.308 Acc@5 94.722 loss 0.860
[02:54:17.341266] Accuracy of the network on the 50000 test images: 78.3%
[02:54:17.341320] Max accuracy: 78.55%
[02:54:17.818928] log_dir: ./output_dir_qkformer
[02:54:26.611776] Epoch: [104]  [   0/6672]  eta: 16:09:22  lr: 0.000659  loss: 2.8624 (2.8624)  time: 8.7174  data: 5.6094  max mem: 30335
[03:19:52.393508] Epoch: [104]  [2000/6672]  eta: 0:59:41  lr: 0.000656  loss: 2.5482 (2.5359)  time: 0.7343  data: 0.0005  max mem: 30335
[03:44:46.153691] Epoch: [104]  [4000/6672]  eta: 0:33:41  lr: 0.000653  loss: 2.5226 (2.5419)  time: 0.7256  data: 0.0002  max mem: 30335
[04:09:52.513848] Epoch: [104]  [6000/6672]  eta: 0:08:27  lr: 0.000649  loss: 2.5325 (2.5478)  time: 0.7319  data: 0.0003  max mem: 30335
[04:18:16.441770] Epoch: [104]  [6671/6672]  eta: 0:00:00  lr: 0.000648  loss: 2.2794 (2.5488)  time: 0.7240  data: 0.0010  max mem: 30335
[04:18:17.054457] Epoch: [104] Total time: 1:23:59 (0.7553 s / it)
[04:18:17.132096] Averaged stats: lr: 0.000648  loss: 2.2794 (2.5430)
[04:18:19.260219] Test:  [   0/2084]  eta: 1:13:45  loss: 0.3118 (0.3118)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 2.1237  data: 1.7127  max mem: 30335
[04:20:27.732328] Test:  [ 500/2084]  eta: 0:06:52  loss: 0.8535 (0.6626)  acc1: 70.8333 (83.1005)  acc5: 95.8333 (96.8563)  time: 0.2561  data: 0.0002  max mem: 30335
[04:22:36.729212] Test:  [1000/2084]  eta: 0:04:41  loss: 0.7503 (0.7060)  acc1: 83.3333 (81.6101)  acc5: 95.8333 (96.4868)  time: 0.2566  data: 0.0002  max mem: 30335
[04:24:45.145868] Test:  [1500/2084]  eta: 0:02:30  loss: 0.5890 (0.7848)  acc1: 79.1667 (79.6275)  acc5: 95.8333 (95.5752)  time: 0.2565  data: 0.0002  max mem: 30335
[04:26:57.281782] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3218 (0.8316)  acc1: 87.5000 (78.5420)  acc5: 100.0000 (94.9650)  time: 0.2572  data: 0.0002  max mem: 30335
[04:27:18.581487] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4492 (0.8361)  acc1: 87.5000 (78.4500)  acc5: 100.0000 (94.9600)  time: 0.2559  data: 0.0001  max mem: 30335
[04:27:18.688286] Test: Total time: 0:09:01 (0.2599 s / it)
[04:27:34.039901] * Acc@1 78.448 Acc@5 94.955 loss 0.836
[04:27:34.040264] Accuracy of the network on the 50000 test images: 78.4%
[04:27:34.040299] Max accuracy: 78.55%
[04:27:34.373056] log_dir: ./output_dir_qkformer
[04:27:42.994856] Epoch: [105]  [   0/6672]  eta: 15:58:18  lr: 0.000648  loss: 2.0967 (2.0967)  time: 8.6178  data: 3.4900  max mem: 30335
[04:52:42.724874] Epoch: [105]  [2000/6672]  eta: 0:58:41  lr: 0.000645  loss: 2.6523 (2.5281)  time: 0.7268  data: 0.0002  max mem: 30335
[05:17:49.813201] Epoch: [105]  [4000/6672]  eta: 0:33:33  lr: 0.000642  loss: 2.5484 (2.5281)  time: 0.7297  data: 0.0003  max mem: 30335
[05:43:01.659473] Epoch: [105]  [6000/6672]  eta: 0:08:26  lr: 0.000639  loss: 2.3873 (2.5337)  time: 0.7282  data: 0.0003  max mem: 30335
[05:51:31.282123] Epoch: [105]  [6671/6672]  eta: 0:00:00  lr: 0.000637  loss: 2.4680 (2.5336)  time: 0.7250  data: 0.0011  max mem: 30335
[05:51:31.931968] Epoch: [105] Total time: 1:23:57 (0.7550 s / it)
[05:51:31.934251] Averaged stats: lr: 0.000637  loss: 2.4680 (2.5388)
[05:51:34.870290] Test:  [   0/2084]  eta: 1:41:48  loss: 0.2791 (0.2791)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 2.9312  data: 2.0996  max mem: 30335
[05:53:43.490852] Test:  [ 500/2084]  eta: 0:06:55  loss: 0.7956 (0.6606)  acc1: 66.6667 (82.8427)  acc5: 95.8333 (96.8480)  time: 0.2570  data: 0.0002  max mem: 30335
[05:55:52.752946] Test:  [1000/2084]  eta: 0:04:42  loss: 0.8707 (0.7076)  acc1: 79.1667 (81.6975)  acc5: 95.8333 (96.3037)  time: 0.2566  data: 0.0002  max mem: 30335
[05:58:01.323568] Test:  [1500/2084]  eta: 0:02:31  loss: 0.7046 (0.7880)  acc1: 83.3333 (79.8773)  acc5: 95.8333 (95.3253)  time: 0.2568  data: 0.0002  max mem: 30335
[06:00:10.577779] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3035 (0.8310)  acc1: 91.6667 (78.8106)  acc5: 100.0000 (94.8109)  time: 0.2559  data: 0.0002  max mem: 30335
[06:00:31.735751] Test:  [2083/2084]  eta: 0:00:00  loss: 0.7456 (0.8357)  acc1: 87.5000 (78.6600)  acc5: 100.0000 (94.8120)  time: 0.2491  data: 0.0001  max mem: 30335
[06:00:31.867396] Test: Total time: 0:08:59 (0.2591 s / it)
[06:00:47.479815] * Acc@1 78.647 Acc@5 94.823 loss 0.836
[06:00:47.480267] Accuracy of the network on the 50000 test images: 78.6%
[06:00:47.480305] Max accuracy: 78.65%
[06:00:47.673711] log_dir: ./output_dir_qkformer
[06:00:54.392345] Epoch: [106]  [   0/6672]  eta: 12:17:23  lr: 0.000637  loss: 1.8019 (1.8019)  time: 6.6312  data: 2.4009  max mem: 30335
[06:26:00.220714] Epoch: [106]  [2000/6672]  eta: 0:58:51  lr: 0.000634  loss: 2.4425 (2.5238)  time: 0.7309  data: 0.0002  max mem: 30335
[06:51:00.098477] Epoch: [106]  [4000/6672]  eta: 0:33:31  lr: 0.000631  loss: 2.4364 (2.5277)  time: 0.9380  data: 0.0002  max mem: 30335
[07:16:13.794517] Epoch: [106]  [6000/6672]  eta: 0:08:26  lr: 0.000628  loss: 2.3482 (2.5346)  time: 0.7558  data: 0.0003  max mem: 30335
[07:24:45.387192] Epoch: [106]  [6671/6672]  eta: 0:00:00  lr: 0.000627  loss: 2.6193 (2.5358)  time: 1.0195  data: 0.0058  max mem: 30335
[07:24:46.230341] Epoch: [106] Total time: 1:23:58 (0.7552 s / it)
[07:24:46.274906] Averaged stats: lr: 0.000627  loss: 2.6193 (2.5322)
[07:24:51.381871] Test:  [   0/2084]  eta: 2:57:11  loss: 0.4113 (0.4113)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 5.1015  data: 4.5121  max mem: 30335
[07:26:59.824476] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.7315 (0.6742)  acc1: 79.1667 (83.1005)  acc5: 95.8333 (96.6484)  time: 0.2568  data: 0.0002  max mem: 30335
[07:29:08.938335] Test:  [1000/2084]  eta: 0:04:44  loss: 0.5812 (0.6949)  acc1: 87.5000 (82.4009)  acc5: 95.8333 (96.5951)  time: 0.2565  data: 0.0002  max mem: 30335
[07:31:17.474032] Test:  [1500/2084]  eta: 0:02:32  loss: 0.7354 (0.7850)  acc1: 83.3333 (80.2410)  acc5: 95.8333 (95.6196)  time: 0.2574  data: 0.0002  max mem: 30335
[07:33:26.865788] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3930 (0.8332)  acc1: 91.6667 (78.8897)  acc5: 100.0000 (95.1066)  time: 0.2565  data: 0.0002  max mem: 30335
[07:33:48.029071] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4646 (0.8371)  acc1: 87.5000 (78.7960)  acc5: 95.8333 (95.0840)  time: 0.2501  data: 0.0001  max mem: 30335
[07:33:48.141859] Test: Total time: 0:09:01 (0.2600 s / it)
[07:34:03.817042] * Acc@1 78.809 Acc@5 95.077 loss 0.837
[07:34:03.817364] Accuracy of the network on the 50000 test images: 78.8%
[07:34:03.817402] Max accuracy: 78.81%
[07:34:04.271434] log_dir: ./output_dir_qkformer
[07:34:09.434058] Epoch: [107]  [   0/6672]  eta: 9:32:36  lr: 0.000627  loss: 2.8345 (2.8345)  time: 5.1494  data: 2.6672  max mem: 30335
[07:59:18.181532] Epoch: [107]  [2000/6672]  eta: 0:58:54  lr: 0.000623  loss: 2.5897 (2.5023)  time: 0.7265  data: 0.0003  max mem: 30335
[08:24:20.547348] Epoch: [107]  [4000/6672]  eta: 0:33:34  lr: 0.000620  loss: 2.5114 (2.5153)  time: 0.7301  data: 0.0003  max mem: 30335
[08:49:27.410900] Epoch: [107]  [6000/6672]  eta: 0:08:26  lr: 0.000617  loss: 2.3061 (2.5237)  time: 0.7495  data: 0.0008  max mem: 30335
[08:57:54.615479] Epoch: [107]  [6671/6672]  eta: 0:00:00  lr: 0.000616  loss: 2.5067 (2.5241)  time: 0.7240  data: 0.0006  max mem: 30335
[08:57:55.417893] Epoch: [107] Total time: 1:23:51 (0.7541 s / it)
[08:57:55.455418] Averaged stats: lr: 0.000616  loss: 2.5067 (2.5232)
[08:58:00.227387] Test:  [   0/2084]  eta: 2:45:31  loss: 0.3322 (0.3322)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.7658  data: 4.0691  max mem: 30335
[09:00:08.881475] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.6375 (0.6751)  acc1: 83.3333 (83.0922)  acc5: 100.0000 (96.6816)  time: 0.2573  data: 0.0002  max mem: 30335
[09:02:18.183843] Test:  [1000/2084]  eta: 0:04:44  loss: 0.6954 (0.7162)  acc1: 87.5000 (81.7516)  acc5: 91.6667 (96.2954)  time: 0.2563  data: 0.0002  max mem: 30335
[09:04:26.631885] Test:  [1500/2084]  eta: 0:02:32  loss: 0.6354 (0.8031)  acc1: 83.3333 (79.6774)  acc5: 95.8333 (95.3392)  time: 0.2573  data: 0.0002  max mem: 30335
[09:06:35.945442] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4058 (0.8431)  acc1: 91.6667 (78.6877)  acc5: 100.0000 (94.8713)  time: 0.2570  data: 0.0002  max mem: 30335
[09:06:57.101728] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4756 (0.8436)  acc1: 91.6667 (78.6440)  acc5: 100.0000 (94.8800)  time: 0.2488  data: 0.0001  max mem: 30335
[09:06:57.244221] Test: Total time: 0:09:01 (0.2600 s / it)
[09:07:13.375033] * Acc@1 78.661 Acc@5 94.878 loss 0.844
[09:07:13.375417] Accuracy of the network on the 50000 test images: 78.7%
[09:07:13.375447] Max accuracy: 78.81%
[09:07:13.762236] log_dir: ./output_dir_qkformer
[09:07:17.937541] Epoch: [108]  [   0/6672]  eta: 7:40:00  lr: 0.000616  loss: 2.2921 (2.2921)  time: 4.1367  data: 2.3761  max mem: 30335
[09:32:32.477431] Epoch: [108]  [2000/6672]  eta: 0:59:05  lr: 0.000613  loss: 2.5355 (2.5017)  time: 0.7392  data: 0.0003  max mem: 30335
[09:57:37.273005] Epoch: [108]  [4000/6672]  eta: 0:33:38  lr: 0.000609  loss: 2.4718 (2.5118)  time: 0.7293  data: 0.0003  max mem: 30335
[10:22:39.771520] Epoch: [108]  [6000/6672]  eta: 0:08:26  lr: 0.000606  loss: 2.6039 (2.5163)  time: 0.7278  data: 0.0003  max mem: 30335
[10:31:02.502937] Epoch: [108]  [6671/6672]  eta: 0:00:00  lr: 0.000605  loss: 2.4327 (2.5181)  time: 0.7254  data: 0.0010  max mem: 30335
[10:31:03.208903] Epoch: [108] Total time: 1:23:49 (0.7538 s / it)
[10:31:03.275716] Averaged stats: lr: 0.000605  loss: 2.4327 (2.5156)
[10:31:07.480596] Test:  [   0/2084]  eta: 2:25:52  loss: 0.6055 (0.6055)  acc1: 87.5000 (87.5000)  acc5: 95.8333 (95.8333)  time: 4.1997  data: 3.6505  max mem: 30335
[10:33:15.932728] Test:  [ 500/2084]  eta: 0:06:59  loss: 0.8962 (0.6779)  acc1: 70.8333 (82.5516)  acc5: 95.8333 (96.8896)  time: 0.2573  data: 0.0002  max mem: 30335
[10:35:24.850848] Test:  [1000/2084]  eta: 0:04:43  loss: 0.7951 (0.7196)  acc1: 83.3333 (81.5892)  acc5: 91.6667 (96.3786)  time: 0.2566  data: 0.0002  max mem: 30335
[10:37:34.626454] Test:  [1500/2084]  eta: 0:02:32  loss: 0.6143 (0.8030)  acc1: 83.3333 (79.6969)  acc5: 95.8333 (95.4169)  time: 0.2560  data: 0.0002  max mem: 30335
[10:39:43.183893] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3625 (0.8429)  acc1: 91.6667 (78.8856)  acc5: 100.0000 (94.9317)  time: 0.2572  data: 0.0002  max mem: 30335
[10:40:04.449151] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4964 (0.8460)  acc1: 91.6667 (78.7780)  acc5: 100.0000 (94.9420)  time: 0.2525  data: 0.0002  max mem: 30335
[10:40:04.594153] Test: Total time: 0:09:01 (0.2597 s / it)
[10:40:19.891975] * Acc@1 78.767 Acc@5 94.942 loss 0.846
[10:40:19.892466] Accuracy of the network on the 50000 test images: 78.8%
[10:40:19.892501] Max accuracy: 78.81%
[10:40:20.229655] log_dir: ./output_dir_qkformer
[10:40:29.752913] Epoch: [109]  [   0/6672]  eta: 17:20:33  lr: 0.000605  loss: 2.1773 (2.1773)  time: 9.3576  data: 2.8986  max mem: 30335
[11:05:37.541667] Epoch: [109]  [2000/6672]  eta: 0:59:01  lr: 0.000602  loss: 2.3860 (2.4929)  time: 0.7291  data: 0.0002  max mem: 30335
[11:31:09.752864] Epoch: [109]  [4000/6672]  eta: 0:33:56  lr: 0.000599  loss: 2.4508 (2.4997)  time: 0.7236  data: 0.0002  max mem: 30335
[11:56:42.778397] Epoch: [109]  [6000/6672]  eta: 0:08:33  lr: 0.000595  loss: 2.5184 (2.5011)  time: 0.7434  data: 0.0005  max mem: 30335
[12:05:24.242799] Epoch: [109]  [6671/6672]  eta: 0:00:00  lr: 0.000594  loss: 2.4040 (2.5018)  time: 0.7284  data: 0.0006  max mem: 30335
[12:05:24.932168] Epoch: [109] Total time: 1:25:04 (0.7651 s / it)
[12:05:24.947209] Averaged stats: lr: 0.000594  loss: 2.4040 (2.5095)
[12:05:29.070972] Test:  [   0/2084]  eta: 2:23:03  loss: 0.2697 (0.2697)  acc1: 91.6667 (91.6667)  acc5: 100.0000 (100.0000)  time: 4.1190  data: 3.3481  max mem: 30335
[12:07:37.487955] Test:  [ 500/2084]  eta: 0:06:58  loss: 0.7878 (0.6687)  acc1: 79.1667 (83.1005)  acc5: 95.8333 (96.8812)  time: 0.2565  data: 0.0002  max mem: 30335
[12:09:47.333410] Test:  [1000/2084]  eta: 0:04:44  loss: 0.6579 (0.7013)  acc1: 87.5000 (81.8432)  acc5: 95.8333 (96.6658)  time: 0.2572  data: 0.0002  max mem: 30335
[12:11:56.214699] Test:  [1500/2084]  eta: 0:02:32  loss: 0.5348 (0.7891)  acc1: 83.3333 (79.7663)  acc5: 95.8333 (95.5113)  time: 0.2572  data: 0.0002  max mem: 30335
[12:14:05.321568] Test:  [2000/2084]  eta: 0:00:21  loss: 0.4058 (0.8304)  acc1: 87.5000 (78.8418)  acc5: 100.0000 (95.0212)  time: 0.2557  data: 0.0002  max mem: 30335
[12:14:26.482375] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3892 (0.8299)  acc1: 91.6667 (78.8320)  acc5: 100.0000 (95.0260)  time: 0.2486  data: 0.0002  max mem: 30335
[12:14:26.618196] Test: Total time: 0:09:01 (0.2599 s / it)
[12:14:42.495525] * Acc@1 78.863 Acc@5 95.019 loss 0.829
[12:14:42.496061] Accuracy of the network on the 50000 test images: 78.9%
[12:14:42.496111] Max accuracy: 78.86%
[12:14:42.764534] log_dir: ./output_dir_qkformer
[12:14:49.483926] Epoch: [110]  [   0/6672]  eta: 12:19:02  lr: 0.000594  loss: 2.3489 (2.3489)  time: 6.6461  data: 3.2195  max mem: 30335
[12:40:23.664806] Epoch: [110]  [2000/6672]  eta: 0:59:56  lr: 0.000591  loss: 2.6596 (2.5034)  time: 0.8836  data: 0.0002  max mem: 30335
[13:05:56.020729] Epoch: [110]  [4000/6672]  eta: 0:34:12  lr: 0.000588  loss: 2.4504 (2.5049)  time: 0.7294  data: 0.0002  max mem: 30335
[13:31:13.061566] Epoch: [110]  [6000/6672]  eta: 0:08:33  lr: 0.000585  loss: 2.4405 (2.5080)  time: 0.8718  data: 0.0057  max mem: 30335
[13:39:42.123564] Epoch: [110]  [6671/6672]  eta: 0:00:00  lr: 0.000583  loss: 2.5657 (2.5088)  time: 0.7235  data: 0.0010  max mem: 30335
[13:39:42.760310] Epoch: [110] Total time: 1:24:59 (0.7644 s / it)
[13:39:43.013744] Averaged stats: lr: 0.000583  loss: 2.5657 (2.5022)
[13:39:47.212183] Test:  [   0/2084]  eta: 2:25:39  loss: 0.2455 (0.2455)  acc1: 95.8333 (95.8333)  acc5: 100.0000 (100.0000)  time: 4.1937  data: 3.4819  max mem: 30335
[13:41:56.796770] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.6812 (0.6736)  acc1: 79.1667 (83.1920)  acc5: 95.8333 (96.7482)  time: 0.2862  data: 0.0002  max mem: 30335
[13:44:07.676628] Test:  [1000/2084]  eta: 0:04:46  loss: 0.7083 (0.7068)  acc1: 83.3333 (82.0638)  acc5: 95.8333 (96.5243)  time: 0.2564  data: 0.0002  max mem: 30335
[13:46:16.334896] Test:  [1500/2084]  eta: 0:02:33  loss: 0.6966 (0.7784)  acc1: 83.3333 (80.1299)  acc5: 95.8333 (95.6612)  time: 0.2567  data: 0.0002  max mem: 30335
[13:48:25.308908] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3060 (0.8176)  acc1: 95.8333 (79.2416)  acc5: 100.0000 (95.1462)  time: 0.2568  data: 0.0002  max mem: 30335
[13:48:46.453978] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4883 (0.8253)  acc1: 83.3333 (79.0860)  acc5: 100.0000 (95.0780)  time: 0.2489  data: 0.0002  max mem: 30335
[13:48:46.577829] Test: Total time: 0:09:03 (0.2608 s / it)
[13:49:02.128180] * Acc@1 79.095 Acc@5 95.074 loss 0.825
[13:49:02.128503] Accuracy of the network on the 50000 test images: 79.1%
[13:49:02.128545] Max accuracy: 79.10%
[13:49:02.343204] log_dir: ./output_dir_qkformer
[13:49:14.779254] Epoch: [111]  [   0/6672]  eta: 22:46:55  lr: 0.000583  loss: 2.3438 (2.3438)  time: 12.2925  data: 2.3852  max mem: 30335
[14:14:17.504570] Epoch: [111]  [2000/6672]  eta: 0:58:56  lr: 0.000580  loss: 2.4866 (2.4881)  time: 0.7422  data: 0.0002  max mem: 30335
[14:39:44.039775] Epoch: [111]  [4000/6672]  eta: 0:33:50  lr: 0.000577  loss: 2.5438 (2.4964)  time: 0.7300  data: 0.0002  max mem: 30335
[15:04:59.395507] Epoch: [111]  [6000/6672]  eta: 0:08:30  lr: 0.000574  loss: 2.2992 (2.4993)  time: 0.8410  data: 0.0005  max mem: 30335
[15:13:32.616306] Epoch: [111]  [6671/6672]  eta: 0:00:00  lr: 0.000573  loss: 2.6848 (2.4994)  time: 0.7268  data: 0.0011  max mem: 30335
[15:13:33.338119] Epoch: [111] Total time: 1:24:30 (0.7600 s / it)
[15:13:33.508198] Averaged stats: lr: 0.000573  loss: 2.6848 (2.4958)
[15:13:40.145251] Test:  [   0/2084]  eta: 3:50:22  loss: 0.4252 (0.4252)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 6.6328  data: 5.9049  max mem: 30335
[15:15:48.660906] Test:  [ 500/2084]  eta: 0:07:07  loss: 0.8081 (0.6692)  acc1: 79.1667 (82.5931)  acc5: 95.8333 (96.9810)  time: 0.2563  data: 0.0002  max mem: 30335
[15:17:57.738176] Test:  [1000/2084]  eta: 0:04:46  loss: 0.6867 (0.6932)  acc1: 87.5000 (81.8390)  acc5: 95.8333 (96.6825)  time: 0.2570  data: 0.0002  max mem: 30335
[15:20:06.087674] Test:  [1500/2084]  eta: 0:02:32  loss: 0.5898 (0.7688)  acc1: 83.3333 (80.1605)  acc5: 95.8333 (95.7612)  time: 0.2564  data: 0.0002  max mem: 30335
[15:22:16.319665] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3270 (0.8138)  acc1: 91.6667 (78.9897)  acc5: 100.0000 (95.1795)  time: 0.2574  data: 0.0002  max mem: 30335
[15:22:37.472290] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3864 (0.8194)  acc1: 87.5000 (78.8840)  acc5: 100.0000 (95.1340)  time: 0.2493  data: 0.0001  max mem: 30335
[15:22:37.592659] Test: Total time: 0:09:04 (0.2611 s / it)
[15:22:52.872416] * Acc@1 78.884 Acc@5 95.130 loss 0.819
[15:22:52.872676] Accuracy of the network on the 50000 test images: 78.9%
[15:22:52.872725] Max accuracy: 79.10%
[15:22:53.183884] log_dir: ./output_dir_qkformer
[15:23:03.500236] Epoch: [112]  [   0/6672]  eta: 18:45:47  lr: 0.000573  loss: 2.7879 (2.7879)  time: 10.1240  data: 2.8129  max mem: 30335
[15:48:27.676621] Epoch: [112]  [2000/6672]  eta: 0:59:41  lr: 0.000569  loss: 2.4366 (2.4821)  time: 0.7595  data: 0.0003  max mem: 30335
[16:14:06.265270] Epoch: [112]  [4000/6672]  eta: 0:34:11  lr: 0.000566  loss: 2.5034 (2.4833)  time: 0.7254  data: 0.0003  max mem: 30335
[16:39:15.415375] Epoch: [112]  [6000/6672]  eta: 0:08:32  lr: 0.000563  loss: 2.4198 (2.4895)  time: 0.7298  data: 0.0002  max mem: 30335
[16:47:36.428341] Epoch: [112]  [6671/6672]  eta: 0:00:00  lr: 0.000562  loss: 2.4109 (2.4915)  time: 0.7248  data: 0.0010  max mem: 30335
[16:47:37.036636] Epoch: [112] Total time: 1:24:43 (0.7620 s / it)
[16:47:37.100561] Averaged stats: lr: 0.000562  loss: 2.4109 (2.4896)
[16:47:42.522729] Test:  [   0/2084]  eta: 3:08:09  loss: 0.5943 (0.5943)  acc1: 87.5000 (87.5000)  acc5: 95.8333 (95.8333)  time: 5.4172  data: 4.7327  max mem: 30335
[16:49:51.285724] Test:  [ 500/2084]  eta: 0:07:04  loss: 0.8285 (0.6687)  acc1: 79.1667 (82.8094)  acc5: 95.8333 (96.8480)  time: 0.2575  data: 0.0002  max mem: 30335
[16:52:00.478795] Test:  [1000/2084]  eta: 0:04:45  loss: 0.6513 (0.6991)  acc1: 87.5000 (81.9306)  acc5: 95.8333 (96.6034)  time: 0.2572  data: 0.0002  max mem: 30335
[16:54:09.436831] Test:  [1500/2084]  eta: 0:02:32  loss: 0.5450 (0.7760)  acc1: 83.3333 (80.2243)  acc5: 95.8333 (95.6446)  time: 0.2571  data: 0.0002  max mem: 30335
[16:56:19.925362] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2343 (0.8073)  acc1: 91.6667 (79.2479)  acc5: 100.0000 (95.2503)  time: 0.2569  data: 0.0002  max mem: 30335
[16:56:41.096851] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5331 (0.8106)  acc1: 87.5000 (79.1440)  acc5: 95.8333 (95.2380)  time: 0.2493  data: 0.0001  max mem: 30335
[16:56:41.222347] Test: Total time: 0:09:04 (0.2611 s / it)
[16:56:56.632399] * Acc@1 79.145 Acc@5 95.239 loss 0.811
[16:56:56.632690] Accuracy of the network on the 50000 test images: 79.1%
[16:56:56.632722] Max accuracy: 79.14%
[16:56:57.548522] log_dir: ./output_dir_qkformer
[16:57:07.765722] Epoch: [113]  [   0/6672]  eta: 18:55:36  lr: 0.000562  loss: 2.6110 (2.6110)  time: 10.2123  data: 2.3900  max mem: 30335
[17:22:37.979760] Epoch: [113]  [2000/6672]  eta: 0:59:55  lr: 0.000559  loss: 2.4848 (2.4640)  time: 0.7283  data: 0.0002  max mem: 30335
[17:47:48.025694] Epoch: [113]  [4000/6672]  eta: 0:33:56  lr: 0.000556  loss: 2.4364 (2.4726)  time: 0.7275  data: 0.0002  max mem: 30335
[18:13:08.008818] Epoch: [113]  [6000/6672]  eta: 0:08:31  lr: 0.000552  loss: 2.2888 (2.4688)  time: 0.7299  data: 0.0004  max mem: 30335
[18:21:46.449405] Epoch: [113]  [6671/6672]  eta: 0:00:00  lr: 0.000551  loss: 2.2914 (2.4728)  time: 0.7269  data: 0.0013  max mem: 30335
[18:21:47.377293] Epoch: [113] Total time: 1:24:49 (0.7629 s / it)
[18:21:47.428221] Averaged stats: lr: 0.000551  loss: 2.2914 (2.4808)
[18:21:52.894643] Test:  [   0/2084]  eta: 3:09:42  loss: 0.2552 (0.2552)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.4618  data: 4.6613  max mem: 30335
[18:24:01.266827] Test:  [ 500/2084]  eta: 0:07:03  loss: 0.7455 (0.6662)  acc1: 79.1667 (82.6846)  acc5: 95.8333 (96.8896)  time: 0.2572  data: 0.0002  max mem: 30335
[18:26:10.445123] Test:  [1000/2084]  eta: 0:04:44  loss: 0.7249 (0.6905)  acc1: 83.3333 (81.9306)  acc5: 95.8333 (96.7324)  time: 0.2566  data: 0.0002  max mem: 30335
[18:28:18.860305] Test:  [1500/2084]  eta: 0:02:32  loss: 0.7787 (0.7809)  acc1: 79.1667 (79.7663)  acc5: 95.8333 (95.6696)  time: 0.2562  data: 0.0002  max mem: 30335
[18:30:31.211875] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3165 (0.8154)  acc1: 91.6667 (78.9813)  acc5: 100.0000 (95.2295)  time: 0.2561  data: 0.0002  max mem: 30335
[18:30:52.376047] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3902 (0.8181)  acc1: 87.5000 (78.9500)  acc5: 100.0000 (95.2200)  time: 0.2491  data: 0.0001  max mem: 30335
[18:30:52.509505] Test: Total time: 0:09:05 (0.2616 s / it)
[18:31:08.102973] * Acc@1 78.932 Acc@5 95.232 loss 0.818
[18:31:08.103230] Accuracy of the network on the 50000 test images: 78.9%
[18:31:08.103271] Max accuracy: 79.14%
[18:31:08.554749] log_dir: ./output_dir_qkformer
[18:31:20.934571] Epoch: [114]  [   0/6672]  eta: 22:44:15  lr: 0.000551  loss: 2.6117 (2.6117)  time: 12.2685  data: 3.2152  max mem: 30335
[18:56:47.937361] Epoch: [114]  [2000/6672]  eta: 0:59:53  lr: 0.000548  loss: 2.4413 (2.4533)  time: 0.7373  data: 0.0003  max mem: 30335
[19:22:13.442133] Epoch: [114]  [4000/6672]  eta: 0:34:06  lr: 0.000545  loss: 2.4229 (2.4646)  time: 0.8045  data: 0.0003  max mem: 30335
[19:47:31.825565] Epoch: [114]  [6000/6672]  eta: 0:08:33  lr: 0.000542  loss: 2.2848 (2.4657)  time: 0.7946  data: 0.0003  max mem: 30335
[19:56:15.770538] Epoch: [114]  [6671/6672]  eta: 0:00:00  lr: 0.000541  loss: 2.5222 (2.4678)  time: 0.7247  data: 0.0011  max mem: 30335
[19:56:16.593362] Epoch: [114] Total time: 1:25:08 (0.7656 s / it)
[19:56:16.597000] Averaged stats: lr: 0.000541  loss: 2.5222 (2.4738)
[19:56:21.174523] Test:  [   0/2084]  eta: 2:38:48  loss: 0.5377 (0.5377)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.5724  data: 3.7498  max mem: 30335
[19:58:30.321943] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.7053 (0.6463)  acc1: 79.1667 (83.3666)  acc5: 100.0000 (97.1640)  time: 0.2575  data: 0.0002  max mem: 30335
[20:00:39.413027] Test:  [1000/2084]  eta: 0:04:44  loss: 0.4998 (0.6830)  acc1: 83.3333 (82.3551)  acc5: 95.8333 (96.7657)  time: 0.2574  data: 0.0002  max mem: 30335
[20:02:48.298760] Test:  [1500/2084]  eta: 0:02:32  loss: 0.7058 (0.7701)  acc1: 83.3333 (80.1854)  acc5: 95.8333 (95.8000)  time: 0.2564  data: 0.0002  max mem: 30335
[20:04:57.770739] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3568 (0.8057)  acc1: 91.6667 (79.4207)  acc5: 100.0000 (95.3232)  time: 0.2573  data: 0.0002  max mem: 30335
[20:05:18.980491] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4298 (0.8083)  acc1: 91.6667 (79.3560)  acc5: 100.0000 (95.3180)  time: 0.2513  data: 0.0001  max mem: 30335
[20:05:19.096577] Test: Total time: 0:09:02 (0.2603 s / it)
[20:05:34.573853] * Acc@1 79.370 Acc@5 95.316 loss 0.808
[20:05:34.574155] Accuracy of the network on the 50000 test images: 79.4%
[20:05:34.574186] Max accuracy: 79.37%
[20:05:35.098179] log_dir: ./output_dir_qkformer
[20:05:40.603124] Epoch: [115]  [   0/6672]  eta: 10:11:59  lr: 0.000541  loss: 2.2365 (2.2365)  time: 5.5035  data: 2.7380  max mem: 30335
[20:30:46.944267] Epoch: [115]  [2000/6672]  eta: 0:58:49  lr: 0.000537  loss: 2.3944 (2.4615)  time: 0.7262  data: 0.0002  max mem: 30335
[20:55:57.079856] Epoch: [115]  [4000/6672]  eta: 0:33:37  lr: 0.000534  loss: 2.3552 (2.4611)  time: 0.7260  data: 0.0002  max mem: 30335
[21:21:09.724962] Epoch: [115]  [6000/6672]  eta: 0:08:27  lr: 0.000531  loss: 2.3974 (2.4646)  time: 0.7316  data: 0.0002  max mem: 30335
[21:29:37.044764] Epoch: [115]  [6671/6672]  eta: 0:00:00  lr: 0.000530  loss: 2.3543 (2.4644)  time: 0.7272  data: 0.0011  max mem: 30335
[21:29:37.709466] Epoch: [115] Total time: 1:24:02 (0.7558 s / it)
[21:29:37.784370] Averaged stats: lr: 0.000530  loss: 2.3543 (2.4661)
[21:29:43.372307] Test:  [   0/2084]  eta: 3:13:50  loss: 0.5486 (0.5486)  acc1: 87.5000 (87.5000)  acc5: 95.8333 (95.8333)  time: 5.5808  data: 4.7816  max mem: 30335
[21:31:51.667998] Test:  [ 500/2084]  eta: 0:07:03  loss: 0.6063 (0.6438)  acc1: 83.3333 (83.7409)  acc5: 100.0000 (97.1141)  time: 0.2566  data: 0.0002  max mem: 30335
[21:34:00.550151] Test:  [1000/2084]  eta: 0:04:44  loss: 0.7121 (0.6772)  acc1: 87.5000 (82.7173)  acc5: 91.6667 (96.8157)  time: 0.2565  data: 0.0002  max mem: 30335
[21:36:09.701874] Test:  [1500/2084]  eta: 0:02:32  loss: 0.6827 (0.7663)  acc1: 83.3333 (80.6296)  acc5: 95.8333 (95.8250)  time: 0.2574  data: 0.0002  max mem: 30335
[21:38:18.716398] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2963 (0.8022)  acc1: 95.8333 (79.7060)  acc5: 100.0000 (95.3211)  time: 0.2564  data: 0.0002  max mem: 30335
[21:38:39.898653] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4620 (0.8041)  acc1: 87.5000 (79.6500)  acc5: 100.0000 (95.3240)  time: 0.2495  data: 0.0001  max mem: 30335
[21:38:39.995735] Test: Total time: 0:09:02 (0.2602 s / it)
[21:38:55.773830] * Acc@1 79.645 Acc@5 95.330 loss 0.804
[21:38:55.774116] Accuracy of the network on the 50000 test images: 79.6%
[21:38:55.774175] Max accuracy: 79.64%
[21:38:56.004369] log_dir: ./output_dir_qkformer
[21:39:04.391700] Epoch: [116]  [   0/6672]  eta: 15:03:09  lr: 0.000530  loss: 2.3256 (2.3256)  time: 8.1219  data: 2.5670  max mem: 30335
[22:04:07.828988] Epoch: [116]  [2000/6672]  eta: 0:58:48  lr: 0.000527  loss: 2.6376 (2.4480)  time: 0.7309  data: 0.0003  max mem: 30335
[22:29:20.029921] Epoch: [116]  [4000/6672]  eta: 0:33:39  lr: 0.000524  loss: 2.3498 (2.4540)  time: 0.9138  data: 0.0003  max mem: 30335
[22:54:34.323520] Epoch: [116]  [6000/6672]  eta: 0:08:28  lr: 0.000520  loss: 2.5190 (2.4540)  time: 0.7366  data: 0.0003  max mem: 30335
[23:02:57.671912] Epoch: [116]  [6671/6672]  eta: 0:00:00  lr: 0.000519  loss: 2.5411 (2.4565)  time: 0.7241  data: 0.0010  max mem: 30335
[23:02:58.273928] Epoch: [116] Total time: 1:24:02 (0.7557 s / it)
[23:02:58.492040] Averaged stats: lr: 0.000519  loss: 2.5411 (2.4576)
[23:03:03.987506] Test:  [   0/2084]  eta: 3:10:41  loss: 0.3226 (0.3226)  acc1: 91.6667 (91.6667)  acc5: 100.0000 (100.0000)  time: 5.4904  data: 4.7332  max mem: 30335
[23:05:13.922401] Test:  [ 500/2084]  eta: 0:07:08  loss: 0.7151 (0.6471)  acc1: 75.0000 (83.6577)  acc5: 95.8333 (96.9228)  time: 0.2558  data: 0.0002  max mem: 30335
[23:07:22.532536] Test:  [1000/2084]  eta: 0:04:45  loss: 0.6670 (0.6831)  acc1: 87.5000 (82.4759)  acc5: 91.6667 (96.7574)  time: 0.2569  data: 0.0002  max mem: 30335
[23:09:31.419678] Test:  [1500/2084]  eta: 0:02:32  loss: 0.6000 (0.7690)  acc1: 83.3333 (80.4047)  acc5: 95.8333 (95.8084)  time: 0.2578  data: 0.0002  max mem: 30335
[23:11:41.569514] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2724 (0.8058)  acc1: 91.6667 (79.5227)  acc5: 100.0000 (95.3523)  time: 0.2897  data: 0.0135  max mem: 30335
[23:12:02.726024] Test:  [2083/2084]  eta: 0:00:00  loss: 0.6269 (0.8093)  acc1: 83.3333 (79.4100)  acc5: 100.0000 (95.3380)  time: 0.2490  data: 0.0001  max mem: 30335
[23:12:02.837550] Test: Total time: 0:09:04 (0.2612 s / it)
[23:12:18.462983] * Acc@1 79.417 Acc@5 95.333 loss 0.809
[23:12:18.463353] Accuracy of the network on the 50000 test images: 79.4%
[23:12:18.463391] Max accuracy: 79.64%
[23:12:19.155266] log_dir: ./output_dir_qkformer
[23:12:29.252734] Epoch: [117]  [   0/6672]  eta: 18:32:49  lr: 0.000519  loss: 2.0454 (2.0454)  time: 10.0074  data: 2.7540  max mem: 30335
[23:37:27.692578] Epoch: [117]  [2000/6672]  eta: 0:58:41  lr: 0.000516  loss: 2.1492 (2.4294)  time: 0.7275  data: 0.0003  max mem: 30335
[00:02:58.533918] Epoch: [117]  [4000/6672]  eta: 0:33:49  lr: 0.000513  loss: 2.4534 (2.4442)  time: 0.7230  data: 0.0002  max mem: 30335
[00:27:43.629256] Epoch: [117]  [6000/6672]  eta: 0:08:26  lr: 0.000510  loss: 2.4879 (2.4446)  time: 0.7325  data: 0.0002  max mem: 30335
[00:36:16.252502] Epoch: [117]  [6671/6672]  eta: 0:00:00  lr: 0.000509  loss: 2.4179 (2.4467)  time: 0.7434  data: 0.0046  max mem: 30335
[00:36:16.974931] Epoch: [117] Total time: 1:23:57 (0.7551 s / it)
[00:36:17.039089] Averaged stats: lr: 0.000509  loss: 2.4179 (2.4502)
[00:36:22.200261] Test:  [   0/2084]  eta: 2:59:06  loss: 0.3807 (0.3807)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.1565  data: 4.4304  max mem: 30335
[00:38:30.885854] Test:  [ 500/2084]  eta: 0:07:03  loss: 0.7186 (0.6199)  acc1: 83.3333 (84.0070)  acc5: 100.0000 (97.2971)  time: 0.2572  data: 0.0002  max mem: 30335
[00:40:39.289819] Test:  [1000/2084]  eta: 0:04:43  loss: 0.5870 (0.6631)  acc1: 87.5000 (82.7506)  acc5: 95.8333 (96.9197)  time: 0.2563  data: 0.0002  max mem: 30335
[00:42:47.976329] Test:  [1500/2084]  eta: 0:02:32  loss: 0.6221 (0.7410)  acc1: 87.5000 (80.8045)  acc5: 95.8333 (96.0388)  time: 0.2570  data: 0.0002  max mem: 30335
[00:44:56.542605] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2632 (0.7856)  acc1: 91.6667 (79.7101)  acc5: 100.0000 (95.5043)  time: 0.2575  data: 0.0002  max mem: 30335
[00:45:17.686031] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5206 (0.7879)  acc1: 87.5000 (79.6160)  acc5: 95.8333 (95.4840)  time: 0.2489  data: 0.0001  max mem: 30335
[00:45:17.806785] Test: Total time: 0:09:00 (0.2595 s / it)
[00:45:33.341396] * Acc@1 79.646 Acc@5 95.481 loss 0.787
[00:45:33.341657] Accuracy of the network on the 50000 test images: 79.6%
[00:45:33.341705] Max accuracy: 79.65%
[00:45:33.574807] log_dir: ./output_dir_qkformer
[00:45:44.133683] Epoch: [118]  [   0/6672]  eta: 19:12:22  lr: 0.000509  loss: 2.8658 (2.8658)  time: 10.3630  data: 2.0976  max mem: 30335
[01:11:13.671098] Epoch: [118]  [2000/6672]  eta: 0:59:54  lr: 0.000506  loss: 2.4547 (2.4363)  time: 0.7330  data: 0.0003  max mem: 30335
[01:36:21.200949] Epoch: [118]  [4000/6672]  eta: 0:33:54  lr: 0.000503  loss: 2.3801 (2.4430)  time: 0.7272  data: 0.0002  max mem: 30335
[02:01:19.954854] Epoch: [118]  [6000/6672]  eta: 0:08:29  lr: 0.000499  loss: 2.4874 (2.4475)  time: 0.8725  data: 0.0067  max mem: 30335
[02:09:42.407117] Epoch: [118]  [6671/6672]  eta: 0:00:00  lr: 0.000498  loss: 2.2085 (2.4489)  time: 0.7241  data: 0.0010  max mem: 30335
[02:09:43.172157] Epoch: [118] Total time: 1:24:09 (0.7568 s / it)
[02:09:43.210649] Averaged stats: lr: 0.000498  loss: 2.2085 (2.4438)
[02:09:47.813397] Test:  [   0/2084]  eta: 2:39:39  loss: 0.2465 (0.2465)  acc1: 91.6667 (91.6667)  acc5: 100.0000 (100.0000)  time: 4.5969  data: 3.9749  max mem: 30335
[02:11:57.140482] Test:  [ 500/2084]  eta: 0:07:03  loss: 0.6609 (0.6464)  acc1: 79.1667 (83.8490)  acc5: 100.0000 (97.1058)  time: 0.2571  data: 0.0003  max mem: 30335
[02:14:06.642556] Test:  [1000/2084]  eta: 0:04:45  loss: 0.6152 (0.6851)  acc1: 83.3333 (82.6215)  acc5: 95.8333 (96.7158)  time: 0.2569  data: 0.0002  max mem: 30335
[02:16:15.017101] Test:  [1500/2084]  eta: 0:02:32  loss: 0.6760 (0.7690)  acc1: 83.3333 (80.3492)  acc5: 95.8333 (95.7972)  time: 0.2563  data: 0.0002  max mem: 30335
[02:18:26.109255] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2849 (0.8054)  acc1: 91.6667 (79.3791)  acc5: 100.0000 (95.4294)  time: 0.2562  data: 0.0002  max mem: 30335
[02:18:47.259953] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5130 (0.8076)  acc1: 87.5000 (79.3340)  acc5: 100.0000 (95.4420)  time: 0.2488  data: 0.0001  max mem: 30335
[02:18:47.384734] Test: Total time: 0:09:04 (0.2611 s / it)
[02:19:02.644810] * Acc@1 79.346 Acc@5 95.430 loss 0.808
[02:19:02.645112] Accuracy of the network on the 50000 test images: 79.3%
[02:19:02.645172] Max accuracy: 79.65%
[02:19:02.875405] log_dir: ./output_dir_qkformer
[02:19:07.266720] Epoch: [119]  [   0/6672]  eta: 8:08:13  lr: 0.000498  loss: 2.9813 (2.9813)  time: 4.3905  data: 2.0989  max mem: 30335
[02:44:18.652671] Epoch: [119]  [2000/6672]  eta: 0:58:58  lr: 0.000495  loss: 2.3819 (2.4191)  time: 0.7267  data: 0.0003  max mem: 30335
[03:09:16.047805] Epoch: [119]  [4000/6672]  eta: 0:33:31  lr: 0.000492  loss: 2.5452 (2.4324)  time: 0.7266  data: 0.0003  max mem: 30335
[03:34:01.543592] Epoch: [119]  [6000/6672]  eta: 0:08:23  lr: 0.000489  loss: 2.1928 (2.4400)  time: 0.7259  data: 0.0003  max mem: 30335
[03:42:28.557841] Epoch: [119]  [6671/6672]  eta: 0:00:00  lr: 0.000488  loss: 2.3022 (2.4438)  time: 0.7244  data: 0.0011  max mem: 30335
[03:42:29.333156] Epoch: [119] Total time: 1:23:26 (0.7504 s / it)
[03:42:29.379824] Averaged stats: lr: 0.000488  loss: 2.3022 (2.4383)
[03:42:34.244055] Test:  [   0/2084]  eta: 2:48:45  loss: 0.3527 (0.3527)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.8589  data: 4.3017  max mem: 30335
[03:44:43.066641] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.6774 (0.6171)  acc1: 83.3333 (83.7908)  acc5: 100.0000 (97.2389)  time: 0.2578  data: 0.0002  max mem: 30335
[03:46:51.997281] Test:  [1000/2084]  eta: 0:04:44  loss: 0.5291 (0.6685)  acc1: 83.3333 (82.2219)  acc5: 95.8333 (96.8198)  time: 0.2567  data: 0.0002  max mem: 30335
[03:49:02.551635] Test:  [1500/2084]  eta: 0:02:32  loss: 0.6278 (0.7463)  acc1: 83.3333 (80.4353)  acc5: 95.8333 (95.8555)  time: 0.2577  data: 0.0002  max mem: 30335
[03:51:11.246068] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2759 (0.7779)  acc1: 91.6667 (79.7018)  acc5: 100.0000 (95.4523)  time: 0.2570  data: 0.0002  max mem: 30335
[03:51:32.457182] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3379 (0.7832)  acc1: 91.6667 (79.5860)  acc5: 100.0000 (95.4500)  time: 0.2500  data: 0.0002  max mem: 30335
[03:51:32.595280] Test: Total time: 0:09:03 (0.2607 s / it)
[03:51:47.702752] * Acc@1 79.595 Acc@5 95.448 loss 0.783
[03:51:47.703339] Accuracy of the network on the 50000 test images: 79.6%
[03:51:47.703394] Max accuracy: 79.65%
[03:51:47.890396] log_dir: ./output_dir_qkformer
[03:51:59.621004] Epoch: [120]  [   0/6672]  eta: 21:39:04  lr: 0.000488  loss: 2.8621 (2.8621)  time: 11.6824  data: 2.3265  max mem: 30335
[04:16:58.090214] Epoch: [120]  [2000/6672]  eta: 0:58:45  lr: 0.000485  loss: 2.3176 (2.4164)  time: 0.9025  data: 0.0002  max mem: 30335
[04:42:06.654296] Epoch: [120]  [4000/6672]  eta: 0:33:35  lr: 0.000482  loss: 2.3631 (2.4259)  time: 0.7283  data: 0.0002  max mem: 30335
[05:07:16.859755] Epoch: [120]  [6000/6672]  eta: 0:08:27  lr: 0.000478  loss: 2.3722 (2.4258)  time: 0.7315  data: 0.0002  max mem: 30335
[05:15:31.129350] Epoch: [120]  [6671/6672]  eta: 0:00:00  lr: 0.000477  loss: 2.3791 (2.4280)  time: 0.7247  data: 0.0010  max mem: 30335
[05:15:31.783146] Epoch: [120] Total time: 1:23:43 (0.7530 s / it)
[05:15:31.832380] Averaged stats: lr: 0.000477  loss: 2.3791 (2.4274)
[05:15:35.118000] Test:  [   0/2084]  eta: 1:53:58  loss: 0.3051 (0.3051)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 3.2812  data: 2.6380  max mem: 30335
[05:17:44.400235] Test:  [ 500/2084]  eta: 0:06:59  loss: 0.6558 (0.6159)  acc1: 79.1667 (84.5808)  acc5: 100.0000 (97.0559)  time: 0.2575  data: 0.0002  max mem: 30335
[05:19:53.623510] Test:  [1000/2084]  eta: 0:04:43  loss: 0.5829 (0.6685)  acc1: 83.3333 (82.9462)  acc5: 95.8333 (96.7033)  time: 0.2565  data: 0.0002  max mem: 30335
[05:22:03.251996] Test:  [1500/2084]  eta: 0:02:32  loss: 0.6993 (0.7445)  acc1: 79.1667 (81.1459)  acc5: 95.8333 (95.9222)  time: 0.2569  data: 0.0002  max mem: 30335
[05:24:11.849277] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3038 (0.7825)  acc1: 91.6667 (80.2224)  acc5: 100.0000 (95.4939)  time: 0.2569  data: 0.0002  max mem: 30335
[05:24:33.053844] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3930 (0.7849)  acc1: 87.5000 (80.1740)  acc5: 100.0000 (95.5020)  time: 0.2506  data: 0.0001  max mem: 30335
[05:24:33.164373] Test: Total time: 0:09:01 (0.2598 s / it)
[05:24:48.332915] * Acc@1 80.169 Acc@5 95.510 loss 0.785
[05:24:48.333172] Accuracy of the network on the 50000 test images: 80.2%
[05:24:48.333207] Max accuracy: 80.17%
[05:24:48.543198] log_dir: ./output_dir_qkformer
[05:24:57.191390] Epoch: [121]  [   0/6672]  eta: 15:52:13  lr: 0.000477  loss: 2.3114 (2.3114)  time: 8.5632  data: 2.2495  max mem: 30335
[05:50:10.799797] Epoch: [121]  [2000/6672]  eta: 0:59:13  lr: 0.000474  loss: 2.5011 (2.4179)  time: 0.7285  data: 0.0002  max mem: 30335
[06:15:16.854889] Epoch: [121]  [4000/6672]  eta: 0:33:42  lr: 0.000471  loss: 2.3146 (2.4197)  time: 0.7244  data: 0.0003  max mem: 30335
[06:40:08.978254] Epoch: [121]  [6000/6672]  eta: 0:08:26  lr: 0.000468  loss: 2.4075 (2.4203)  time: 0.7252  data: 0.0002  max mem: 30335
[06:48:30.036494] Epoch: [121]  [6671/6672]  eta: 0:00:00  lr: 0.000467  loss: 2.2000 (2.4215)  time: 0.7231  data: 0.0011  max mem: 30335
[06:48:30.732981] Epoch: [121] Total time: 1:23:42 (0.7527 s / it)
[06:48:30.822854] Averaged stats: lr: 0.000467  loss: 2.2000 (2.4226)
[06:48:37.210328] Test:  [   0/2084]  eta: 3:41:39  loss: 0.4089 (0.4089)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 6.3819  data: 5.3480  max mem: 30335
[06:50:45.887713] Test:  [ 500/2084]  eta: 0:07:06  loss: 0.6027 (0.6243)  acc1: 75.0000 (84.3979)  acc5: 100.0000 (97.1806)  time: 0.2571  data: 0.0002  max mem: 30335
[06:52:53.992443] Test:  [1000/2084]  eta: 0:04:44  loss: 0.6835 (0.6629)  acc1: 83.3333 (82.9254)  acc5: 95.8333 (96.9073)  time: 0.2562  data: 0.0002  max mem: 30335
[06:55:02.305712] Test:  [1500/2084]  eta: 0:02:32  loss: 0.5670 (0.7400)  acc1: 87.5000 (81.2098)  acc5: 95.8333 (96.1165)  time: 0.2560  data: 0.0002  max mem: 30335
[06:57:13.271949] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2797 (0.7785)  acc1: 91.6667 (80.3265)  acc5: 100.0000 (95.7605)  time: 0.2567  data: 0.0002  max mem: 30335
[06:57:34.475178] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3916 (0.7814)  acc1: 91.6667 (80.2340)  acc5: 100.0000 (95.7580)  time: 0.2501  data: 0.0002  max mem: 30335
[06:57:34.603498] Test: Total time: 0:09:03 (0.2609 s / it)
[06:57:50.171441] * Acc@1 80.235 Acc@5 95.739 loss 0.781
[06:57:50.171887] Accuracy of the network on the 50000 test images: 80.2%
[06:57:50.171922] Max accuracy: 80.23%
[06:57:50.285842] log_dir: ./output_dir_qkformer
[06:57:55.603394] Epoch: [122]  [   0/6672]  eta: 9:51:12  lr: 0.000467  loss: 2.1835 (2.1835)  time: 5.3167  data: 1.9881  max mem: 30335
[07:22:51.652726] Epoch: [122]  [2000/6672]  eta: 0:58:24  lr: 0.000464  loss: 2.2635 (2.3975)  time: 0.7366  data: 0.0003  max mem: 30335
[07:48:01.895948] Epoch: [122]  [4000/6672]  eta: 0:33:30  lr: 0.000461  loss: 2.3205 (2.4031)  time: 0.9810  data: 0.0004  max mem: 30335
[08:12:51.037083] Epoch: [122]  [6000/6672]  eta: 0:08:23  lr: 0.000458  loss: 2.3754 (2.4082)  time: 0.7302  data: 0.0003  max mem: 30335
[08:21:10.785952] Epoch: [122]  [6671/6672]  eta: 0:00:00  lr: 0.000457  loss: 2.3945 (2.4115)  time: 0.8438  data: 0.0009  max mem: 30335
[08:21:11.736733] Epoch: [122] Total time: 1:23:21 (0.7496 s / it)
[08:21:11.775811] Averaged stats: lr: 0.000457  loss: 2.3945 (2.4129)
[08:21:16.775227] Test:  [   0/2084]  eta: 2:53:29  loss: 0.2644 (0.2644)  acc1: 95.8333 (95.8333)  acc5: 100.0000 (100.0000)  time: 4.9948  data: 4.3314  max mem: 30335
[08:23:25.166066] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.5402 (0.6075)  acc1: 83.3333 (84.4977)  acc5: 100.0000 (97.3636)  time: 0.2571  data: 0.0002  max mem: 30335
[08:25:33.636407] Test:  [1000/2084]  eta: 0:04:43  loss: 0.6020 (0.6548)  acc1: 79.1667 (83.0711)  acc5: 95.8333 (97.0155)  time: 0.2572  data: 0.0002  max mem: 30335
[08:27:41.983598] Test:  [1500/2084]  eta: 0:02:31  loss: 0.5387 (0.7378)  acc1: 87.5000 (81.1237)  acc5: 95.8333 (96.0665)  time: 0.2564  data: 0.0002  max mem: 30335
[08:29:51.816616] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2581 (0.7835)  acc1: 95.8333 (80.0850)  acc5: 100.0000 (95.5189)  time: 0.2567  data: 0.0002  max mem: 30335
[08:30:12.989384] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3130 (0.7876)  acc1: 87.5000 (79.9620)  acc5: 100.0000 (95.4880)  time: 0.2491  data: 0.0001  max mem: 30335
[08:30:13.113236] Test: Total time: 0:09:01 (0.2598 s / it)
[08:30:28.279059] * Acc@1 79.964 Acc@5 95.484 loss 0.788
[08:30:28.279525] Accuracy of the network on the 50000 test images: 80.0%
[08:30:28.279567] Max accuracy: 80.23%
[08:30:29.053274] log_dir: ./output_dir_qkformer
[08:30:37.316704] Epoch: [123]  [   0/6672]  eta: 15:18:21  lr: 0.000457  loss: 2.7570 (2.7570)  time: 8.2586  data: 2.2182  max mem: 30335
[08:55:40.762685] Epoch: [123]  [2000/6672]  eta: 0:58:48  lr: 0.000454  loss: 2.5115 (2.3971)  time: 0.7322  data: 0.0002  max mem: 30335
[09:20:27.122649] Epoch: [123]  [4000/6672]  eta: 0:33:21  lr: 0.000451  loss: 2.4337 (2.4028)  time: 0.7246  data: 0.0003  max mem: 30335
[09:45:06.734715] Epoch: [123]  [6000/6672]  eta: 0:08:21  lr: 0.000448  loss: 2.1975 (2.4086)  time: 0.7293  data: 0.0003  max mem: 30335
[09:53:29.254346] Epoch: [123]  [6671/6672]  eta: 0:00:00  lr: 0.000447  loss: 2.3554 (2.4097)  time: 0.7271  data: 0.0010  max mem: 30335
[09:53:30.032116] Epoch: [123] Total time: 1:23:00 (0.7465 s / it)
[09:53:30.081653] Averaged stats: lr: 0.000447  loss: 2.3554 (2.4051)
[09:53:35.146164] Test:  [   0/2084]  eta: 2:55:43  loss: 0.3020 (0.3020)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.0591  data: 4.2121  max mem: 30335
[09:55:44.875789] Test:  [ 500/2084]  eta: 0:07:06  loss: 0.6895 (0.5923)  acc1: 79.1667 (84.4977)  acc5: 100.0000 (97.3220)  time: 0.2566  data: 0.0002  max mem: 30335
[09:57:53.358580] Test:  [1000/2084]  eta: 0:04:45  loss: 0.7564 (0.6428)  acc1: 83.3333 (83.1876)  acc5: 91.6667 (96.9655)  time: 0.2549  data: 0.0002  max mem: 30335
[10:00:01.313613] Test:  [1500/2084]  eta: 0:02:32  loss: 0.5773 (0.7174)  acc1: 83.3333 (81.5623)  acc5: 95.8333 (96.1942)  time: 0.2550  data: 0.0002  max mem: 30335
[10:02:09.437470] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2927 (0.7543)  acc1: 91.6667 (80.5764)  acc5: 100.0000 (95.7542)  time: 0.2567  data: 0.0002  max mem: 30335
[10:02:30.557177] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5283 (0.7578)  acc1: 87.5000 (80.4940)  acc5: 100.0000 (95.7300)  time: 0.2491  data: 0.0002  max mem: 30335
[10:02:30.699736] Test: Total time: 0:09:00 (0.2594 s / it)
[10:02:46.505508] * Acc@1 80.476 Acc@5 95.727 loss 0.758
[10:02:46.505808] Accuracy of the network on the 50000 test images: 80.5%
[10:02:46.505869] Max accuracy: 80.48%
[10:02:46.908439] log_dir: ./output_dir_qkformer
[10:02:53.642126] Epoch: [124]  [   0/6672]  eta: 12:13:16  lr: 0.000447  loss: 2.1067 (2.1067)  time: 6.5943  data: 2.2752  max mem: 30335
[10:27:43.038918] Epoch: [124]  [2000/6672]  eta: 0:58:12  lr: 0.000443  loss: 2.4111 (2.3781)  time: 0.7283  data: 0.0003  max mem: 30335
[10:52:46.595348] Epoch: [124]  [4000/6672]  eta: 0:33:22  lr: 0.000440  loss: 2.1513 (2.3866)  time: 0.7258  data: 0.0002  max mem: 30335
[11:17:53.417390] Epoch: [124]  [6000/6672]  eta: 0:08:24  lr: 0.000437  loss: 2.2531 (2.3913)  time: 0.7268  data: 0.0002  max mem: 30335
[11:26:08.863879] Epoch: [124]  [6671/6672]  eta: 0:00:00  lr: 0.000436  loss: 2.3079 (2.3928)  time: 0.7252  data: 0.0010  max mem: 30335
[11:26:09.504435] Epoch: [124] Total time: 1:23:22 (0.7498 s / it)
[11:26:09.577511] Averaged stats: lr: 0.000436  loss: 2.3079 (2.3949)
[11:26:13.252399] Test:  [   0/2084]  eta: 2:07:23  loss: 0.4666 (0.4666)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 3.6676  data: 2.9298  max mem: 30335
[11:28:21.592283] Test:  [ 500/2084]  eta: 0:06:57  loss: 0.6145 (0.5957)  acc1: 83.3333 (84.6141)  acc5: 100.0000 (97.4717)  time: 0.2566  data: 0.0002  max mem: 30335
[11:30:29.904718] Test:  [1000/2084]  eta: 0:04:41  loss: 0.6658 (0.6435)  acc1: 83.3333 (83.4249)  acc5: 95.8333 (97.1528)  time: 0.2565  data: 0.0002  max mem: 30335
[11:32:38.847219] Test:  [1500/2084]  eta: 0:02:31  loss: 0.5084 (0.7192)  acc1: 87.5000 (81.5817)  acc5: 95.8333 (96.2858)  time: 0.2561  data: 0.0002  max mem: 30335
[11:34:47.046053] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3026 (0.7627)  acc1: 91.6667 (80.5368)  acc5: 100.0000 (95.7729)  time: 0.2564  data: 0.0002  max mem: 30335
[11:35:08.299292] Test:  [2083/2084]  eta: 0:00:00  loss: 0.2800 (0.7644)  acc1: 91.6667 (80.4800)  acc5: 100.0000 (95.7820)  time: 0.2538  data: 0.0002  max mem: 30335
[11:35:08.411512] Test: Total time: 0:08:58 (0.2586 s / it)
[11:35:23.964327] * Acc@1 80.487 Acc@5 95.771 loss 0.764
[11:35:23.964681] Accuracy of the network on the 50000 test images: 80.5%
[11:35:23.964730] Max accuracy: 80.49%
[11:35:24.027238] log_dir: ./output_dir_qkformer
[11:35:32.106228] Epoch: [125]  [   0/6672]  eta: 14:52:23  lr: 0.000436  loss: 2.4668 (2.4668)  time: 8.0252  data: 2.6533  max mem: 30335
[12:00:32.058083] Epoch: [125]  [2000/6672]  eta: 0:58:40  lr: 0.000433  loss: 2.2557 (2.3673)  time: 0.7338  data: 0.0003  max mem: 30335
[12:25:31.548390] Epoch: [125]  [4000/6672]  eta: 0:33:28  lr: 0.000430  loss: 2.4087 (2.3799)  time: 0.7286  data: 0.0003  max mem: 30335
[12:50:35.047930] Epoch: [125]  [6000/6672]  eta: 0:08:25  lr: 0.000427  loss: 2.3526 (2.3871)  time: 0.8957  data: 0.0048  max mem: 30335
[12:58:57.206542] Epoch: [125]  [6671/6672]  eta: 0:00:00  lr: 0.000426  loss: 2.3669 (2.3890)  time: 0.7245  data: 0.0006  max mem: 30335
[12:58:57.877948] Epoch: [125] Total time: 1:23:33 (0.7515 s / it)
[12:58:57.904939] Averaged stats: lr: 0.000426  loss: 2.3669 (2.3864)
[12:59:01.939423] Test:  [   0/2084]  eta: 2:19:30  loss: 0.3325 (0.3325)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.0167  data: 3.4432  max mem: 30335
[13:01:10.561765] Test:  [ 500/2084]  eta: 0:06:59  loss: 0.6359 (0.5920)  acc1: 83.3333 (85.1796)  acc5: 100.0000 (97.2222)  time: 0.2581  data: 0.0002  max mem: 30335
[13:03:19.485857] Test:  [1000/2084]  eta: 0:04:43  loss: 0.5849 (0.6363)  acc1: 83.3333 (83.6705)  acc5: 95.8333 (96.9988)  time: 0.2578  data: 0.0002  max mem: 30335
[13:05:28.082894] Test:  [1500/2084]  eta: 0:02:31  loss: 0.6277 (0.7198)  acc1: 83.3333 (81.5179)  acc5: 95.8333 (96.1415)  time: 0.2570  data: 0.0002  max mem: 30335
[13:07:36.573609] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3120 (0.7580)  acc1: 95.8333 (80.4660)  acc5: 100.0000 (95.6730)  time: 0.2563  data: 0.0002  max mem: 30335
[13:07:57.743904] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4552 (0.7610)  acc1: 87.5000 (80.3840)  acc5: 100.0000 (95.6840)  time: 0.2489  data: 0.0001  max mem: 30335
[13:07:57.856952] Test: Total time: 0:08:59 (0.2591 s / it)
[13:08:13.287834] * Acc@1 80.399 Acc@5 95.686 loss 0.761
[13:08:13.288105] Accuracy of the network on the 50000 test images: 80.4%
[13:08:13.288142] Max accuracy: 80.49%
[13:08:13.377235] log_dir: ./output_dir_qkformer
[13:08:21.070766] Epoch: [126]  [   0/6672]  eta: 14:10:24  lr: 0.000426  loss: 2.1385 (2.1385)  time: 7.6476  data: 2.0764  max mem: 30335
[13:33:21.666954] Epoch: [126]  [2000/6672]  eta: 0:58:41  lr: 0.000423  loss: 2.2492 (2.3639)  time: 0.7358  data: 0.0012  max mem: 30335
[13:58:26.207899] Epoch: [126]  [4000/6672]  eta: 0:33:31  lr: 0.000420  loss: 2.2612 (2.3664)  time: 0.7489  data: 0.0002  max mem: 30335
[14:23:19.849221] Epoch: [126]  [6000/6672]  eta: 0:08:24  lr: 0.000417  loss: 2.2322 (2.3716)  time: 0.7462  data: 0.0003  max mem: 30335
[14:31:48.875091] Epoch: [126]  [6671/6672]  eta: 0:00:00  lr: 0.000416  loss: 2.3395 (2.3709)  time: 0.7251  data: 0.0011  max mem: 30335
[14:31:49.739653] Epoch: [126] Total time: 1:23:36 (0.7519 s / it)
[14:31:49.773276] Averaged stats: lr: 0.000416  loss: 2.3395 (2.3802)
[14:31:55.145603] Test:  [   0/2084]  eta: 3:06:26  loss: 0.3015 (0.3015)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.3678  data: 4.4223  max mem: 30335
[14:34:03.639990] Test:  [ 500/2084]  eta: 0:07:03  loss: 0.7372 (0.5725)  acc1: 79.1667 (85.1131)  acc5: 95.8333 (97.5549)  time: 0.2570  data: 0.0002  max mem: 30335
[14:36:12.128193] Test:  [1000/2084]  eta: 0:04:44  loss: 0.5355 (0.6206)  acc1: 87.5000 (83.6330)  acc5: 95.8333 (97.1528)  time: 0.2573  data: 0.0002  max mem: 30335
[14:38:20.821556] Test:  [1500/2084]  eta: 0:02:32  loss: 0.6425 (0.7079)  acc1: 83.3333 (81.5484)  acc5: 95.8333 (96.1887)  time: 0.2577  data: 0.0002  max mem: 30335
[14:40:30.383590] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3309 (0.7476)  acc1: 91.6667 (80.6555)  acc5: 100.0000 (95.7355)  time: 0.2570  data: 0.0002  max mem: 30335
[14:40:51.599435] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4119 (0.7508)  acc1: 91.6667 (80.5580)  acc5: 100.0000 (95.7440)  time: 0.2490  data: 0.0001  max mem: 30335
[14:40:51.722993] Test: Total time: 0:09:01 (0.2601 s / it)
[14:41:06.836529] * Acc@1 80.575 Acc@5 95.760 loss 0.751
[14:41:06.836867] Accuracy of the network on the 50000 test images: 80.6%
[14:41:06.836916] Max accuracy: 80.58%
[14:41:07.154119] log_dir: ./output_dir_qkformer
[14:41:14.245565] Epoch: [127]  [   0/6672]  eta: 13:05:53  lr: 0.000416  loss: 3.0105 (3.0105)  time: 7.0673  data: 2.2503  max mem: 30335
[15:06:13.555168] Epoch: [127]  [2000/6672]  eta: 0:58:36  lr: 0.000413  loss: 2.2384 (2.3643)  time: 0.8353  data: 0.0003  max mem: 30335
[15:30:54.037515] Epoch: [127]  [4000/6672]  eta: 0:33:14  lr: 0.000410  loss: 2.4229 (2.3703)  time: 0.7257  data: 0.0003  max mem: 30335
[15:55:49.376279] Epoch: [127]  [6000/6672]  eta: 0:08:21  lr: 0.000407  loss: 2.4825 (2.3725)  time: 0.9738  data: 0.0003  max mem: 30335
[16:04:13.245754] Epoch: [127]  [6671/6672]  eta: 0:00:00  lr: 0.000406  loss: 2.4546 (2.3734)  time: 0.7264  data: 0.0011  max mem: 30335
[16:04:14.000953] Epoch: [127] Total time: 1:23:06 (0.7474 s / it)
[16:04:14.035009] Averaged stats: lr: 0.000406  loss: 2.4546 (2.3707)
[16:04:18.296158] Test:  [   0/2084]  eta: 2:27:51  loss: 0.2681 (0.2681)  acc1: 95.8333 (95.8333)  acc5: 100.0000 (100.0000)  time: 4.2568  data: 3.5340  max mem: 30335
[16:06:26.828117] Test:  [ 500/2084]  eta: 0:06:59  loss: 0.5925 (0.5783)  acc1: 79.1667 (84.8470)  acc5: 100.0000 (97.5882)  time: 0.2575  data: 0.0002  max mem: 30335
[16:08:35.469550] Test:  [1000/2084]  eta: 0:04:43  loss: 0.5814 (0.6321)  acc1: 83.3333 (83.2626)  acc5: 95.8333 (97.1612)  time: 0.2571  data: 0.0002  max mem: 30335
[16:10:44.358324] Test:  [1500/2084]  eta: 0:02:31  loss: 0.4890 (0.7089)  acc1: 83.3333 (81.3846)  acc5: 95.8333 (96.3413)  time: 0.2561  data: 0.0002  max mem: 30335
[16:12:53.047083] Test:  [2000/2084]  eta: 0:00:21  loss: 0.1845 (0.7500)  acc1: 95.8333 (80.4723)  acc5: 100.0000 (95.8333)  time: 0.2574  data: 0.0002  max mem: 30335
[16:13:14.270131] Test:  [2083/2084]  eta: 0:00:00  loss: 0.2987 (0.7518)  acc1: 91.6667 (80.4080)  acc5: 100.0000 (95.8360)  time: 0.2504  data: 0.0002  max mem: 30335
[16:13:14.406493] Test: Total time: 0:09:00 (0.2593 s / it)
[16:13:29.387436] * Acc@1 80.427 Acc@5 95.827 loss 0.751
[16:13:29.388059] Accuracy of the network on the 50000 test images: 80.4%
[16:13:29.388155] Max accuracy: 80.58%
[16:13:29.506158] log_dir: ./output_dir_qkformer
[16:13:32.383212] Epoch: [128]  [   0/6672]  eta: 5:19:48  lr: 0.000406  loss: 2.2752 (2.2752)  time: 2.8760  data: 1.8020  max mem: 30335
[16:38:34.665477] Epoch: [128]  [2000/6672]  eta: 0:58:33  lr: 0.000403  loss: 2.4344 (2.3650)  time: 0.7262  data: 0.0002  max mem: 30335
[17:03:40.680454] Epoch: [128]  [4000/6672]  eta: 0:33:30  lr: 0.000400  loss: 2.3546 (2.3652)  time: 0.7340  data: 0.0003  max mem: 30335
[17:28:45.694936] Epoch: [128]  [6000/6672]  eta: 0:08:25  lr: 0.000397  loss: 2.3454 (2.3665)  time: 0.7291  data: 0.0002  max mem: 30335
[17:37:05.058954] Epoch: [128]  [6671/6672]  eta: 0:00:00  lr: 0.000396  loss: 2.4583 (2.3681)  time: 0.7232  data: 0.0011  max mem: 30335
[17:37:05.716638] Epoch: [128] Total time: 1:23:36 (0.7518 s / it)
[17:37:05.764083] Averaged stats: lr: 0.000396  loss: 2.4583 (2.3648)
[17:37:10.119185] Test:  [   0/2084]  eta: 2:31:02  loss: 0.4136 (0.4136)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.3486  data: 3.6619  max mem: 30335
[17:39:19.031656] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.6972 (0.6036)  acc1: 83.3333 (84.9301)  acc5: 100.0000 (97.3886)  time: 0.2562  data: 0.0002  max mem: 30335
[17:41:27.719964] Test:  [1000/2084]  eta: 0:04:43  loss: 0.6063 (0.6290)  acc1: 87.5000 (83.9327)  acc5: 95.8333 (97.1695)  time: 0.2566  data: 0.0002  max mem: 30335
[17:43:36.194555] Test:  [1500/2084]  eta: 0:02:31  loss: 0.5431 (0.7013)  acc1: 91.6667 (82.0231)  acc5: 95.8333 (96.3635)  time: 0.2559  data: 0.0002  max mem: 30335
[17:45:45.134186] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2132 (0.7371)  acc1: 95.8333 (81.0532)  acc5: 100.0000 (95.9895)  time: 0.2573  data: 0.0002  max mem: 30335
[17:46:06.291516] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3148 (0.7408)  acc1: 91.6667 (80.9660)  acc5: 100.0000 (95.9820)  time: 0.2484  data: 0.0001  max mem: 30335
[17:46:06.411313] Test: Total time: 0:09:00 (0.2594 s / it)
[17:46:21.877208] * Acc@1 80.968 Acc@5 95.986 loss 0.741
[17:46:21.877516] Accuracy of the network on the 50000 test images: 81.0%
[17:46:21.877584] Max accuracy: 80.97%
[17:46:22.065032] log_dir: ./output_dir_qkformer
[17:46:27.508403] Epoch: [129]  [   0/6672]  eta: 10:05:10  lr: 0.000396  loss: 2.1467 (2.1467)  time: 5.4423  data: 2.9163  max mem: 30335
[18:11:09.362367] Epoch: [129]  [2000/6672]  eta: 0:57:51  lr: 0.000393  loss: 2.2768 (2.3615)  time: 0.7923  data: 0.0003  max mem: 30335
[18:36:18.174784] Epoch: [129]  [4000/6672]  eta: 0:33:20  lr: 0.000390  loss: 2.4735 (2.3613)  time: 0.8680  data: 0.0002  max mem: 30335
[19:01:09.018975] Epoch: [129]  [6000/6672]  eta: 0:08:22  lr: 0.000387  loss: 2.2691 (2.3621)  time: 0.7429  data: 0.0003  max mem: 30335
[19:09:37.967153] Epoch: [129]  [6671/6672]  eta: 0:00:00  lr: 0.000386  loss: 2.3782 (2.3642)  time: 0.7258  data: 0.0011  max mem: 30335
[19:09:38.794246] Epoch: [129] Total time: 1:23:16 (0.7489 s / it)
[19:09:38.839015] Averaged stats: lr: 0.000386  loss: 2.3782 (2.3559)
[19:09:43.542395] Test:  [   0/2084]  eta: 2:43:06  loss: 0.2962 (0.2962)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.6959  data: 3.7691  max mem: 30335
[19:11:51.990585] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.6279 (0.5859)  acc1: 83.3333 (85.2212)  acc5: 100.0000 (97.4218)  time: 0.2567  data: 0.0002  max mem: 30335
[19:14:00.754737] Test:  [1000/2084]  eta: 0:04:43  loss: 0.6341 (0.6280)  acc1: 83.3333 (83.7912)  acc5: 95.8333 (97.0613)  time: 0.2562  data: 0.0002  max mem: 30335
[19:16:09.256604] Test:  [1500/2084]  eta: 0:02:31  loss: 0.5882 (0.7121)  acc1: 83.3333 (81.7122)  acc5: 95.8333 (96.0887)  time: 0.2560  data: 0.0002  max mem: 30335
[19:18:17.664624] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3172 (0.7375)  acc1: 91.6667 (80.9429)  acc5: 100.0000 (95.8354)  time: 0.2562  data: 0.0002  max mem: 30335
[19:18:39.206894] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3085 (0.7436)  acc1: 91.6667 (80.7620)  acc5: 100.0000 (95.8100)  time: 0.2490  data: 0.0001  max mem: 30335
[19:18:39.308411] Test: Total time: 0:09:00 (0.2593 s / it)
[19:18:54.744130] * Acc@1 80.775 Acc@5 95.814 loss 0.743
[19:18:54.744380] Accuracy of the network on the 50000 test images: 80.8%
[19:18:54.744433] Max accuracy: 80.97%
[19:18:54.813344] log_dir: ./output_dir_qkformer
[19:18:58.719623] Epoch: [130]  [   0/6672]  eta: 7:12:53  lr: 0.000386  loss: 2.4342 (2.4342)  time: 3.8929  data: 2.7921  max mem: 30335
[19:43:56.496748] Epoch: [130]  [2000/6672]  eta: 0:58:25  lr: 0.000383  loss: 2.3836 (2.3349)  time: 0.7278  data: 0.0003  max mem: 30335
[20:09:00.250199] Epoch: [130]  [4000/6672]  eta: 0:33:26  lr: 0.000380  loss: 2.1961 (2.3428)  time: 0.7269  data: 0.0003  max mem: 30335
[20:33:50.934992] Epoch: [130]  [6000/6672]  eta: 0:08:23  lr: 0.000378  loss: 2.4263 (2.3440)  time: 0.7403  data: 0.0006  max mem: 30335
[20:42:16.845359] Epoch: [130]  [6671/6672]  eta: 0:00:00  lr: 0.000377  loss: 2.3341 (2.3465)  time: 0.7241  data: 0.0011  max mem: 30335
[20:42:17.675833] Epoch: [130] Total time: 1:23:22 (0.7498 s / it)
[20:42:17.741329] Averaged stats: lr: 0.000377  loss: 2.3341 (2.3457)
[20:42:22.971893] Test:  [   0/2084]  eta: 3:01:28  loss: 0.4739 (0.4739)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 5.2250  data: 4.4402  max mem: 30335
[20:44:32.892896] Test:  [ 500/2084]  eta: 0:07:07  loss: 0.7009 (0.5708)  acc1: 75.0000 (84.9301)  acc5: 100.0000 (97.4385)  time: 0.2566  data: 0.0002  max mem: 30335
[20:46:41.282901] Test:  [1000/2084]  eta: 0:04:45  loss: 0.6216 (0.6101)  acc1: 87.5000 (84.0243)  acc5: 91.6667 (97.1112)  time: 0.2564  data: 0.0002  max mem: 30335
[20:48:50.260029] Test:  [1500/2084]  eta: 0:02:32  loss: 0.5216 (0.6897)  acc1: 87.5000 (82.0314)  acc5: 95.8333 (96.2636)  time: 0.2561  data: 0.0002  max mem: 30335
[20:51:00.456783] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2450 (0.7249)  acc1: 95.8333 (81.0532)  acc5: 100.0000 (95.9021)  time: 0.2566  data: 0.0002  max mem: 30335
[20:51:21.584232] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4862 (0.7280)  acc1: 87.5000 (80.9680)  acc5: 100.0000 (95.9080)  time: 0.2493  data: 0.0001  max mem: 30335
[20:51:21.705386] Test: Total time: 0:09:03 (0.2610 s / it)
[20:51:36.685781] * Acc@1 80.955 Acc@5 95.911 loss 0.728
[20:51:36.686070] Accuracy of the network on the 50000 test images: 81.0%
[20:51:36.686111] Max accuracy: 80.97%
[20:51:36.907759] log_dir: ./output_dir_qkformer
[20:51:43.906263] Epoch: [131]  [   0/6672]  eta: 12:54:41  lr: 0.000377  loss: 3.0181 (3.0181)  time: 6.9667  data: 2.6461  max mem: 30335
[21:16:34.083074] Epoch: [131]  [2000/6672]  eta: 0:58:14  lr: 0.000374  loss: 2.2643 (2.3226)  time: 0.7280  data: 0.0002  max mem: 30335
[21:41:54.788597] Epoch: [131]  [4000/6672]  eta: 0:33:34  lr: 0.000371  loss: 2.2015 (2.3288)  time: 0.7601  data: 0.0003  max mem: 30335
[22:07:04.754123] Epoch: [131]  [6000/6672]  eta: 0:08:26  lr: 0.000368  loss: 2.3377 (2.3341)  time: 0.8348  data: 0.0002  max mem: 30335
[22:15:27.810649] Epoch: [131]  [6671/6672]  eta: 0:00:00  lr: 0.000367  loss: 2.1901 (2.3357)  time: 0.7297  data: 0.0011  max mem: 30335
[22:15:28.576043] Epoch: [131] Total time: 1:23:51 (0.7541 s / it)
[22:15:28.600998] Averaged stats: lr: 0.000367  loss: 2.1901 (2.3380)
[22:15:34.051460] Test:  [   0/2084]  eta: 3:09:10  loss: 0.2741 (0.2741)  acc1: 95.8333 (95.8333)  acc5: 100.0000 (100.0000)  time: 5.4464  data: 4.3006  max mem: 30335
[22:17:42.636775] Test:  [ 500/2084]  eta: 0:07:03  loss: 0.8039 (0.5936)  acc1: 79.1667 (84.4228)  acc5: 95.8333 (97.6214)  time: 0.2568  data: 0.0002  max mem: 30335
[22:19:51.261928] Test:  [1000/2084]  eta: 0:04:44  loss: 0.6134 (0.6361)  acc1: 87.5000 (83.1377)  acc5: 95.8333 (97.2278)  time: 0.2565  data: 0.0002  max mem: 30335
[22:22:02.835457] Test:  [1500/2084]  eta: 0:02:33  loss: 0.6206 (0.7081)  acc1: 83.3333 (81.4596)  acc5: 95.8333 (96.4579)  time: 0.2565  data: 0.0002  max mem: 30335
[22:24:11.425720] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2498 (0.7418)  acc1: 91.6667 (80.7846)  acc5: 100.0000 (96.0395)  time: 0.2573  data: 0.0002  max mem: 30335
[22:24:32.631496] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3890 (0.7448)  acc1: 91.6667 (80.7380)  acc5: 100.0000 (96.0300)  time: 0.2496  data: 0.0001  max mem: 30335
[22:24:32.776959] Test: Total time: 0:09:04 (0.2611 s / it)
[22:24:48.131499] * Acc@1 80.733 Acc@5 96.025 loss 0.745
[22:24:48.131915] Accuracy of the network on the 50000 test images: 80.7%
[22:24:48.131981] Max accuracy: 80.97%
[22:24:48.424277] log_dir: ./output_dir_qkformer
[22:24:56.532609] Epoch: [132]  [   0/6672]  eta: 15:01:31  lr: 0.000367  loss: 2.9333 (2.9333)  time: 8.1072  data: 1.9643  max mem: 30335
[22:50:18.042304] Epoch: [132]  [2000/6672]  eta: 0:59:30  lr: 0.000364  loss: 2.1958 (2.3078)  time: 0.7284  data: 0.0002  max mem: 30335
[23:15:18.286643] Epoch: [132]  [4000/6672]  eta: 0:33:43  lr: 0.000361  loss: 2.2594 (2.3254)  time: 0.7262  data: 0.0003  max mem: 30335
[23:40:18.216332] Epoch: [132]  [6000/6672]  eta: 0:08:27  lr: 0.000358  loss: 2.1878 (2.3300)  time: 0.7531  data: 0.0002  max mem: 30335
[23:48:44.859346] Epoch: [132]  [6671/6672]  eta: 0:00:00  lr: 0.000357  loss: 2.1647 (2.3316)  time: 0.7956  data: 0.0011  max mem: 30335
[23:48:45.493270] Epoch: [132] Total time: 1:23:57 (0.7550 s / it)
[23:48:45.544126] Averaged stats: lr: 0.000357  loss: 2.1647 (2.3293)
[23:48:50.354424] Test:  [   0/2084]  eta: 2:46:55  loss: 0.3095 (0.3095)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.8061  data: 3.8306  max mem: 30335
[23:50:59.243963] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.6346 (0.5792)  acc1: 79.1667 (85.1796)  acc5: 100.0000 (97.4967)  time: 0.2563  data: 0.0002  max mem: 30335
[23:53:08.045701] Test:  [1000/2084]  eta: 0:04:44  loss: 0.4669 (0.6240)  acc1: 83.3333 (83.5789)  acc5: 95.8333 (97.1986)  time: 0.2568  data: 0.0002  max mem: 30335
[23:55:18.934566] Test:  [1500/2084]  eta: 0:02:33  loss: 0.6604 (0.6960)  acc1: 83.3333 (81.8649)  acc5: 95.8333 (96.3885)  time: 0.2565  data: 0.0002  max mem: 30335
[23:57:27.833171] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3228 (0.7336)  acc1: 91.6667 (81.0657)  acc5: 100.0000 (95.9374)  time: 0.2564  data: 0.0002  max mem: 30335
[23:57:49.029050] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4834 (0.7363)  acc1: 87.5000 (80.9620)  acc5: 100.0000 (95.9240)  time: 0.2500  data: 0.0001  max mem: 30335
[23:57:49.158528] Test: Total time: 0:09:03 (0.2608 s / it)
[23:58:04.851802] * Acc@1 80.968 Acc@5 95.936 loss 0.736
[23:58:04.852041] Accuracy of the network on the 50000 test images: 81.0%
[23:58:04.852072] Max accuracy: 80.97%
[23:58:05.232166] log_dir: ./output_dir_qkformer
[23:58:15.747026] Epoch: [133]  [   0/6672]  eta: 19:29:04  lr: 0.000357  loss: 1.7999 (1.7999)  time: 10.5132  data: 2.0747  max mem: 30335
[00:23:30.114262] Epoch: [133]  [2000/6672]  eta: 0:59:19  lr: 0.000354  loss: 2.4028 (2.3185)  time: 0.7310  data: 0.0003  max mem: 30335
[00:48:46.372760] Epoch: [133]  [4000/6672]  eta: 0:33:50  lr: 0.000352  loss: 2.2895 (2.3162)  time: 0.7445  data: 0.0003  max mem: 30335
[01:14:09.853131] Epoch: [133]  [6000/6672]  eta: 0:08:31  lr: 0.000349  loss: 2.3416 (2.3191)  time: 0.7293  data: 0.0003  max mem: 30335
[01:22:42.577918] Epoch: [133]  [6671/6672]  eta: 0:00:00  lr: 0.000348  loss: 2.1642 (2.3171)  time: 0.7259  data: 0.0006  max mem: 30335
[01:22:43.491058] Epoch: [133] Total time: 1:24:38 (0.7611 s / it)
[01:22:43.532108] Averaged stats: lr: 0.000348  loss: 2.1642 (2.3186)
[01:22:48.705974] Test:  [   0/2084]  eta: 2:59:33  loss: 0.3824 (0.3824)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.1697  data: 4.4274  max mem: 30335
[01:24:58.621911] Test:  [ 500/2084]  eta: 0:07:07  loss: 0.7180 (0.5793)  acc1: 87.5000 (85.2878)  acc5: 100.0000 (97.5050)  time: 0.2570  data: 0.0002  max mem: 30335
[01:27:07.413492] Test:  [1000/2084]  eta: 0:04:45  loss: 0.6229 (0.6209)  acc1: 87.5000 (83.9286)  acc5: 95.8333 (97.2070)  time: 0.2578  data: 0.0002  max mem: 30335
[01:29:15.795344] Test:  [1500/2084]  eta: 0:02:32  loss: 0.6435 (0.6990)  acc1: 87.5000 (82.0536)  acc5: 95.8333 (96.3830)  time: 0.2568  data: 0.0002  max mem: 30335
[01:31:24.576684] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3454 (0.7434)  acc1: 91.6667 (80.9991)  acc5: 100.0000 (95.8625)  time: 0.2571  data: 0.0002  max mem: 30335
[01:31:45.780890] Test:  [2083/2084]  eta: 0:00:00  loss: 0.5409 (0.7463)  acc1: 91.6667 (80.9220)  acc5: 100.0000 (95.8840)  time: 0.2499  data: 0.0002  max mem: 30335
[01:31:45.916467] Test: Total time: 0:09:02 (0.2603 s / it)
[01:32:00.939127] * Acc@1 80.945 Acc@5 95.892 loss 0.746
[01:32:00.939444] Accuracy of the network on the 50000 test images: 80.9%
[01:32:00.939476] Max accuracy: 80.97%
[01:32:01.288346] log_dir: ./output_dir_qkformer
[01:32:04.351137] Epoch: [134]  [   0/6672]  eta: 5:40:25  lr: 0.000348  loss: 2.9896 (2.9896)  time: 3.0613  data: 2.0378  max mem: 30335
[01:57:23.535304] Epoch: [134]  [2000/6672]  eta: 0:59:13  lr: 0.000345  loss: 2.2742 (2.2951)  time: 0.7522  data: 0.0003  max mem: 30335
[02:22:32.820239] Epoch: [134]  [4000/6672]  eta: 0:33:44  lr: 0.000342  loss: 2.1487 (2.3055)  time: 0.7271  data: 0.0003  max mem: 30335
[02:47:46.795923] Epoch: [134]  [6000/6672]  eta: 0:08:28  lr: 0.000339  loss: 2.0417 (2.3089)  time: 0.7920  data: 0.0002  max mem: 30335
[02:56:06.672319] Epoch: [134]  [6671/6672]  eta: 0:00:00  lr: 0.000338  loss: 2.2737 (2.3088)  time: 0.7240  data: 0.0011  max mem: 30335
[02:56:07.429599] Epoch: [134] Total time: 1:24:06 (0.7563 s / it)
[02:56:07.462217] Averaged stats: lr: 0.000338  loss: 2.2737 (2.3134)
[02:56:13.066937] Test:  [   0/2084]  eta: 3:14:27  loss: 0.2831 (0.2831)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.5987  data: 4.6811  max mem: 30335
[02:58:21.564844] Test:  [ 500/2084]  eta: 0:07:03  loss: 0.6177 (0.5534)  acc1: 83.3333 (85.6454)  acc5: 100.0000 (97.5715)  time: 0.2576  data: 0.0002  max mem: 30335
[03:00:31.326237] Test:  [1000/2084]  eta: 0:04:45  loss: 0.5366 (0.6008)  acc1: 87.5000 (84.2907)  acc5: 95.8333 (97.2486)  time: 0.2579  data: 0.0002  max mem: 30335
[03:02:40.449558] Test:  [1500/2084]  eta: 0:02:32  loss: 0.5696 (0.6747)  acc1: 87.5000 (82.5255)  acc5: 95.8333 (96.4440)  time: 0.2554  data: 0.0002  max mem: 30335
[03:04:51.605933] Test:  [2000/2084]  eta: 0:00:22  loss: 0.2949 (0.7163)  acc1: 91.6667 (81.5676)  acc5: 100.0000 (95.9812)  time: 0.2569  data: 0.0002  max mem: 30335
[03:05:12.749200] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3083 (0.7201)  acc1: 91.6667 (81.4480)  acc5: 100.0000 (95.9660)  time: 0.2490  data: 0.0001  max mem: 30335
[03:05:12.880235] Test: Total time: 0:09:05 (0.2617 s / it)
[03:05:28.499980] * Acc@1 81.441 Acc@5 95.965 loss 0.720
[03:05:28.500241] Accuracy of the network on the 50000 test images: 81.4%
[03:05:28.500277] Max accuracy: 81.44%
[03:05:28.647014] log_dir: ./output_dir_qkformer
[03:05:36.357759] Epoch: [135]  [   0/6672]  eta: 14:05:27  lr: 0.000338  loss: 1.5778 (1.5778)  time: 7.6031  data: 2.3854  max mem: 30335
[03:30:42.038498] Epoch: [135]  [2000/6672]  eta: 0:58:52  lr: 0.000335  loss: 2.2434 (2.2890)  time: 0.7279  data: 0.0003  max mem: 30335
[03:55:43.255795] Epoch: [135]  [4000/6672]  eta: 0:33:32  lr: 0.000333  loss: 2.3009 (2.3020)  time: 0.9851  data: 0.0003  max mem: 30335
[04:20:54.404881] Epoch: [135]  [6000/6672]  eta: 0:08:26  lr: 0.000330  loss: 2.1763 (2.3036)  time: 0.7286  data: 0.0002  max mem: 30335
[04:29:27.370371] Epoch: [135]  [6671/6672]  eta: 0:00:00  lr: 0.000329  loss: 2.3147 (2.3056)  time: 0.7293  data: 0.0011  max mem: 30335
[04:29:28.162336] Epoch: [135] Total time: 1:23:59 (0.7553 s / it)
[04:29:28.204937] Averaged stats: lr: 0.000329  loss: 2.3147 (2.3041)
[04:29:33.860869] Test:  [   0/2084]  eta: 3:16:12  loss: 0.2885 (0.2885)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.6488  data: 5.0727  max mem: 30335
[04:31:42.327338] Test:  [ 500/2084]  eta: 0:07:03  loss: 0.6116 (0.5467)  acc1: 79.1667 (85.6786)  acc5: 100.0000 (97.7711)  time: 0.2569  data: 0.0002  max mem: 30335
[04:33:50.782492] Test:  [1000/2084]  eta: 0:04:44  loss: 0.5321 (0.5933)  acc1: 87.5000 (84.2741)  acc5: 95.8333 (97.3776)  time: 0.2575  data: 0.0002  max mem: 30335
[04:35:59.611671] Test:  [1500/2084]  eta: 0:02:32  loss: 0.5653 (0.6612)  acc1: 83.3333 (82.6588)  acc5: 95.8333 (96.6272)  time: 0.2574  data: 0.0002  max mem: 30335
[04:38:07.991676] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2114 (0.7044)  acc1: 95.8333 (81.5717)  acc5: 100.0000 (96.1144)  time: 0.2566  data: 0.0002  max mem: 30335
[04:38:29.158459] Test:  [2083/2084]  eta: 0:00:00  loss: 0.2653 (0.7066)  acc1: 91.6667 (81.4900)  acc5: 100.0000 (96.1160)  time: 0.2503  data: 0.0001  max mem: 30335
[04:38:29.293821] Test: Total time: 0:09:01 (0.2596 s / it)
[04:38:44.852639] * Acc@1 81.464 Acc@5 96.114 loss 0.707
[04:38:44.853040] Accuracy of the network on the 50000 test images: 81.5%
[04:38:44.853072] Max accuracy: 81.46%
[04:38:45.304829] log_dir: ./output_dir_qkformer
[04:38:48.604476] Epoch: [136]  [   0/6672]  eta: 6:06:42  lr: 0.000329  loss: 2.4252 (2.4252)  time: 3.2978  data: 2.3083  max mem: 30335
[05:03:53.022476] Epoch: [136]  [2000/6672]  eta: 0:58:39  lr: 0.000326  loss: 2.0776 (2.2716)  time: 0.7944  data: 0.0003  max mem: 30335
[05:28:57.770256] Epoch: [136]  [4000/6672]  eta: 0:33:31  lr: 0.000323  loss: 2.3176 (2.2877)  time: 0.7461  data: 0.0003  max mem: 30335
[05:53:58.121424] Epoch: [136]  [6000/6672]  eta: 0:08:25  lr: 0.000321  loss: 2.1679 (2.2911)  time: 0.7341  data: 0.0003  max mem: 30335
[06:02:15.340392] Epoch: [136]  [6671/6672]  eta: 0:00:00  lr: 0.000320  loss: 2.3485 (2.2933)  time: 0.7232  data: 0.0011  max mem: 30335
[06:02:16.087974] Epoch: [136] Total time: 1:23:30 (0.7510 s / it)
[06:02:16.144499] Averaged stats: lr: 0.000320  loss: 2.3485 (2.2944)
[06:02:21.077504] Test:  [   0/2084]  eta: 2:51:11  loss: 0.2055 (0.2055)  acc1: 95.8333 (95.8333)  acc5: 100.0000 (100.0000)  time: 4.9287  data: 4.2623  max mem: 30335
[06:04:29.837246] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.5753 (0.5615)  acc1: 83.3333 (85.1464)  acc5: 100.0000 (97.7046)  time: 0.2571  data: 0.0002  max mem: 30335
[06:06:38.327520] Test:  [1000/2084]  eta: 0:04:43  loss: 0.5592 (0.5977)  acc1: 87.5000 (83.9535)  acc5: 95.8333 (97.4151)  time: 0.2566  data: 0.0002  max mem: 30335
[06:08:46.927637] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4425 (0.6787)  acc1: 87.5000 (82.0786)  acc5: 95.8333 (96.4940)  time: 0.2571  data: 0.0002  max mem: 30335
[06:10:55.338348] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2820 (0.7130)  acc1: 91.6667 (81.3198)  acc5: 100.0000 (96.1186)  time: 0.2562  data: 0.0002  max mem: 30335
[06:11:17.048315] Test:  [2083/2084]  eta: 0:00:00  loss: 0.2822 (0.7173)  acc1: 91.6667 (81.2220)  acc5: 100.0000 (96.1060)  time: 0.2492  data: 0.0001  max mem: 30335
[06:11:17.203949] Test: Total time: 0:09:01 (0.2596 s / it)
[06:11:32.574367] * Acc@1 81.207 Acc@5 96.089 loss 0.717
[06:11:32.574699] Accuracy of the network on the 50000 test images: 81.2%
[06:11:32.574730] Max accuracy: 81.46%
[06:11:33.191267] log_dir: ./output_dir_qkformer
[06:11:40.321988] Epoch: [137]  [   0/6672]  eta: 13:12:06  lr: 0.000320  loss: 2.6451 (2.6451)  time: 7.1233  data: 2.5901  max mem: 30335
[06:36:44.577031] Epoch: [137]  [2000/6672]  eta: 0:58:48  lr: 0.000317  loss: 2.1696 (2.2717)  time: 0.7275  data: 0.0003  max mem: 30335
[07:01:50.786370] Epoch: [137]  [4000/6672]  eta: 0:33:34  lr: 0.000314  loss: 2.1956 (2.2782)  time: 0.7247  data: 0.0002  max mem: 30335
[07:27:11.787406] Epoch: [137]  [6000/6672]  eta: 0:08:28  lr: 0.000311  loss: 2.4175 (2.2866)  time: 0.7277  data: 0.0002  max mem: 30335
[07:35:40.612162] Epoch: [137]  [6671/6672]  eta: 0:00:00  lr: 0.000310  loss: 2.3314 (2.2876)  time: 0.7263  data: 0.0011  max mem: 30335
[07:35:41.389782] Epoch: [137] Total time: 1:24:08 (0.7566 s / it)
[07:35:41.437644] Averaged stats: lr: 0.000310  loss: 2.3314 (2.2859)
[07:35:46.661828] Test:  [   0/2084]  eta: 3:01:15  loss: 0.2679 (0.2679)  acc1: 95.8333 (95.8333)  acc5: 100.0000 (100.0000)  time: 5.2185  data: 4.3483  max mem: 30335
[07:37:55.196985] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.6106 (0.5510)  acc1: 79.1667 (85.2545)  acc5: 100.0000 (97.6297)  time: 0.2570  data: 0.0002  max mem: 30335
[07:40:04.334776] Test:  [1000/2084]  eta: 0:04:44  loss: 0.4591 (0.5918)  acc1: 91.6667 (84.1200)  acc5: 95.8333 (97.3859)  time: 0.2569  data: 0.0002  max mem: 30335
[07:42:12.725621] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4478 (0.6696)  acc1: 87.5000 (82.4034)  acc5: 95.8333 (96.5995)  time: 0.2571  data: 0.0002  max mem: 30335
[07:44:21.944515] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2804 (0.7096)  acc1: 91.6667 (81.5072)  acc5: 100.0000 (96.1353)  time: 0.2881  data: 0.0159  max mem: 30335
[07:44:43.110596] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4070 (0.7142)  acc1: 91.6667 (81.4220)  acc5: 100.0000 (96.1100)  time: 0.2492  data: 0.0001  max mem: 30335
[07:44:43.229884] Test: Total time: 0:09:01 (0.2600 s / it)
[07:44:58.633958] * Acc@1 81.428 Acc@5 96.102 loss 0.714
[07:44:58.634255] Accuracy of the network on the 50000 test images: 81.4%
[07:44:58.634296] Max accuracy: 81.46%
[07:44:59.454049] log_dir: ./output_dir_qkformer
[07:45:03.858162] Epoch: [138]  [   0/6672]  eta: 8:09:32  lr: 0.000310  loss: 2.9341 (2.9341)  time: 4.4024  data: 2.0566  max mem: 30335
[08:10:01.682945] Epoch: [138]  [2000/6672]  eta: 0:58:27  lr: 0.000308  loss: 2.2022 (2.2571)  time: 0.7294  data: 0.0003  max mem: 30335
[08:34:56.326539] Epoch: [138]  [4000/6672]  eta: 0:33:21  lr: 0.000305  loss: 2.2509 (2.2592)  time: 0.7775  data: 0.0013  max mem: 30335
[09:00:04.440474] Epoch: [138]  [6000/6672]  eta: 0:08:24  lr: 0.000302  loss: 2.2344 (2.2650)  time: 0.7845  data: 0.0002  max mem: 30335
[09:08:31.763356] Epoch: [138]  [6671/6672]  eta: 0:00:00  lr: 0.000301  loss: 2.2789 (2.2657)  time: 0.7246  data: 0.0011  max mem: 30335
[09:08:32.517389] Epoch: [138] Total time: 1:23:33 (0.7514 s / it)
[09:08:32.628843] Averaged stats: lr: 0.000301  loss: 2.2789 (2.2761)
[09:08:38.467194] Test:  [   0/2084]  eta: 3:22:35  loss: 0.4599 (0.4599)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.8328  data: 4.9122  max mem: 30335
[09:10:47.055712] Test:  [ 500/2084]  eta: 0:07:04  loss: 0.5063 (0.5469)  acc1: 83.3333 (85.7369)  acc5: 100.0000 (97.7545)  time: 0.2569  data: 0.0002  max mem: 30335
[09:12:55.407095] Test:  [1000/2084]  eta: 0:04:44  loss: 0.5735 (0.5894)  acc1: 87.5000 (84.6320)  acc5: 95.8333 (97.3943)  time: 0.2566  data: 0.0002  max mem: 30335
[09:15:04.346984] Test:  [1500/2084]  eta: 0:02:32  loss: 0.6172 (0.6635)  acc1: 83.3333 (83.0141)  acc5: 95.8333 (96.6661)  time: 0.2568  data: 0.0002  max mem: 30335
[09:17:14.583453] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2913 (0.7012)  acc1: 91.6667 (81.9674)  acc5: 100.0000 (96.3164)  time: 0.2581  data: 0.0002  max mem: 30335
[09:17:35.759001] Test:  [2083/2084]  eta: 0:00:00  loss: 0.2700 (0.7049)  acc1: 91.6667 (81.8500)  acc5: 100.0000 (96.3060)  time: 0.2484  data: 0.0001  max mem: 30335
[09:17:35.871992] Test: Total time: 0:09:03 (0.2607 s / it)
[09:17:51.538577] * Acc@1 81.826 Acc@5 96.309 loss 0.705
[09:17:51.538753] Accuracy of the network on the 50000 test images: 81.8%
[09:17:51.538782] Max accuracy: 81.83%
[09:17:51.955593] log_dir: ./output_dir_qkformer
[09:18:02.201595] Epoch: [139]  [   0/6672]  eta: 18:57:46  lr: 0.000301  loss: 2.3173 (2.3173)  time: 10.2317  data: 2.1468  max mem: 30335
[09:43:15.308170] Epoch: [139]  [2000/6672]  eta: 0:59:15  lr: 0.000299  loss: 2.3306 (2.2528)  time: 0.7590  data: 0.0002  max mem: 30335
[10:08:22.830703] Epoch: [139]  [4000/6672]  eta: 0:33:43  lr: 0.000296  loss: 2.1853 (2.2549)  time: 0.7274  data: 0.0003  max mem: 30335
[10:33:24.645416] Epoch: [139]  [6000/6672]  eta: 0:08:27  lr: 0.000293  loss: 2.2301 (2.2596)  time: 0.7297  data: 0.0003  max mem: 30335
[10:41:47.600607] Epoch: [139]  [6671/6672]  eta: 0:00:00  lr: 0.000292  loss: 2.0902 (2.2594)  time: 0.7299  data: 0.0006  max mem: 30335
[10:41:48.409011] Epoch: [139] Total time: 1:23:56 (0.7549 s / it)
[10:41:48.462305] Averaged stats: lr: 0.000292  loss: 2.0902 (2.2672)
[10:41:53.399158] Test:  [   0/2084]  eta: 2:51:18  loss: 0.4219 (0.4219)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.9322  data: 4.0244  max mem: 30335
[10:44:02.231853] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.4568 (0.5599)  acc1: 87.5000 (85.5539)  acc5: 100.0000 (97.6630)  time: 0.2575  data: 0.0002  max mem: 30335
[10:46:11.148840] Test:  [1000/2084]  eta: 0:04:44  loss: 0.5077 (0.5979)  acc1: 87.5000 (84.5321)  acc5: 95.8333 (97.5067)  time: 0.2567  data: 0.0002  max mem: 30335
[10:48:19.661681] Test:  [1500/2084]  eta: 0:02:32  loss: 0.5261 (0.6662)  acc1: 87.5000 (82.8864)  acc5: 95.8333 (96.6994)  time: 0.2570  data: 0.0002  max mem: 30335
[10:50:28.399008] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2986 (0.7026)  acc1: 91.6667 (81.8674)  acc5: 100.0000 (96.3227)  time: 0.2572  data: 0.0002  max mem: 30335
[10:50:49.742107] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4269 (0.7055)  acc1: 91.6667 (81.7840)  acc5: 100.0000 (96.3100)  time: 0.2493  data: 0.0001  max mem: 30335
[10:50:49.854307] Test: Total time: 0:09:01 (0.2598 s / it)
[10:51:04.898779] * Acc@1 81.789 Acc@5 96.299 loss 0.706
[10:51:04.899021] Accuracy of the network on the 50000 test images: 81.8%
[10:51:04.899052] Max accuracy: 81.83%
[10:51:05.340544] log_dir: ./output_dir_qkformer
[10:51:12.651429] Epoch: [140]  [   0/6672]  eta: 13:31:26  lr: 0.000292  loss: 2.4274 (2.4274)  time: 7.2972  data: 2.8813  max mem: 30335
[11:16:22.581552] Epoch: [140]  [2000/6672]  eta: 0:59:01  lr: 0.000290  loss: 2.3575 (2.2544)  time: 0.7283  data: 0.0002  max mem: 30335
[11:41:51.273145] Epoch: [140]  [4000/6672]  eta: 0:33:53  lr: 0.000287  loss: 2.2070 (2.2579)  time: 0.7279  data: 0.0003  max mem: 30335
[12:06:58.629840] Epoch: [140]  [6000/6672]  eta: 0:08:29  lr: 0.000284  loss: 2.2382 (2.2572)  time: 0.7300  data: 0.0003  max mem: 30335
[12:15:20.661912] Epoch: [140]  [6671/6672]  eta: 0:00:00  lr: 0.000283  loss: 2.2466 (2.2580)  time: 0.7249  data: 0.0006  max mem: 30335
[12:15:21.475345] Epoch: [140] Total time: 1:24:16 (0.7578 s / it)
[12:15:21.516761] Averaged stats: lr: 0.000283  loss: 2.2466 (2.2590)
[12:15:26.040981] Test:  [   0/2084]  eta: 2:36:47  loss: 0.3800 (0.3800)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.5139  data: 3.9258  max mem: 30335
[12:17:34.541724] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.6054 (0.5558)  acc1: 83.3333 (85.8367)  acc5: 100.0000 (97.5965)  time: 0.2574  data: 0.0002  max mem: 30335
[12:19:42.996167] Test:  [1000/2084]  eta: 0:04:43  loss: 0.5891 (0.6044)  acc1: 83.3333 (84.2990)  acc5: 95.8333 (97.4484)  time: 0.2567  data: 0.0002  max mem: 30335
[12:21:51.546845] Test:  [1500/2084]  eta: 0:02:31  loss: 0.4573 (0.6780)  acc1: 87.5000 (82.5339)  acc5: 95.8333 (96.6078)  time: 0.2579  data: 0.0002  max mem: 30335
[12:24:00.114651] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2404 (0.7069)  acc1: 95.8333 (81.8299)  acc5: 100.0000 (96.2581)  time: 0.2576  data: 0.0002  max mem: 30335
[12:24:21.709221] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3170 (0.7086)  acc1: 91.6667 (81.7620)  acc5: 100.0000 (96.2720)  time: 0.2499  data: 0.0001  max mem: 30335
[12:24:21.844210] Test: Total time: 0:09:00 (0.2593 s / it)
[12:24:36.832845] * Acc@1 81.756 Acc@5 96.254 loss 0.708
[12:24:36.833138] Accuracy of the network on the 50000 test images: 81.8%
[12:24:36.833168] Max accuracy: 81.83%
[12:24:36.982616] log_dir: ./output_dir_qkformer
[12:24:47.490262] Epoch: [141]  [   0/6672]  eta: 19:23:44  lr: 0.000283  loss: 1.9434 (1.9434)  time: 10.4653  data: 3.0400  max mem: 30335
[12:49:58.537266] Epoch: [141]  [2000/6672]  eta: 0:59:11  lr: 0.000281  loss: 2.3157 (2.2280)  time: 0.7301  data: 0.0003  max mem: 30335
[13:15:01.049756] Epoch: [141]  [4000/6672]  eta: 0:33:39  lr: 0.000278  loss: 2.3242 (2.2409)  time: 0.7288  data: 0.0003  max mem: 30335
[13:40:01.450583] Epoch: [141]  [6000/6672]  eta: 0:08:26  lr: 0.000276  loss: 2.1071 (2.2468)  time: 0.7539  data: 0.0004  max mem: 30335
[13:48:16.952513] Epoch: [141]  [6671/6672]  eta: 0:00:00  lr: 0.000275  loss: 2.1670 (2.2471)  time: 0.7231  data: 0.0011  max mem: 30335
[13:48:17.828310] Epoch: [141] Total time: 1:23:40 (0.7525 s / it)
[13:48:17.865026] Averaged stats: lr: 0.000275  loss: 2.1670 (2.2482)
[13:48:22.284031] Test:  [   0/2084]  eta: 2:33:19  loss: 0.2839 (0.2839)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.4145  data: 3.7382  max mem: 30335
[13:50:31.463099] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.6414 (0.5466)  acc1: 83.3333 (85.8367)  acc5: 100.0000 (97.7129)  time: 0.2571  data: 0.0002  max mem: 30335
[13:52:40.350806] Test:  [1000/2084]  eta: 0:04:44  loss: 0.4367 (0.5918)  acc1: 87.5000 (84.5446)  acc5: 95.8333 (97.3360)  time: 0.2571  data: 0.0002  max mem: 30335
[13:54:49.085773] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4239 (0.6707)  acc1: 91.6667 (82.5949)  acc5: 95.8333 (96.5884)  time: 0.2565  data: 0.0002  max mem: 30335
[13:56:58.367323] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2048 (0.7026)  acc1: 95.8333 (81.7716)  acc5: 100.0000 (96.2081)  time: 0.2754  data: 0.0002  max mem: 30335
[13:57:19.555947] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3688 (0.7033)  acc1: 91.6667 (81.7620)  acc5: 100.0000 (96.2540)  time: 0.2488  data: 0.0001  max mem: 30335
[13:57:19.673772] Test: Total time: 0:09:01 (0.2600 s / it)
[13:57:35.162288] * Acc@1 81.760 Acc@5 96.272 loss 0.703
[13:57:35.162542] Accuracy of the network on the 50000 test images: 81.8%
[13:57:35.162593] Max accuracy: 81.83%
[13:57:35.565115] log_dir: ./output_dir_qkformer
[13:57:45.228588] Epoch: [142]  [   0/6672]  eta: 17:48:52  lr: 0.000275  loss: 1.9195 (1.9195)  time: 9.6121  data: 3.2137  max mem: 30335
[14:22:47.809458] Epoch: [142]  [2000/6672]  eta: 0:58:49  lr: 0.000272  loss: 2.3405 (2.2353)  time: 0.7339  data: 0.0003  max mem: 30335
[14:47:56.408196] Epoch: [142]  [4000/6672]  eta: 0:33:36  lr: 0.000269  loss: 2.1648 (2.2416)  time: 0.7275  data: 0.0003  max mem: 30335
[15:13:08.197073] Epoch: [142]  [6000/6672]  eta: 0:08:27  lr: 0.000267  loss: 2.0512 (2.2422)  time: 0.7309  data: 0.0002  max mem: 30335
[15:21:34.055394] Epoch: [142]  [6671/6672]  eta: 0:00:00  lr: 0.000266  loss: 2.2558 (2.2427)  time: 0.7242  data: 0.0011  max mem: 30335
[15:21:34.743776] Epoch: [142] Total time: 1:23:59 (0.7553 s / it)
[15:21:34.814260] Averaged stats: lr: 0.000266  loss: 2.2558 (2.2402)
[15:21:38.758165] Test:  [   0/2084]  eta: 2:16:44  loss: 0.3922 (0.3922)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 3.9369  data: 3.1719  max mem: 30335
[15:23:47.211666] Test:  [ 500/2084]  eta: 0:06:58  loss: 0.6400 (0.5223)  acc1: 79.1667 (86.1361)  acc5: 100.0000 (97.8959)  time: 0.2590  data: 0.0028  max mem: 30335
[15:25:57.099520] Test:  [1000/2084]  eta: 0:04:43  loss: 0.5955 (0.5703)  acc1: 83.3333 (84.8610)  acc5: 95.8333 (97.5150)  time: 0.2567  data: 0.0002  max mem: 30335
[15:28:05.636432] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4881 (0.6496)  acc1: 87.5000 (82.8642)  acc5: 95.8333 (96.7383)  time: 0.2565  data: 0.0002  max mem: 30335
[15:30:14.091903] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2646 (0.6876)  acc1: 91.6667 (82.0361)  acc5: 100.0000 (96.3227)  time: 0.2572  data: 0.0002  max mem: 30335
[15:30:35.257754] Test:  [2083/2084]  eta: 0:00:00  loss: 0.2651 (0.6928)  acc1: 91.6667 (81.8980)  acc5: 100.0000 (96.3100)  time: 0.2489  data: 0.0002  max mem: 30335
[15:30:35.371689] Test: Total time: 0:09:00 (0.2594 s / it)
[15:30:50.906924] * Acc@1 81.921 Acc@5 96.301 loss 0.693
[15:30:50.907224] Accuracy of the network on the 50000 test images: 81.9%
[15:30:50.907257] Max accuracy: 81.92%
[15:30:51.019130] log_dir: ./output_dir_qkformer
[15:30:57.151542] Epoch: [143]  [   0/6672]  eta: 11:17:11  lr: 0.000266  loss: 2.4748 (2.4748)  time: 6.0898  data: 1.9850  max mem: 30335
[15:55:58.734718] Epoch: [143]  [2000/6672]  eta: 0:58:39  lr: 0.000263  loss: 2.1584 (2.2228)  time: 0.7357  data: 0.0002  max mem: 30335
[16:21:12.823453] Epoch: [143]  [4000/6672]  eta: 0:33:37  lr: 0.000261  loss: 2.3346 (2.2258)  time: 0.7279  data: 0.0003  max mem: 30335
[16:46:22.319457] Epoch: [143]  [6000/6672]  eta: 0:08:27  lr: 0.000258  loss: 2.3143 (2.2351)  time: 0.7287  data: 0.0003  max mem: 30335
[16:54:40.760389] Epoch: [143]  [6671/6672]  eta: 0:00:00  lr: 0.000257  loss: 2.1059 (2.2348)  time: 0.7245  data: 0.0006  max mem: 30335
[16:54:41.512947] Epoch: [143] Total time: 1:23:50 (0.7540 s / it)
[16:54:41.583806] Averaged stats: lr: 0.000257  loss: 2.1059 (2.2312)
[16:54:46.334354] Test:  [   0/2084]  eta: 2:44:47  loss: 0.3057 (0.3057)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.7447  data: 3.9806  max mem: 30335
[16:56:54.682005] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.8040 (0.5396)  acc1: 75.0000 (85.8367)  acc5: 95.8333 (97.6796)  time: 0.2564  data: 0.0002  max mem: 30335
[16:59:04.646419] Test:  [1000/2084]  eta: 0:04:44  loss: 0.7164 (0.5902)  acc1: 83.3333 (84.2616)  acc5: 95.8333 (97.3735)  time: 0.2563  data: 0.0002  max mem: 30335
[17:01:13.375850] Test:  [1500/2084]  eta: 0:02:32  loss: 0.5498 (0.6605)  acc1: 87.5000 (82.6616)  acc5: 95.8333 (96.6411)  time: 0.2559  data: 0.0002  max mem: 30335
[17:03:21.862856] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2578 (0.6928)  acc1: 95.8333 (81.9028)  acc5: 100.0000 (96.2810)  time: 0.2568  data: 0.0002  max mem: 30335
[17:03:43.027129] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4022 (0.6973)  acc1: 91.6667 (81.7940)  acc5: 100.0000 (96.2600)  time: 0.2496  data: 0.0002  max mem: 30335
[17:03:43.147608] Test: Total time: 0:09:01 (0.2599 s / it)
[17:03:58.527158] * Acc@1 81.809 Acc@5 96.255 loss 0.697
[17:03:58.527644] Accuracy of the network on the 50000 test images: 81.8%
[17:03:58.527677] Max accuracy: 81.92%
[17:03:58.963412] log_dir: ./output_dir_qkformer
[17:04:03.431401] Epoch: [144]  [   0/6672]  eta: 8:16:44  lr: 0.000257  loss: 2.4100 (2.4100)  time: 4.4671  data: 2.2372  max mem: 30335
[17:28:54.556575] Epoch: [144]  [2000/6672]  eta: 0:58:11  lr: 0.000255  loss: 2.2236 (2.2163)  time: 0.7294  data: 0.0002  max mem: 30335
[17:53:58.234299] Epoch: [144]  [4000/6672]  eta: 0:33:22  lr: 0.000252  loss: 2.1214 (2.2123)  time: 0.7269  data: 0.0002  max mem: 30335
[18:18:39.642025] Epoch: [144]  [6000/6672]  eta: 0:08:21  lr: 0.000250  loss: 2.1924 (2.2217)  time: 0.7282  data: 0.0003  max mem: 30335
[18:27:01.937485] Epoch: [144]  [6671/6672]  eta: 0:00:00  lr: 0.000249  loss: 2.4433 (2.2225)  time: 0.7324  data: 0.0013  max mem: 30335
[18:27:02.760412] Epoch: [144] Total time: 1:23:03 (0.7470 s / it)
[18:27:02.841586] Averaged stats: lr: 0.000249  loss: 2.4433 (2.2196)
[18:27:07.480209] Test:  [   0/2084]  eta: 2:40:56  loss: 0.4178 (0.4178)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.6335  data: 4.0968  max mem: 30335
[18:29:15.813370] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.5671 (0.5418)  acc1: 83.3333 (86.3190)  acc5: 100.0000 (97.7794)  time: 0.2575  data: 0.0002  max mem: 30335
[18:31:24.924356] Test:  [1000/2084]  eta: 0:04:43  loss: 0.5021 (0.5835)  acc1: 87.5000 (84.7777)  acc5: 95.8333 (97.5150)  time: 0.2566  data: 0.0002  max mem: 30335
[18:33:33.224666] Test:  [1500/2084]  eta: 0:02:31  loss: 0.4495 (0.6561)  acc1: 83.3333 (83.0668)  acc5: 95.8333 (96.7549)  time: 0.2556  data: 0.0002  max mem: 30335
[18:35:41.596556] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2202 (0.6899)  acc1: 95.8333 (82.2401)  acc5: 100.0000 (96.3997)  time: 0.2561  data: 0.0002  max mem: 30335
[18:36:02.733611] Test:  [2083/2084]  eta: 0:00:00  loss: 0.2830 (0.6938)  acc1: 91.6667 (82.1460)  acc5: 100.0000 (96.3880)  time: 0.2493  data: 0.0002  max mem: 30335
[18:36:02.847544] Test: Total time: 0:09:00 (0.2591 s / it)
[18:36:18.493790] * Acc@1 82.155 Acc@5 96.395 loss 0.694
[18:36:18.494054] Accuracy of the network on the 50000 test images: 82.2%
[18:36:18.494094] Max accuracy: 82.16%
[18:36:18.558616] log_dir: ./output_dir_qkformer
[18:36:24.758015] Epoch: [145]  [   0/6672]  eta: 11:23:04  lr: 0.000249  loss: 1.6774 (1.6774)  time: 6.1427  data: 3.1921  max mem: 30335
[19:01:08.089851] Epoch: [145]  [2000/6672]  eta: 0:57:56  lr: 0.000246  loss: 2.1220 (2.1862)  time: 0.7446  data: 0.0003  max mem: 30335
[19:25:52.164737] Epoch: [145]  [4000/6672]  eta: 0:33:05  lr: 0.000244  loss: 2.1052 (2.2028)  time: 0.7257  data: 0.0002  max mem: 30335
[19:50:40.972235] Epoch: [145]  [6000/6672]  eta: 0:08:19  lr: 0.000241  loss: 2.0389 (2.2057)  time: 0.7307  data: 0.0003  max mem: 30335
[19:59:07.332169] Epoch: [145]  [6671/6672]  eta: 0:00:00  lr: 0.000241  loss: 2.2543 (2.2068)  time: 0.7299  data: 0.0011  max mem: 30335
[19:59:08.032499] Epoch: [145] Total time: 1:22:49 (0.7448 s / it)
[19:59:08.051273] Averaged stats: lr: 0.000241  loss: 2.2543 (2.2125)
[19:59:12.474059] Test:  [   0/2084]  eta: 2:32:56  loss: 0.2221 (0.2221)  acc1: 95.8333 (95.8333)  acc5: 100.0000 (100.0000)  time: 4.4035  data: 3.7399  max mem: 30335
[20:01:20.877635] Test:  [ 500/2084]  eta: 0:06:59  loss: 0.5439 (0.5322)  acc1: 83.3333 (86.0446)  acc5: 100.0000 (97.7794)  time: 0.2574  data: 0.0002  max mem: 30335
[20:03:29.415037] Test:  [1000/2084]  eta: 0:04:42  loss: 0.4502 (0.5746)  acc1: 87.5000 (84.7194)  acc5: 95.8333 (97.4858)  time: 0.2569  data: 0.0002  max mem: 30335
[20:05:39.366619] Test:  [1500/2084]  eta: 0:02:32  loss: 0.5631 (0.6474)  acc1: 83.3333 (82.8448)  acc5: 95.8333 (96.6744)  time: 0.2562  data: 0.0002  max mem: 30335
[20:07:47.778148] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2445 (0.6789)  acc1: 91.6667 (82.0715)  acc5: 100.0000 (96.3227)  time: 0.2575  data: 0.0002  max mem: 30335
[20:08:08.923929] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3394 (0.6815)  acc1: 87.5000 (81.9820)  acc5: 100.0000 (96.3380)  time: 0.2483  data: 0.0001  max mem: 30335
[20:08:09.056290] Test: Total time: 0:09:00 (0.2596 s / it)
[20:08:24.402873] * Acc@1 81.995 Acc@5 96.358 loss 0.681
[20:08:24.403133] Accuracy of the network on the 50000 test images: 82.0%
[20:08:24.403172] Max accuracy: 82.16%
[20:08:24.800713] log_dir: ./output_dir_qkformer
[20:08:32.139217] Epoch: [146]  [   0/6672]  eta: 13:27:16  lr: 0.000241  loss: 1.9484 (1.9484)  time: 7.2597  data: 2.2803  max mem: 30335
[20:33:16.160582] Epoch: [146]  [2000/6672]  eta: 0:58:01  lr: 0.000238  loss: 2.0996 (2.2002)  time: 0.7269  data: 0.0002  max mem: 30335
[20:58:09.076909] Epoch: [146]  [4000/6672]  eta: 0:33:12  lr: 0.000236  loss: 2.1674 (2.2004)  time: 0.7250  data: 0.0003  max mem: 30335
[21:23:07.770856] Epoch: [146]  [6000/6672]  eta: 0:08:21  lr: 0.000233  loss: 2.2133 (2.2022)  time: 0.7303  data: 0.0002  max mem: 30335
[21:31:34.776949] Epoch: [146]  [6671/6672]  eta: 0:00:00  lr: 0.000232  loss: 2.0366 (2.2007)  time: 0.7259  data: 0.0010  max mem: 30335
[21:31:35.543117] Epoch: [146] Total time: 1:23:10 (0.7480 s / it)
[21:31:35.645548] Averaged stats: lr: 0.000232  loss: 2.0366 (2.1998)
[21:31:40.486075] Test:  [   0/2084]  eta: 2:47:59  loss: 0.3268 (0.3268)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.8364  data: 3.9415  max mem: 30335
[21:33:48.990432] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.5565 (0.5433)  acc1: 87.5000 (85.7951)  acc5: 100.0000 (97.9291)  time: 0.2575  data: 0.0002  max mem: 30335
[21:35:57.645990] Test:  [1000/2084]  eta: 0:04:43  loss: 0.5594 (0.5844)  acc1: 87.5000 (84.4364)  acc5: 95.8333 (97.7065)  time: 0.2563  data: 0.0002  max mem: 30335
[21:38:06.420180] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4577 (0.6574)  acc1: 87.5000 (82.9086)  acc5: 95.8333 (96.7827)  time: 0.2561  data: 0.0002  max mem: 30335
[21:40:15.460408] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2596 (0.6946)  acc1: 95.8333 (81.9340)  acc5: 100.0000 (96.3914)  time: 0.2569  data: 0.0002  max mem: 30335
[21:40:36.638064] Test:  [2083/2084]  eta: 0:00:00  loss: 0.2754 (0.6977)  acc1: 91.6667 (81.8500)  acc5: 100.0000 (96.4000)  time: 0.2491  data: 0.0001  max mem: 30335
[21:40:36.783114] Test: Total time: 0:09:01 (0.2597 s / it)
[21:40:52.394555] * Acc@1 81.854 Acc@5 96.393 loss 0.698
[21:40:52.394818] Accuracy of the network on the 50000 test images: 81.9%
[21:40:52.394852] Max accuracy: 82.16%
[21:40:52.754339] log_dir: ./output_dir_qkformer
[21:40:59.871389] Epoch: [147]  [   0/6672]  eta: 13:11:16  lr: 0.000232  loss: 1.9152 (1.9152)  time: 7.1158  data: 2.5009  max mem: 30335
[22:05:54.820902] Epoch: [147]  [2000/6672]  eta: 0:58:26  lr: 0.000230  loss: 2.0653 (2.1724)  time: 0.7337  data: 0.0003  max mem: 30335
[22:30:28.816112] Epoch: [147]  [4000/6672]  eta: 0:33:07  lr: 0.000227  loss: 2.0457 (2.1760)  time: 0.9012  data: 0.0002  max mem: 30335
[22:55:08.447282] Epoch: [147]  [6000/6672]  eta: 0:08:18  lr: 0.000225  loss: 2.1132 (2.1839)  time: 0.7296  data: 0.0002  max mem: 30335
[23:03:22.476375] Epoch: [147]  [6671/6672]  eta: 0:00:00  lr: 0.000224  loss: 2.2887 (2.1871)  time: 0.7230  data: 0.0010  max mem: 30335
[23:03:23.465947] Epoch: [147] Total time: 1:22:30 (0.7420 s / it)
[23:03:23.527025] Averaged stats: lr: 0.000224  loss: 2.2887 (2.1925)
[23:03:28.160837] Test:  [   0/2084]  eta: 2:40:46  loss: 0.4415 (0.4415)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.6289  data: 3.9201  max mem: 30335
[23:05:36.979222] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.5147 (0.5429)  acc1: 87.5000 (86.1361)  acc5: 100.0000 (97.6713)  time: 0.2565  data: 0.0002  max mem: 30335
[23:07:46.303830] Test:  [1000/2084]  eta: 0:04:44  loss: 0.5814 (0.5798)  acc1: 87.5000 (85.0150)  acc5: 95.8333 (97.4525)  time: 0.2566  data: 0.0002  max mem: 30335
[23:09:55.244145] Test:  [1500/2084]  eta: 0:02:32  loss: 0.5104 (0.6447)  acc1: 87.5000 (83.3084)  acc5: 95.8333 (96.7716)  time: 0.2585  data: 0.0002  max mem: 30335
[23:12:03.557295] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2249 (0.6809)  acc1: 95.8333 (82.4379)  acc5: 100.0000 (96.4455)  time: 0.2564  data: 0.0002  max mem: 30335
[23:12:24.704974] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4338 (0.6871)  acc1: 91.6667 (82.2760)  acc5: 100.0000 (96.4240)  time: 0.2488  data: 0.0001  max mem: 30335
[23:12:24.839001] Test: Total time: 0:09:01 (0.2597 s / it)
[23:12:40.645904] * Acc@1 82.280 Acc@5 96.418 loss 0.687
[23:12:40.646236] Accuracy of the network on the 50000 test images: 82.3%
[23:12:40.646293] Max accuracy: 82.28%
[23:12:41.031995] log_dir: ./output_dir_qkformer
[23:12:44.765149] Epoch: [148]  [   0/6672]  eta: 6:53:30  lr: 0.000224  loss: 2.3688 (2.3688)  time: 3.7186  data: 2.3525  max mem: 30335
[23:37:28.078008] Epoch: [148]  [2000/6672]  eta: 0:57:51  lr: 0.000222  loss: 2.0641 (2.1739)  time: 0.7470  data: 0.0002  max mem: 30335
[00:02:09.323769] Epoch: [148]  [4000/6672]  eta: 0:33:01  lr: 0.000219  loss: 2.0456 (2.1752)  time: 0.7295  data: 0.0002  max mem: 30335
[00:26:44.852184] Epoch: [148]  [6000/6672]  eta: 0:08:17  lr: 0.000217  loss: 2.2483 (2.1837)  time: 0.7284  data: 0.0002  max mem: 30335
[00:35:02.014887] Epoch: [148]  [6671/6672]  eta: 0:00:00  lr: 0.000216  loss: 2.0662 (2.1836)  time: 0.7240  data: 0.0005  max mem: 30335
[00:35:02.659905] Epoch: [148] Total time: 1:22:21 (0.7407 s / it)
[00:35:02.765080] Averaged stats: lr: 0.000216  loss: 2.0662 (2.1830)
[00:35:07.997281] Test:  [   0/2084]  eta: 3:01:34  loss: 0.3129 (0.3129)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 5.2277  data: 4.6801  max mem: 30335
[00:37:18.883605] Test:  [ 500/2084]  eta: 0:07:10  loss: 0.4743 (0.5089)  acc1: 83.3333 (86.6101)  acc5: 100.0000 (97.9291)  time: 0.2566  data: 0.0002  max mem: 30335
[00:39:27.482514] Test:  [1000/2084]  eta: 0:04:46  loss: 0.4960 (0.5550)  acc1: 87.5000 (85.2564)  acc5: 95.8333 (97.6190)  time: 0.2562  data: 0.0002  max mem: 30335
[00:41:36.507217] Test:  [1500/2084]  eta: 0:02:33  loss: 0.5451 (0.6302)  acc1: 83.3333 (83.3639)  acc5: 95.8333 (96.8715)  time: 0.2562  data: 0.0002  max mem: 30335
[00:43:45.290097] Test:  [2000/2084]  eta: 0:00:21  loss: 0.1762 (0.6616)  acc1: 91.6667 (82.5629)  acc5: 100.0000 (96.5517)  time: 0.2557  data: 0.0002  max mem: 30335
[00:44:06.440129] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3785 (0.6645)  acc1: 91.6667 (82.4840)  acc5: 100.0000 (96.5560)  time: 0.2494  data: 0.0001  max mem: 30335
[00:44:06.551343] Test: Total time: 0:09:03 (0.2609 s / it)
[00:44:22.404261] * Acc@1 82.503 Acc@5 96.563 loss 0.665
[00:44:22.404625] Accuracy of the network on the 50000 test images: 82.5%
[00:44:22.404659] Max accuracy: 82.50%
[00:44:23.038494] log_dir: ./output_dir_qkformer
[00:44:26.708743] Epoch: [149]  [   0/6672]  eta: 6:46:19  lr: 0.000216  loss: 1.6618 (1.6618)  time: 3.6540  data: 1.8395  max mem: 30335
[01:09:07.917867] Epoch: [149]  [2000/6672]  eta: 0:57:46  lr: 0.000214  loss: 2.0678 (2.1628)  time: 0.7264  data: 0.0002  max mem: 30335
[01:33:47.073004] Epoch: [149]  [4000/6672]  eta: 0:32:59  lr: 0.000211  loss: 2.0639 (2.1672)  time: 0.7431  data: 0.0004  max mem: 30335
[01:58:25.102752] Epoch: [149]  [6000/6672]  eta: 0:08:17  lr: 0.000209  loss: 2.1655 (2.1722)  time: 0.7269  data: 0.0003  max mem: 30335
[02:06:46.369602] Epoch: [149]  [6671/6672]  eta: 0:00:00  lr: 0.000208  loss: 2.0904 (2.1709)  time: 0.7228  data: 0.0006  max mem: 30335
[02:06:47.042072] Epoch: [149] Total time: 1:22:24 (0.7410 s / it)
[02:06:47.104876] Averaged stats: lr: 0.000208  loss: 2.0904 (2.1737)
[02:06:52.323719] Test:  [   0/2084]  eta: 3:01:07  loss: 0.3913 (0.3913)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 5.2145  data: 4.4069  max mem: 30335
[02:09:00.720980] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.5223 (0.5127)  acc1: 83.3333 (86.8263)  acc5: 100.0000 (98.0872)  time: 0.2579  data: 0.0002  max mem: 30335
[02:11:09.045318] Test:  [1000/2084]  eta: 0:04:43  loss: 0.5536 (0.5537)  acc1: 87.5000 (85.5062)  acc5: 95.8333 (97.8397)  time: 0.2567  data: 0.0002  max mem: 30335
[02:13:17.385048] Test:  [1500/2084]  eta: 0:02:31  loss: 0.5957 (0.6281)  acc1: 87.5000 (83.6054)  acc5: 95.8333 (97.1047)  time: 0.2563  data: 0.0002  max mem: 30335
[02:15:26.023242] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3501 (0.6692)  acc1: 91.6667 (82.6212)  acc5: 100.0000 (96.6121)  time: 0.2574  data: 0.0002  max mem: 30335
[02:15:47.155117] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4092 (0.6732)  acc1: 87.5000 (82.5160)  acc5: 100.0000 (96.6060)  time: 0.2489  data: 0.0001  max mem: 30335
[02:15:47.268024] Test: Total time: 0:09:00 (0.2592 s / it)
[02:16:02.843393] * Acc@1 82.530 Acc@5 96.624 loss 0.673
[02:16:02.843613] Accuracy of the network on the 50000 test images: 82.5%
[02:16:02.843648] Max accuracy: 82.53%
[02:16:02.975141] log_dir: ./output_dir_qkformer
[02:16:07.497392] Epoch: [150]  [   0/6672]  eta: 8:22:15  lr: 0.000208  loss: 2.3373 (2.3373)  time: 4.5167  data: 2.5531  max mem: 30335
[02:40:54.185872] Epoch: [150]  [2000/6672]  eta: 0:58:01  lr: 0.000206  loss: 2.2025 (2.1482)  time: 0.7272  data: 0.0002  max mem: 30335
[03:05:26.126796] Epoch: [150]  [4000/6672]  eta: 0:32:58  lr: 0.000204  loss: 2.1927 (2.1607)  time: 0.7240  data: 0.0002  max mem: 30335
[03:30:04.086282] Epoch: [150]  [6000/6672]  eta: 0:08:17  lr: 0.000201  loss: 2.1262 (2.1608)  time: 0.7275  data: 0.0003  max mem: 30335
[03:38:14.422106] Epoch: [150]  [6671/6672]  eta: 0:00:00  lr: 0.000200  loss: 2.1367 (2.1618)  time: 0.7279  data: 0.0006  max mem: 30335
[03:38:15.179780] Epoch: [150] Total time: 1:22:12 (0.7392 s / it)
[03:38:15.238539] Averaged stats: lr: 0.000200  loss: 2.1367 (2.1623)
[03:38:20.487064] Test:  [   0/2084]  eta: 3:02:08  loss: 0.3420 (0.3420)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.2439  data: 4.7566  max mem: 30335
[03:40:28.870332] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.7049 (0.5195)  acc1: 79.1667 (86.8679)  acc5: 100.0000 (97.9541)  time: 0.2564  data: 0.0002  max mem: 30335
[03:42:37.296491] Test:  [1000/2084]  eta: 0:04:43  loss: 0.4191 (0.5728)  acc1: 91.6667 (85.1482)  acc5: 95.8333 (97.6690)  time: 0.2570  data: 0.0002  max mem: 30335
[03:44:45.921598] Test:  [1500/2084]  eta: 0:02:31  loss: 0.5800 (0.6421)  acc1: 87.5000 (83.4277)  acc5: 95.8333 (96.8882)  time: 0.2575  data: 0.0002  max mem: 30335
[03:46:54.340349] Test:  [2000/2084]  eta: 0:00:21  loss: 0.1744 (0.6773)  acc1: 95.8333 (82.5629)  acc5: 100.0000 (96.5434)  time: 0.2561  data: 0.0002  max mem: 30335
[03:47:15.451199] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3244 (0.6812)  acc1: 91.6667 (82.4560)  acc5: 100.0000 (96.5440)  time: 0.2490  data: 0.0001  max mem: 30335
[03:47:15.566955] Test: Total time: 0:09:00 (0.2593 s / it)
[03:47:30.940193] * Acc@1 82.460 Acc@5 96.535 loss 0.681
[03:47:30.940499] Accuracy of the network on the 50000 test images: 82.5%
[03:47:30.940563] Max accuracy: 82.53%
[03:47:31.044611] log_dir: ./output_dir_qkformer
[03:47:37.593245] Epoch: [151]  [   0/6672]  eta: 11:51:32  lr: 0.000200  loss: 2.1917 (2.1917)  time: 6.3987  data: 3.2410  max mem: 30335
[04:12:13.202011] Epoch: [151]  [2000/6672]  eta: 0:57:39  lr: 0.000198  loss: 1.9318 (2.1485)  time: 0.7439  data: 0.0004  max mem: 30335
[04:36:55.810999] Epoch: [151]  [4000/6672]  eta: 0:32:59  lr: 0.000196  loss: 2.1031 (2.1436)  time: 0.7257  data: 0.0003  max mem: 30335
[05:01:38.144680] Epoch: [151]  [6000/6672]  eta: 0:08:17  lr: 0.000194  loss: 1.9124 (2.1458)  time: 0.7289  data: 0.0003  max mem: 30335
[05:09:53.998529] Epoch: [151]  [6671/6672]  eta: 0:00:00  lr: 0.000193  loss: 2.0722 (2.1485)  time: 0.7253  data: 0.0006  max mem: 30335
[05:09:54.724487] Epoch: [151] Total time: 1:22:23 (0.7410 s / it)
[05:09:54.755328] Averaged stats: lr: 0.000193  loss: 2.0722 (2.1524)
[05:09:59.139879] Test:  [   0/2084]  eta: 2:32:03  loss: 0.4153 (0.4153)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.3780  data: 3.7865  max mem: 30335
[05:12:08.288257] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.6272 (0.5197)  acc1: 79.1667 (86.6184)  acc5: 100.0000 (97.9874)  time: 0.2568  data: 0.0002  max mem: 30335
[05:14:16.598483] Test:  [1000/2084]  eta: 0:04:43  loss: 0.5795 (0.5644)  acc1: 87.5000 (85.1607)  acc5: 95.8333 (97.7106)  time: 0.2561  data: 0.0002  max mem: 30335
[05:16:24.936537] Test:  [1500/2084]  eta: 0:02:31  loss: 0.4856 (0.6335)  acc1: 87.5000 (83.3195)  acc5: 95.8333 (97.0103)  time: 0.2559  data: 0.0002  max mem: 30335
[05:18:33.299003] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2618 (0.6675)  acc1: 95.8333 (82.4317)  acc5: 100.0000 (96.5913)  time: 0.2569  data: 0.0002  max mem: 30335
[05:18:54.505877] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4051 (0.6726)  acc1: 91.6667 (82.2960)  acc5: 100.0000 (96.5760)  time: 0.2497  data: 0.0001  max mem: 30335
[05:18:54.614995] Test: Total time: 0:08:59 (0.2590 s / it)
[05:19:10.117136] * Acc@1 82.313 Acc@5 96.576 loss 0.673
[05:19:10.117366] Accuracy of the network on the 50000 test images: 82.3%
[05:19:10.117395] Max accuracy: 82.53%
[05:19:10.211857] log_dir: ./output_dir_qkformer
[05:19:13.787321] Epoch: [152]  [   0/6672]  eta: 6:30:36  lr: 0.000193  loss: 2.6987 (2.6987)  time: 3.5126  data: 2.4797  max mem: 30335
[05:43:50.778598] Epoch: [152]  [2000/6672]  eta: 0:57:36  lr: 0.000191  loss: 2.2064 (2.1322)  time: 0.7280  data: 0.0003  max mem: 30335
[06:08:31.967041] Epoch: [152]  [4000/6672]  eta: 0:32:57  lr: 0.000188  loss: 2.1388 (2.1399)  time: 0.7275  data: 0.0003  max mem: 30335
[06:33:12.554408] Epoch: [152]  [6000/6672]  eta: 0:08:17  lr: 0.000186  loss: 1.8772 (2.1407)  time: 0.7299  data: 0.0002  max mem: 30335
[06:41:35.335901] Epoch: [152]  [6671/6672]  eta: 0:00:00  lr: 0.000185  loss: 1.9520 (2.1424)  time: 0.7254  data: 0.0010  max mem: 30335
[06:41:36.127068] Epoch: [152] Total time: 1:22:25 (0.7413 s / it)
[06:41:36.138273] Averaged stats: lr: 0.000185  loss: 1.9520 (2.1432)
[06:41:40.525439] Test:  [   0/2084]  eta: 2:31:54  loss: 0.3552 (0.3552)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.3734  data: 3.7235  max mem: 30335
[06:43:50.001910] Test:  [ 500/2084]  eta: 0:07:03  loss: 0.5094 (0.4987)  acc1: 83.3333 (86.8430)  acc5: 100.0000 (98.0622)  time: 0.2565  data: 0.0002  max mem: 30335
[06:45:58.405508] Test:  [1000/2084]  eta: 0:04:43  loss: 0.5031 (0.5457)  acc1: 87.5000 (85.5603)  acc5: 95.8333 (97.7606)  time: 0.2555  data: 0.0002  max mem: 30335
[06:48:07.199111] Test:  [1500/2084]  eta: 0:02:32  loss: 0.5407 (0.6130)  acc1: 87.5000 (83.7997)  acc5: 95.8333 (97.0520)  time: 0.2560  data: 0.0002  max mem: 30335
[06:50:15.510975] Test:  [2000/2084]  eta: 0:00:21  loss: 0.3093 (0.6473)  acc1: 91.6667 (82.8273)  acc5: 100.0000 (96.7100)  time: 0.2568  data: 0.0002  max mem: 30335
[06:50:36.654287] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3037 (0.6516)  acc1: 91.6667 (82.7200)  acc5: 100.0000 (96.6940)  time: 0.2489  data: 0.0001  max mem: 30335
[06:50:36.792446] Test: Total time: 0:09:00 (0.2594 s / it)
[06:50:52.214414] * Acc@1 82.702 Acc@5 96.699 loss 0.652
[06:50:52.214927] Accuracy of the network on the 50000 test images: 82.7%
[06:50:52.214964] Max accuracy: 82.70%
[06:50:52.389049] log_dir: ./output_dir_qkformer
[06:50:55.620286] Epoch: [153]  [   0/6672]  eta: 5:59:09  lr: 0.000185  loss: 2.4182 (2.4182)  time: 3.2299  data: 2.2068  max mem: 30335
[07:15:40.140067] Epoch: [153]  [2000/6672]  eta: 0:57:52  lr: 0.000183  loss: 2.2223 (2.1319)  time: 0.7329  data: 0.0004  max mem: 30335
[07:40:24.949885] Epoch: [153]  [4000/6672]  eta: 0:33:04  lr: 0.000181  loss: 2.1798 (2.1367)  time: 0.7326  data: 0.0002  max mem: 30335
[08:05:15.029847] Epoch: [153]  [6000/6672]  eta: 0:08:19  lr: 0.000179  loss: 2.0371 (2.1362)  time: 0.7270  data: 0.0003  max mem: 30335
[08:13:36.201270] Epoch: [153]  [6671/6672]  eta: 0:00:00  lr: 0.000178  loss: 2.1576 (2.1384)  time: 0.7242  data: 0.0010  max mem: 30335
[08:13:36.850997] Epoch: [153] Total time: 1:22:44 (0.7441 s / it)
[08:13:36.932642] Averaged stats: lr: 0.000178  loss: 2.1576 (2.1361)
[08:13:39.892504] Test:  [   0/2084]  eta: 1:42:33  loss: 0.4136 (0.4136)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 2.9528  data: 2.4668  max mem: 30335
[08:15:48.237273] Test:  [ 500/2084]  eta: 0:06:55  loss: 0.6240 (0.4993)  acc1: 83.3333 (87.1673)  acc5: 100.0000 (98.0705)  time: 0.2571  data: 0.0002  max mem: 30335
[08:17:56.634866] Test:  [1000/2084]  eta: 0:04:41  loss: 0.4329 (0.5515)  acc1: 87.5000 (85.3605)  acc5: 95.8333 (97.7939)  time: 0.2560  data: 0.0002  max mem: 30335
[08:20:05.103676] Test:  [1500/2084]  eta: 0:02:31  loss: 0.4562 (0.6270)  acc1: 87.5000 (83.4999)  acc5: 95.8333 (96.9687)  time: 0.2562  data: 0.0002  max mem: 30335
[08:22:13.656767] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2564 (0.6630)  acc1: 95.8333 (82.6420)  acc5: 100.0000 (96.5809)  time: 0.2560  data: 0.0002  max mem: 30335
[08:22:34.820561] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3476 (0.6672)  acc1: 91.6667 (82.5440)  acc5: 100.0000 (96.5640)  time: 0.2501  data: 0.0001  max mem: 30335
[08:22:34.932814] Test: Total time: 0:08:57 (0.2582 s / it)
[08:22:50.407278] * Acc@1 82.534 Acc@5 96.569 loss 0.668
[08:22:50.407675] Accuracy of the network on the 50000 test images: 82.5%
[08:22:50.407704] Max accuracy: 82.70%
[08:22:50.469983] log_dir: ./output_dir_qkformer
[08:22:55.240556] Epoch: [154]  [   0/6672]  eta: 8:45:58  lr: 0.000178  loss: 1.7932 (1.7932)  time: 4.7300  data: 3.2472  max mem: 30335
[08:47:38.719166] Epoch: [154]  [2000/6672]  eta: 0:57:53  lr: 0.000176  loss: 2.0271 (2.1168)  time: 0.7310  data: 0.0003  max mem: 30335
[09:12:38.978714] Epoch: [154]  [4000/6672]  eta: 0:33:15  lr: 0.000174  loss: 2.0758 (2.1180)  time: 0.9311  data: 0.0017  max mem: 30335
[09:37:34.756455] Epoch: [154]  [6000/6672]  eta: 0:08:22  lr: 0.000171  loss: 2.1894 (2.1232)  time: 0.9598  data: 0.0003  max mem: 30335
[09:46:04.758593] Epoch: [154]  [6671/6672]  eta: 0:00:00  lr: 0.000171  loss: 2.1006 (2.1256)  time: 0.7551  data: 0.0010  max mem: 30335
[09:46:05.562243] Epoch: [154] Total time: 1:23:15 (0.7487 s / it)
[09:46:05.621363] Averaged stats: lr: 0.000171  loss: 2.1006 (2.1244)
[09:46:10.629317] Test:  [   0/2084]  eta: 2:53:46  loss: 0.4272 (0.4272)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 5.0034  data: 4.2495  max mem: 30335
[09:48:19.493080] Test:  [ 500/2084]  eta: 0:07:03  loss: 0.5389 (0.5088)  acc1: 83.3333 (86.5685)  acc5: 100.0000 (98.1454)  time: 0.2576  data: 0.0002  max mem: 30335
[09:50:28.067886] Test:  [1000/2084]  eta: 0:04:44  loss: 0.5302 (0.5504)  acc1: 87.5000 (85.4687)  acc5: 95.8333 (97.7855)  time: 0.2563  data: 0.0002  max mem: 30335
[09:52:37.371222] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4620 (0.6246)  acc1: 87.5000 (83.7331)  acc5: 95.8333 (97.0325)  time: 0.2562  data: 0.0002  max mem: 30335
[09:54:46.933730] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2507 (0.6594)  acc1: 95.8333 (82.7982)  acc5: 100.0000 (96.6808)  time: 0.2565  data: 0.0002  max mem: 30335
[09:55:08.139741] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3130 (0.6617)  acc1: 91.6667 (82.7340)  acc5: 100.0000 (96.6880)  time: 0.2506  data: 0.0002  max mem: 30335
[09:55:08.265456] Test: Total time: 0:09:02 (0.2604 s / it)
[09:55:23.105526] * Acc@1 82.723 Acc@5 96.684 loss 0.662
[09:55:23.105790] Accuracy of the network on the 50000 test images: 82.7%
[09:55:23.105837] Max accuracy: 82.72%
[09:55:23.247197] log_dir: ./output_dir_qkformer
[09:55:27.780353] Epoch: [155]  [   0/6672]  eta: 8:17:48  lr: 0.000171  loss: 1.9138 (1.9138)  time: 4.4767  data: 2.6327  max mem: 30335
[10:20:47.860642] Epoch: [155]  [2000/6672]  eta: 0:59:18  lr: 0.000168  loss: 2.1829 (2.1045)  time: 0.7457  data: 0.0003  max mem: 30335
[10:46:01.455937] Epoch: [155]  [4000/6672]  eta: 0:33:48  lr: 0.000166  loss: 2.1501 (2.1134)  time: 0.7523  data: 0.0004  max mem: 30335
[11:11:00.646863] Epoch: [155]  [6000/6672]  eta: 0:08:27  lr: 0.000164  loss: 2.0166 (2.1194)  time: 0.7269  data: 0.0002  max mem: 30335
[11:19:27.813851] Epoch: [155]  [6671/6672]  eta: 0:00:00  lr: 0.000163  loss: 2.1072 (2.1197)  time: 0.7397  data: 0.0025  max mem: 30335
[11:19:28.630550] Epoch: [155] Total time: 1:24:05 (0.7562 s / it)
[11:19:28.663011] Averaged stats: lr: 0.000163  loss: 2.1072 (2.1161)
[11:19:33.017507] Test:  [   0/2084]  eta: 2:30:16  loss: 0.4437 (0.4437)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.3265  data: 3.6012  max mem: 30335
[11:21:42.019369] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.6183 (0.5144)  acc1: 79.1667 (86.5519)  acc5: 100.0000 (98.0373)  time: 0.2567  data: 0.0002  max mem: 30335
[11:23:50.978017] Test:  [1000/2084]  eta: 0:04:44  loss: 0.4478 (0.5605)  acc1: 87.5000 (85.1066)  acc5: 95.8333 (97.6856)  time: 0.2572  data: 0.0002  max mem: 30335
[11:26:01.198385] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4616 (0.6212)  acc1: 87.5000 (83.6387)  acc5: 95.8333 (97.0103)  time: 0.2569  data: 0.0002  max mem: 30335
[11:28:13.715028] Test:  [2000/2084]  eta: 0:00:22  loss: 0.1645 (0.6576)  acc1: 95.8333 (82.7461)  acc5: 100.0000 (96.6142)  time: 0.2562  data: 0.0002  max mem: 30335
[11:28:34.894185] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3876 (0.6589)  acc1: 91.6667 (82.7180)  acc5: 100.0000 (96.6320)  time: 0.2505  data: 0.0001  max mem: 30335
[11:28:35.020064] Test: Total time: 0:09:06 (0.2622 s / it)
[11:28:50.015173] * Acc@1 82.730 Acc@5 96.643 loss 0.659
[11:28:50.015416] Accuracy of the network on the 50000 test images: 82.7%
[11:28:50.015459] Max accuracy: 82.73%
[11:28:50.370357] log_dir: ./output_dir_qkformer
[11:28:59.693683] Epoch: [156]  [   0/6672]  eta: 17:15:59  lr: 0.000163  loss: 1.6960 (1.6960)  time: 9.3164  data: 3.4387  max mem: 30335
[11:54:04.756937] Epoch: [156]  [2000/6672]  eta: 0:58:55  lr: 0.000161  loss: 2.0092 (2.1022)  time: 0.7264  data: 0.0003  max mem: 30335
[12:19:10.235503] Epoch: [156]  [4000/6672]  eta: 0:33:36  lr: 0.000159  loss: 2.0674 (2.0976)  time: 0.7231  data: 0.0002  max mem: 30335
[12:44:12.669344] Epoch: [156]  [6000/6672]  eta: 0:08:26  lr: 0.000157  loss: 1.9795 (2.1016)  time: 0.8130  data: 0.0027  max mem: 30335
[12:52:42.237837] Epoch: [156]  [6671/6672]  eta: 0:00:00  lr: 0.000156  loss: 2.0170 (2.1019)  time: 0.7229  data: 0.0011  max mem: 30335
[12:52:43.054978] Epoch: [156] Total time: 1:23:52 (0.7543 s / it)
[12:52:43.071574] Averaged stats: lr: 0.000156  loss: 2.0170 (2.1044)
[12:52:47.398498] Test:  [   0/2084]  eta: 2:30:08  loss: 0.2225 (0.2225)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.3225  data: 3.5770  max mem: 30335
[12:54:56.366649] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.5853 (0.4949)  acc1: 83.3333 (87.2422)  acc5: 100.0000 (98.0705)  time: 0.2678  data: 0.0002  max mem: 30335
[12:57:05.514435] Test:  [1000/2084]  eta: 0:04:44  loss: 0.5797 (0.5432)  acc1: 87.5000 (85.6102)  acc5: 95.8333 (97.7647)  time: 0.2565  data: 0.0002  max mem: 30335
[12:59:13.883152] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4303 (0.6103)  acc1: 91.6667 (83.9718)  acc5: 95.8333 (97.0825)  time: 0.2562  data: 0.0002  max mem: 30335
[13:01:22.697582] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2343 (0.6474)  acc1: 91.6667 (83.0876)  acc5: 100.0000 (96.6704)  time: 0.2567  data: 0.0002  max mem: 30335
[13:01:43.829287] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3608 (0.6520)  acc1: 91.6667 (82.9740)  acc5: 100.0000 (96.6440)  time: 0.2481  data: 0.0002  max mem: 30335
[13:01:43.970635] Test: Total time: 0:09:00 (0.2595 s / it)
[13:01:59.333925] * Acc@1 82.981 Acc@5 96.643 loss 0.652
[13:01:59.334231] Accuracy of the network on the 50000 test images: 83.0%
[13:01:59.334263] Max accuracy: 82.98%
[13:01:59.437402] log_dir: ./output_dir_qkformer
[13:02:10.598566] Epoch: [157]  [   0/6672]  eta: 20:36:08  lr: 0.000156  loss: 2.0158 (2.0158)  time: 11.1164  data: 2.0238  max mem: 30335
[13:27:31.179206] Epoch: [157]  [2000/6672]  eta: 0:59:35  lr: 0.000154  loss: 1.9665 (2.0725)  time: 0.7981  data: 0.0002  max mem: 30335
[13:53:03.073559] Epoch: [157]  [4000/6672]  eta: 0:34:05  lr: 0.000152  loss: 2.1851 (2.0842)  time: 0.8184  data: 0.0003  max mem: 30335
[14:18:26.961900] Epoch: [157]  [6000/6672]  eta: 0:08:33  lr: 0.000150  loss: 2.0924 (2.0828)  time: 0.7298  data: 0.0002  max mem: 30335
[14:27:09.289137] Epoch: [157]  [6671/6672]  eta: 0:00:00  lr: 0.000150  loss: 2.1178 (2.0856)  time: 0.7317  data: 0.0010  max mem: 30335
[14:27:10.020874] Epoch: [157] Total time: 1:25:10 (0.7660 s / it)
[14:27:10.181711] Averaged stats: lr: 0.000150  loss: 2.1178 (2.0927)
[14:27:14.860600] Test:  [   0/2084]  eta: 2:42:21  loss: 0.4805 (0.4805)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.6743  data: 3.7537  max mem: 30335
[14:29:24.725163] Test:  [ 500/2084]  eta: 0:07:05  loss: 0.6179 (0.4914)  acc1: 83.3333 (87.0842)  acc5: 100.0000 (98.1703)  time: 0.2574  data: 0.0002  max mem: 30335
[14:31:33.830198] Test:  [1000/2084]  eta: 0:04:45  loss: 0.4745 (0.5436)  acc1: 91.6667 (85.6102)  acc5: 95.8333 (97.8688)  time: 0.2666  data: 0.0002  max mem: 30335
[14:33:43.765847] Test:  [1500/2084]  eta: 0:02:33  loss: 0.5147 (0.6146)  acc1: 83.3333 (83.9718)  acc5: 95.8333 (97.1824)  time: 0.2946  data: 0.0002  max mem: 30335
[14:35:52.151590] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2174 (0.6514)  acc1: 95.8333 (83.0855)  acc5: 100.0000 (96.7537)  time: 0.2568  data: 0.0002  max mem: 30335
[14:36:13.309247] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3430 (0.6558)  acc1: 87.5000 (82.9660)  acc5: 100.0000 (96.7380)  time: 0.2487  data: 0.0001  max mem: 30335
[14:36:13.421466] Test: Total time: 0:09:03 (0.2607 s / it)
[14:36:28.905931] * Acc@1 82.962 Acc@5 96.722 loss 0.656
[14:36:28.906276] Accuracy of the network on the 50000 test images: 83.0%
[14:36:28.906325] Max accuracy: 82.98%
[14:36:29.137792] log_dir: ./output_dir_qkformer
[14:36:34.079992] Epoch: [158]  [   0/6672]  eta: 8:58:21  lr: 0.000150  loss: 2.0897 (2.0897)  time: 4.8413  data: 3.1426  max mem: 30335
[15:01:59.743328] Epoch: [158]  [2000/6672]  eta: 0:59:32  lr: 0.000148  loss: 2.1077 (2.0787)  time: 0.8085  data: 0.0003  max mem: 30335
[15:27:48.939449] Epoch: [158]  [4000/6672]  eta: 0:34:16  lr: 0.000146  loss: 2.1541 (2.0842)  time: 0.7454  data: 0.0002  max mem: 30335
[15:52:57.052023] Epoch: [158]  [6000/6672]  eta: 0:08:33  lr: 0.000144  loss: 1.9307 (2.0865)  time: 0.7272  data: 0.0002  max mem: 30335
[16:01:23.043608] Epoch: [158]  [6671/6672]  eta: 0:00:00  lr: 0.000143  loss: 2.0145 (2.0858)  time: 0.7234  data: 0.0006  max mem: 30335
[16:01:23.882237] Epoch: [158] Total time: 1:24:54 (0.7636 s / it)
[16:01:23.917587] Averaged stats: lr: 0.000143  loss: 2.0145 (2.0850)
[16:01:28.153390] Test:  [   0/2084]  eta: 2:26:55  loss: 0.3490 (0.3490)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.2302  data: 3.7007  max mem: 30335
[16:03:36.581714] Test:  [ 500/2084]  eta: 0:06:59  loss: 0.5740 (0.4890)  acc1: 83.3333 (87.6497)  acc5: 100.0000 (98.1454)  time: 0.2560  data: 0.0002  max mem: 30335
[16:05:47.646109] Test:  [1000/2084]  eta: 0:04:45  loss: 0.5409 (0.5465)  acc1: 87.5000 (85.8600)  acc5: 95.8333 (97.7522)  time: 0.2569  data: 0.0002  max mem: 30335
[16:07:56.276733] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4418 (0.6181)  acc1: 87.5000 (84.0939)  acc5: 95.8333 (97.0298)  time: 0.2564  data: 0.0002  max mem: 30335
[16:10:06.852724] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2219 (0.6536)  acc1: 95.8333 (83.0960)  acc5: 100.0000 (96.6683)  time: 0.2575  data: 0.0002  max mem: 30335
[16:10:28.040423] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3978 (0.6585)  acc1: 91.6667 (82.9740)  acc5: 100.0000 (96.6560)  time: 0.2489  data: 0.0002  max mem: 30335
[16:10:28.157198] Test: Total time: 0:09:04 (0.2611 s / it)
[16:10:43.736275] * Acc@1 82.959 Acc@5 96.635 loss 0.659
[16:10:43.736626] Accuracy of the network on the 50000 test images: 83.0%
[16:10:43.736675] Max accuracy: 82.98%
[16:10:43.956622] log_dir: ./output_dir_qkformer
[16:10:49.933957] Epoch: [159]  [   0/6672]  eta: 11:04:22  lr: 0.000143  loss: 2.0179 (2.0179)  time: 5.9747  data: 3.0807  max mem: 30335
[16:36:25.995150] Epoch: [159]  [2000/6672]  eta: 0:59:59  lr: 0.000141  loss: 2.1787 (2.0731)  time: 0.8888  data: 0.0002  max mem: 30335
[17:02:19.324590] Epoch: [159]  [4000/6672]  eta: 0:34:26  lr: 0.000139  loss: 2.0989 (2.0733)  time: 0.7247  data: 0.0002  max mem: 30335
[17:27:24.267686] Epoch: [159]  [6000/6672]  eta: 0:08:35  lr: 0.000137  loss: 2.0762 (2.0761)  time: 0.7433  data: 0.0003  max mem: 30335
[17:35:59.739092] Epoch: [159]  [6671/6672]  eta: 0:00:00  lr: 0.000136  loss: 2.1889 (2.0785)  time: 0.7244  data: 0.0007  max mem: 30335
[17:36:00.597740] Epoch: [159] Total time: 1:25:16 (0.7669 s / it)
[17:36:00.645816] Averaged stats: lr: 0.000136  loss: 2.1889 (2.0761)
[17:36:06.400317] Test:  [   0/2084]  eta: 3:19:41  loss: 0.2940 (0.2940)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 5.7494  data: 5.0715  max mem: 30335
[17:38:15.429401] Test:  [ 500/2084]  eta: 0:07:06  loss: 0.5994 (0.4835)  acc1: 79.1667 (87.4002)  acc5: 100.0000 (98.2202)  time: 0.2572  data: 0.0002  max mem: 30335
[17:40:24.436615] Test:  [1000/2084]  eta: 0:04:45  loss: 0.5193 (0.5333)  acc1: 87.5000 (85.8641)  acc5: 95.8333 (97.8771)  time: 0.2572  data: 0.0002  max mem: 30335
[17:42:32.814822] Test:  [1500/2084]  eta: 0:02:32  loss: 0.3958 (0.6096)  acc1: 87.5000 (83.9635)  acc5: 95.8333 (97.1214)  time: 0.2572  data: 0.0002  max mem: 30335
[17:44:42.042093] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2066 (0.6436)  acc1: 95.8333 (83.1314)  acc5: 100.0000 (96.7808)  time: 0.2965  data: 0.0002  max mem: 30335
[17:45:03.334839] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4014 (0.6478)  acc1: 91.6667 (83.0120)  acc5: 100.0000 (96.7740)  time: 0.2573  data: 0.0002  max mem: 30335
[17:45:03.460684] Test: Total time: 0:09:02 (0.2605 s / it)
[17:45:18.940756] * Acc@1 83.022 Acc@5 96.779 loss 0.648
[17:45:18.941364] Accuracy of the network on the 50000 test images: 83.0%
[17:45:18.941406] Max accuracy: 83.02%
[17:45:19.363439] log_dir: ./output_dir_qkformer
[17:45:26.532046] Epoch: [160]  [   0/6672]  eta: 13:13:52  lr: 0.000136  loss: 1.8462 (1.8462)  time: 7.1392  data: 2.4439  max mem: 30335
[18:10:46.717146] Epoch: [160]  [2000/6672]  eta: 0:59:25  lr: 0.000134  loss: 1.9646 (2.0451)  time: 0.7264  data: 0.0002  max mem: 30335
[18:36:17.687165] Epoch: [160]  [4000/6672]  eta: 0:34:01  lr: 0.000132  loss: 1.8722 (2.0550)  time: 0.7251  data: 0.0003  max mem: 30335
[19:01:40.283445] Epoch: [160]  [6000/6672]  eta: 0:08:32  lr: 0.000130  loss: 2.0580 (2.0591)  time: 0.7399  data: 0.0002  max mem: 30335
[19:10:13.401828] Epoch: [160]  [6671/6672]  eta: 0:00:00  lr: 0.000130  loss: 2.1974 (2.0618)  time: 0.7259  data: 0.0011  max mem: 30335
[19:10:14.196688] Epoch: [160] Total time: 1:24:54 (0.7636 s / it)
[19:10:14.264778] Averaged stats: lr: 0.000130  loss: 2.1974 (2.0654)
[19:10:19.198421] Test:  [   0/2084]  eta: 2:51:07  loss: 0.4901 (0.4901)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.9266  data: 4.1185  max mem: 30335
[19:12:28.359162] Test:  [ 500/2084]  eta: 0:07:03  loss: 0.6037 (0.4873)  acc1: 83.3333 (87.4252)  acc5: 100.0000 (98.2285)  time: 0.2631  data: 0.0002  max mem: 30335
[19:14:37.951965] Test:  [1000/2084]  eta: 0:04:45  loss: 0.5590 (0.5369)  acc1: 87.5000 (85.9183)  acc5: 95.8333 (97.9229)  time: 0.2567  data: 0.0002  max mem: 30335
[19:16:50.171087] Test:  [1500/2084]  eta: 0:02:34  loss: 0.5741 (0.6074)  acc1: 87.5000 (84.1439)  acc5: 100.0000 (97.2824)  time: 0.2568  data: 0.0002  max mem: 30335
[19:18:58.662748] Test:  [2000/2084]  eta: 0:00:22  loss: 0.2541 (0.6437)  acc1: 91.6667 (83.2105)  acc5: 100.0000 (96.8745)  time: 0.2570  data: 0.0002  max mem: 30335
[19:19:19.814694] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3359 (0.6484)  acc1: 91.6667 (83.1160)  acc5: 100.0000 (96.8660)  time: 0.2489  data: 0.0001  max mem: 30335
[19:19:19.934995] Test: Total time: 0:09:05 (0.2618 s / it)
[19:19:35.140532] * Acc@1 83.130 Acc@5 96.859 loss 0.648
[19:19:35.140827] Accuracy of the network on the 50000 test images: 83.1%
[19:19:35.140869] Max accuracy: 83.13%
[19:19:35.292638] log_dir: ./output_dir_qkformer
[19:19:41.007866] Epoch: [161]  [   0/6672]  eta: 10:35:26  lr: 0.000130  loss: 2.1816 (2.1816)  time: 5.7144  data: 2.0495  max mem: 30335
[19:45:24.816070] Epoch: [161]  [2000/6672]  eta: 1:00:16  lr: 0.000128  loss: 1.9613 (2.0398)  time: 0.7450  data: 0.0003  max mem: 30335
[20:11:05.568838] Epoch: [161]  [4000/6672]  eta: 0:34:23  lr: 0.000126  loss: 1.9788 (2.0496)  time: 0.7264  data: 0.0003  max mem: 30335
[20:36:44.846765] Epoch: [161]  [6000/6672]  eta: 0:08:38  lr: 0.000124  loss: 1.9618 (2.0525)  time: 0.7460  data: 0.0002  max mem: 30335
[20:45:09.294104] Epoch: [161]  [6671/6672]  eta: 0:00:00  lr: 0.000124  loss: 2.1159 (2.0531)  time: 0.7234  data: 0.0010  max mem: 30335
[20:45:10.086066] Epoch: [161] Total time: 1:25:34 (0.7696 s / it)
[20:45:10.162733] Averaged stats: lr: 0.000124  loss: 2.1159 (2.0554)
[20:45:14.559468] Test:  [   0/2084]  eta: 2:32:31  loss: 0.2578 (0.2578)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.3912  data: 3.7199  max mem: 30335
[20:47:23.254132] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.6248 (0.4918)  acc1: 83.3333 (87.2755)  acc5: 100.0000 (98.2285)  time: 0.2565  data: 0.0002  max mem: 30335
[20:49:31.534523] Test:  [1000/2084]  eta: 0:04:43  loss: 0.5327 (0.5312)  acc1: 87.5000 (86.0473)  acc5: 95.8333 (97.9271)  time: 0.2559  data: 0.0002  max mem: 30335
[20:51:40.871089] Test:  [1500/2084]  eta: 0:02:31  loss: 0.4585 (0.5975)  acc1: 83.3333 (84.4132)  acc5: 95.8333 (97.2574)  time: 0.2557  data: 0.0002  max mem: 30335
[20:53:50.723555] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2094 (0.6350)  acc1: 95.8333 (83.4937)  acc5: 100.0000 (96.8474)  time: 0.2569  data: 0.0002  max mem: 30335
[20:54:11.939513] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3581 (0.6400)  acc1: 91.6667 (83.3500)  acc5: 100.0000 (96.8300)  time: 0.2496  data: 0.0002  max mem: 30335
[20:54:12.073129] Test: Total time: 0:09:01 (0.2600 s / it)
[20:54:27.535224] * Acc@1 83.352 Acc@5 96.833 loss 0.640
[20:54:27.535820] Accuracy of the network on the 50000 test images: 83.4%
[20:54:27.535907] Max accuracy: 83.35%
[20:54:27.670207] log_dir: ./output_dir_qkformer
[20:54:33.218031] Epoch: [162]  [   0/6672]  eta: 10:14:41  lr: 0.000124  loss: 1.8823 (1.8823)  time: 5.5278  data: 2.4835  max mem: 30335
[21:20:06.388505] Epoch: [162]  [2000/6672]  eta: 0:59:51  lr: 0.000122  loss: 2.0305 (2.0375)  time: 0.8054  data: 0.0003  max mem: 30335
[21:45:32.285483] Epoch: [162]  [4000/6672]  eta: 0:34:05  lr: 0.000120  loss: 1.9079 (2.0403)  time: 0.7286  data: 0.0003  max mem: 30335
[22:10:53.790010] Epoch: [162]  [6000/6672]  eta: 0:08:33  lr: 0.000118  loss: 2.1142 (2.0424)  time: 0.7493  data: 0.0003  max mem: 30335
[22:19:15.058666] Epoch: [162]  [6671/6672]  eta: 0:00:00  lr: 0.000117  loss: 1.8125 (2.0422)  time: 0.7201  data: 0.0011  max mem: 30335
[22:19:15.917895] Epoch: [162] Total time: 1:24:48 (0.7626 s / it)
[22:19:15.956967] Averaged stats: lr: 0.000117  loss: 1.8125 (2.0480)
[22:19:21.919513] Test:  [   0/2084]  eta: 3:26:53  loss: 0.3762 (0.3762)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.9565  data: 5.1042  max mem: 30335
[22:21:32.334875] Test:  [ 500/2084]  eta: 0:07:11  loss: 0.5857 (0.4813)  acc1: 87.5000 (87.5998)  acc5: 100.0000 (98.3450)  time: 0.2554  data: 0.0002  max mem: 30335
[22:23:40.984516] Test:  [1000/2084]  eta: 0:04:46  loss: 0.5327 (0.5330)  acc1: 91.6667 (85.9099)  acc5: 95.8333 (97.9729)  time: 0.2556  data: 0.0002  max mem: 30335
[22:25:49.277104] Test:  [1500/2084]  eta: 0:02:33  loss: 0.4585 (0.6035)  acc1: 87.5000 (84.2438)  acc5: 95.8333 (97.2435)  time: 0.2556  data: 0.0002  max mem: 30335
[22:27:58.276954] Test:  [2000/2084]  eta: 0:00:21  loss: 0.1895 (0.6392)  acc1: 95.8333 (83.4291)  acc5: 100.0000 (96.8766)  time: 0.2561  data: 0.0002  max mem: 30335
[22:28:19.405279] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4060 (0.6423)  acc1: 91.6667 (83.3620)  acc5: 100.0000 (96.8700)  time: 0.2486  data: 0.0001  max mem: 30335
[22:28:19.531165] Test: Total time: 0:09:03 (0.2608 s / it)
[22:28:34.125962] * Acc@1 83.342 Acc@5 96.866 loss 0.642
[22:28:34.126210] Accuracy of the network on the 50000 test images: 83.3%
[22:28:34.126260] Max accuracy: 83.35%
[22:28:34.667842] log_dir: ./output_dir_qkformer
[22:28:41.230454] Epoch: [163]  [   0/6672]  eta: 12:09:09  lr: 0.000117  loss: 2.6404 (2.6404)  time: 6.5572  data: 2.4684  max mem: 30335
[22:54:11.352293] Epoch: [163]  [2000/6672]  eta: 0:59:46  lr: 0.000116  loss: 2.0710 (2.0432)  time: 0.7546  data: 0.0003  max mem: 30335
[23:19:22.821502] Epoch: [163]  [4000/6672]  eta: 0:33:55  lr: 0.000114  loss: 2.0015 (2.0427)  time: 0.7285  data: 0.0007  max mem: 30335
[23:44:47.320952] Epoch: [163]  [6000/6672]  eta: 0:08:31  lr: 0.000112  loss: 1.9616 (2.0406)  time: 0.7240  data: 0.0003  max mem: 30335
[23:53:11.498033] Epoch: [163]  [6671/6672]  eta: 0:00:00  lr: 0.000111  loss: 2.0132 (2.0423)  time: 0.7213  data: 0.0011  max mem: 30335
[23:53:12.248313] Epoch: [163] Total time: 1:24:37 (0.7610 s / it)
[23:53:12.290145] Averaged stats: lr: 0.000111  loss: 2.0132 (2.0386)
[23:53:17.919697] Test:  [   0/2084]  eta: 3:15:22  loss: 0.4582 (0.4582)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.6248  data: 4.7110  max mem: 30335
[23:55:25.709416] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.5496 (0.4797)  acc1: 83.3333 (87.4584)  acc5: 100.0000 (98.2868)  time: 0.2551  data: 0.0002  max mem: 30335
[23:57:34.400842] Test:  [1000/2084]  eta: 0:04:43  loss: 0.5150 (0.5278)  acc1: 87.5000 (86.0848)  acc5: 95.8333 (97.9770)  time: 0.2554  data: 0.0002  max mem: 30335
[23:59:42.900119] Test:  [1500/2084]  eta: 0:02:31  loss: 0.3761 (0.5963)  acc1: 87.5000 (84.4382)  acc5: 100.0000 (97.2518)  time: 0.2611  data: 0.0002  max mem: 30335
[00:01:52.353984] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2037 (0.6286)  acc1: 95.8333 (83.5749)  acc5: 100.0000 (96.9140)  time: 0.2560  data: 0.0002  max mem: 30335
[00:02:13.359726] Test:  [2083/2084]  eta: 0:00:00  loss: 0.2674 (0.6328)  acc1: 91.6667 (83.4740)  acc5: 100.0000 (96.9000)  time: 0.2463  data: 0.0002  max mem: 30335
[00:02:13.495472] Test: Total time: 0:09:01 (0.2597 s / it)
[00:02:27.186375] * Acc@1 83.474 Acc@5 96.894 loss 0.633
[00:02:27.186697] Accuracy of the network on the 50000 test images: 83.5%
[00:02:27.186764] Max accuracy: 83.47%
[00:02:28.129651] log_dir: ./output_dir_qkformer
[00:02:38.221219] Epoch: [164]  [   0/6672]  eta: 18:27:18  lr: 0.000111  loss: 2.3566 (2.3566)  time: 9.9579  data: 2.4971  max mem: 30335
[00:28:01.666174] Epoch: [164]  [2000/6672]  eta: 0:59:39  lr: 0.000110  loss: 1.8406 (2.0193)  time: 0.7310  data: 0.0003  max mem: 30335
[00:53:39.435620] Epoch: [164]  [4000/6672]  eta: 0:34:10  lr: 0.000108  loss: 1.8693 (2.0237)  time: 0.7432  data: 0.0002  max mem: 30335
[01:19:19.618294] Epoch: [164]  [6000/6672]  eta: 0:08:36  lr: 0.000106  loss: 2.0196 (2.0279)  time: 0.7218  data: 0.0002  max mem: 30335
[01:28:03.376858] Epoch: [164]  [6671/6672]  eta: 0:00:00  lr: 0.000105  loss: 1.9872 (2.0296)  time: 0.7196  data: 0.0006  max mem: 30335
[01:28:04.299398] Epoch: [164] Total time: 1:25:36 (0.7698 s / it)
[01:28:04.333010] Averaged stats: lr: 0.000105  loss: 1.9872 (2.0290)
[01:28:09.848697] Test:  [   0/2084]  eta: 3:11:22  loss: 0.3811 (0.3811)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.5099  data: 4.4433  max mem: 30335
[01:30:17.727527] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.4872 (0.4756)  acc1: 83.3333 (87.5166)  acc5: 100.0000 (98.3450)  time: 0.2568  data: 0.0002  max mem: 30335
[01:32:27.242521] Test:  [1000/2084]  eta: 0:04:44  loss: 0.4301 (0.5233)  acc1: 87.5000 (86.0723)  acc5: 95.8333 (98.0270)  time: 0.3050  data: 0.0002  max mem: 30335
[01:34:36.495214] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4850 (0.5898)  acc1: 91.6667 (84.4520)  acc5: 100.0000 (97.2962)  time: 0.2565  data: 0.0002  max mem: 30335
[01:36:45.863032] Test:  [2000/2084]  eta: 0:00:21  loss: 0.1974 (0.6281)  acc1: 95.8333 (83.4937)  acc5: 100.0000 (96.9078)  time: 0.2553  data: 0.0002  max mem: 30335
[01:37:06.872526] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4713 (0.6339)  acc1: 87.5000 (83.3440)  acc5: 100.0000 (96.8800)  time: 0.2460  data: 0.0001  max mem: 30335
[01:37:06.978699] Test: Total time: 0:09:02 (0.2604 s / it)
[01:37:20.859870] * Acc@1 83.361 Acc@5 96.872 loss 0.634
[01:37:20.860189] Accuracy of the network on the 50000 test images: 83.4%
[01:37:20.860228] Max accuracy: 83.47%
[01:37:21.117215] log_dir: ./output_dir_qkformer
[01:37:34.666334] Epoch: [165]  [   0/6672]  eta: 1 day, 0:48:04  lr: 0.000105  loss: 2.5224 (2.5224)  time: 13.3820  data: 3.4833  max mem: 30335
[02:03:27.749995] Epoch: [165]  [2000/6672]  eta: 1:00:56  lr: 0.000104  loss: 2.0375 (2.0145)  time: 0.7332  data: 0.0003  max mem: 30335
[02:29:13.503169] Epoch: [165]  [4000/6672]  eta: 0:34:37  lr: 0.000102  loss: 2.0001 (2.0131)  time: 0.7234  data: 0.0003  max mem: 30335
[02:55:02.190937] Epoch: [165]  [6000/6672]  eta: 0:08:41  lr: 0.000100  loss: 1.7035 (2.0153)  time: 0.7238  data: 0.0002  max mem: 30335
[03:03:53.171972] Epoch: [165]  [6671/6672]  eta: 0:00:00  lr: 0.000100  loss: 2.0164 (2.0169)  time: 0.7237  data: 0.0012  max mem: 30335
[03:03:53.849098] Epoch: [165] Total time: 1:26:32 (0.7783 s / it)
[03:03:53.900120] Averaged stats: lr: 0.000100  loss: 2.0164 (2.0169)
[03:03:58.350703] Test:  [   0/2084]  eta: 2:34:23  loss: 0.2873 (0.2873)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.4449  data: 3.6689  max mem: 30335
[03:06:07.425077] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.4994 (0.4771)  acc1: 87.5000 (87.5333)  acc5: 100.0000 (98.2452)  time: 0.2557  data: 0.0002  max mem: 30335
[03:08:16.352007] Test:  [1000/2084]  eta: 0:04:44  loss: 0.3921 (0.5214)  acc1: 91.6667 (86.2471)  acc5: 95.8333 (97.9770)  time: 0.2552  data: 0.0002  max mem: 30335
[03:10:27.009443] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4610 (0.5860)  acc1: 87.5000 (84.6408)  acc5: 95.8333 (97.2851)  time: 0.2560  data: 0.0002  max mem: 30335
[03:12:37.667363] Test:  [2000/2084]  eta: 0:00:21  loss: 0.1724 (0.6212)  acc1: 95.8333 (83.7540)  acc5: 100.0000 (96.9536)  time: 0.2559  data: 0.0002  max mem: 30335
[03:12:58.729421] Test:  [2083/2084]  eta: 0:00:00  loss: 0.2987 (0.6276)  acc1: 91.6667 (83.6020)  acc5: 100.0000 (96.9220)  time: 0.2484  data: 0.0001  max mem: 30335
[03:12:58.849373] Test: Total time: 0:09:04 (0.2615 s / it)
[03:13:12.882131] * Acc@1 83.589 Acc@5 96.924 loss 0.628
[03:13:12.882678] Accuracy of the network on the 50000 test images: 83.6%
[03:13:12.882737] Max accuracy: 83.59%
[03:13:13.442465] log_dir: ./output_dir_qkformer
[03:13:22.623896] Epoch: [166]  [   0/6672]  eta: 16:49:41  lr: 0.000100  loss: 2.2235 (2.2235)  time: 9.0800  data: 3.8047  max mem: 30335
[03:39:16.347435] Epoch: [166]  [2000/6672]  eta: 1:00:48  lr: 0.000098  loss: 2.0055 (1.9979)  time: 0.7234  data: 0.0002  max mem: 30335
[04:05:12.063726] Epoch: [166]  [4000/6672]  eta: 0:34:41  lr: 0.000096  loss: 2.1513 (2.0047)  time: 0.7245  data: 0.0003  max mem: 30335
[04:31:25.755715] Epoch: [166]  [6000/6672]  eta: 0:08:45  lr: 0.000095  loss: 1.9849 (2.0037)  time: 0.8180  data: 0.0003  max mem: 30335
[04:40:00.140031] Epoch: [166]  [6671/6672]  eta: 0:00:00  lr: 0.000094  loss: 2.0296 (2.0035)  time: 0.7180  data: 0.0006  max mem: 30335
[04:40:00.915046] Epoch: [166] Total time: 1:26:47 (0.7805 s / it)
[04:40:00.954733] Averaged stats: lr: 0.000094  loss: 2.0296 (2.0098)
[04:40:06.308254] Test:  [   0/2084]  eta: 3:03:46  loss: 0.3369 (0.3369)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.2911  data: 4.1394  max mem: 30335
[04:42:15.954242] Test:  [ 500/2084]  eta: 0:07:06  loss: 0.6203 (0.4744)  acc1: 83.3333 (87.5582)  acc5: 100.0000 (98.4198)  time: 0.2555  data: 0.0002  max mem: 30335
[04:44:26.461503] Test:  [1000/2084]  eta: 0:04:47  loss: 0.5115 (0.5226)  acc1: 87.5000 (86.0723)  acc5: 95.8333 (98.0186)  time: 0.2559  data: 0.0002  max mem: 30335
[04:46:34.746450] Test:  [1500/2084]  eta: 0:02:33  loss: 0.4919 (0.5900)  acc1: 87.5000 (84.5159)  acc5: 95.8333 (97.2824)  time: 0.2561  data: 0.0003  max mem: 30335
[04:48:44.177337] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2004 (0.6235)  acc1: 95.8333 (83.6498)  acc5: 100.0000 (96.9578)  time: 0.2553  data: 0.0002  max mem: 30335
[04:49:05.277253] Test:  [2083/2084]  eta: 0:00:00  loss: 0.2951 (0.6282)  acc1: 91.6667 (83.5500)  acc5: 100.0000 (96.9360)  time: 0.2482  data: 0.0001  max mem: 30335
[04:49:05.420392] Test: Total time: 0:09:04 (0.2613 s / it)
[04:49:20.194163] * Acc@1 83.538 Acc@5 96.930 loss 0.629
[04:49:20.194613] Accuracy of the network on the 50000 test images: 83.5%
[04:49:20.194684] Max accuracy: 83.59%
[04:49:20.562516] log_dir: ./output_dir_qkformer
[04:49:37.104801] Epoch: [167]  [   0/6672]  eta: 1 day, 6:29:48  lr: 0.000094  loss: 2.1753 (2.1753)  time: 16.4551  data: 5.1188  max mem: 30335
[05:15:17.591500] Epoch: [167]  [2000/6672]  eta: 1:00:34  lr: 0.000092  loss: 1.9701 (1.9787)  time: 0.7243  data: 0.0002  max mem: 30335
[05:41:17.688029] Epoch: [167]  [4000/6672]  eta: 0:34:41  lr: 0.000091  loss: 2.0106 (1.9843)  time: 0.7241  data: 0.0003  max mem: 30335
[06:07:09.477605] Epoch: [167]  [6000/6672]  eta: 0:08:42  lr: 0.000089  loss: 1.9474 (1.9924)  time: 0.7216  data: 0.0002  max mem: 30335
[06:15:42.235684] Epoch: [167]  [6671/6672]  eta: 0:00:00  lr: 0.000089  loss: 2.1036 (1.9944)  time: 0.7190  data: 0.0006  max mem: 30335
[06:15:42.982401] Epoch: [167] Total time: 1:26:22 (0.7767 s / it)
[06:15:43.113167] Averaged stats: lr: 0.000089  loss: 2.1036 (1.9993)
[06:15:48.842937] Test:  [   0/2084]  eta: 3:18:50  loss: 0.3987 (0.3987)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 5.7247  data: 4.9087  max mem: 30335
[06:17:57.892704] Test:  [ 500/2084]  eta: 0:07:06  loss: 0.4968 (0.4763)  acc1: 87.5000 (87.7162)  acc5: 100.0000 (98.4032)  time: 0.2556  data: 0.0002  max mem: 30335
[06:20:06.755540] Test:  [1000/2084]  eta: 0:04:45  loss: 0.5748 (0.5288)  acc1: 83.3333 (86.1514)  acc5: 95.8333 (97.9437)  time: 0.2555  data: 0.0002  max mem: 30335
[06:22:16.534307] Test:  [1500/2084]  eta: 0:02:33  loss: 0.4053 (0.5941)  acc1: 87.5000 (84.6686)  acc5: 95.8333 (97.2602)  time: 0.2553  data: 0.0002  max mem: 30335
[06:24:25.952561] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2029 (0.6294)  acc1: 95.8333 (83.7560)  acc5: 100.0000 (96.9265)  time: 0.2566  data: 0.0002  max mem: 30335
[06:24:46.951299] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3384 (0.6343)  acc1: 91.6667 (83.6280)  acc5: 100.0000 (96.9060)  time: 0.2471  data: 0.0002  max mem: 30335
[06:24:47.090979] Test: Total time: 0:09:03 (0.2610 s / it)
[06:25:01.142554] * Acc@1 83.631 Acc@5 96.898 loss 0.634
[06:25:01.143118] Accuracy of the network on the 50000 test images: 83.6%
[06:25:01.143171] Max accuracy: 83.63%
[06:25:01.354536] log_dir: ./output_dir_qkformer
[06:25:14.503952] Epoch: [168]  [   0/6672]  eta: 1 day, 0:19:42  lr: 0.000089  loss: 1.8081 (1.8081)  time: 13.1269  data: 3.2738  max mem: 30335
[06:51:00.301186] Epoch: [168]  [2000/6672]  eta: 1:00:38  lr: 0.000087  loss: 1.8188 (1.9952)  time: 0.7227  data: 0.0003  max mem: 30335
[07:17:06.500080] Epoch: [168]  [4000/6672]  eta: 0:34:46  lr: 0.000085  loss: 1.8222 (1.9977)  time: 0.7208  data: 0.0003  max mem: 30335
[07:43:01.271245] Epoch: [168]  [6000/6672]  eta: 0:08:43  lr: 0.000084  loss: 1.8453 (1.9935)  time: 0.7225  data: 0.0003  max mem: 30335
[07:51:32.160121] Epoch: [168]  [6671/6672]  eta: 0:00:00  lr: 0.000083  loss: 1.9594 (1.9916)  time: 0.7184  data: 0.0006  max mem: 30335
[07:51:32.920861] Epoch: [168] Total time: 1:26:31 (0.7781 s / it)
[07:51:32.992187] Averaged stats: lr: 0.000083  loss: 1.9594 (1.9922)
[07:51:37.851183] Test:  [   0/2084]  eta: 2:48:28  loss: 0.4774 (0.4774)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.8505  data: 3.9777  max mem: 30335
[07:53:47.641408] Test:  [ 500/2084]  eta: 0:07:05  loss: 0.5335 (0.4695)  acc1: 83.3333 (88.0240)  acc5: 100.0000 (98.3616)  time: 0.2563  data: 0.0002  max mem: 30335
[07:55:56.691745] Test:  [1000/2084]  eta: 0:04:45  loss: 0.5048 (0.5132)  acc1: 87.5000 (86.5634)  acc5: 95.8333 (98.0936)  time: 0.2896  data: 0.0002  max mem: 30335
[07:58:08.191870] Test:  [1500/2084]  eta: 0:02:33  loss: 0.4353 (0.5900)  acc1: 87.5000 (84.8518)  acc5: 100.0000 (97.3101)  time: 0.2556  data: 0.0002  max mem: 30335
[08:00:18.598086] Test:  [2000/2084]  eta: 0:00:22  loss: 0.2591 (0.6273)  acc1: 95.8333 (83.7685)  acc5: 100.0000 (96.9307)  time: 0.2560  data: 0.0002  max mem: 30335
[08:00:39.660223] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4605 (0.6325)  acc1: 91.6667 (83.6440)  acc5: 100.0000 (96.9060)  time: 0.2483  data: 0.0002  max mem: 30335
[08:00:39.806064] Test: Total time: 0:09:06 (0.2624 s / it)
[08:00:53.573609] * Acc@1 83.641 Acc@5 96.918 loss 0.632
[08:00:53.573944] Accuracy of the network on the 50000 test images: 83.6%
[08:00:53.573984] Max accuracy: 83.64%
[08:00:54.209390] log_dir: ./output_dir_qkformer
[08:01:04.171139] Epoch: [169]  [   0/6672]  eta: 18:27:36  lr: 0.000083  loss: 2.6299 (2.6299)  time: 9.9606  data: 3.7520  max mem: 30335
[08:26:25.301160] Epoch: [169]  [2000/6672]  eta: 0:59:33  lr: 0.000082  loss: 1.9142 (1.9768)  time: 0.7216  data: 0.0003  max mem: 30335
[08:51:56.331726] Epoch: [169]  [4000/6672]  eta: 0:34:04  lr: 0.000080  loss: 1.8389 (1.9890)  time: 0.7199  data: 0.0003  max mem: 30335
[09:17:36.125328] Epoch: [169]  [6000/6672]  eta: 0:08:35  lr: 0.000079  loss: 2.0211 (1.9926)  time: 0.7237  data: 0.0003  max mem: 30335
[09:25:58.267624] Epoch: [169]  [6671/6672]  eta: 0:00:00  lr: 0.000078  loss: 1.8168 (1.9900)  time: 0.7192  data: 0.0012  max mem: 30335
[09:25:59.075137] Epoch: [169] Total time: 1:25:04 (0.7651 s / it)
[09:25:59.111589] Averaged stats: lr: 0.000078  loss: 1.8168 (1.9840)
[09:26:03.648914] Test:  [   0/2084]  eta: 2:37:21  loss: 0.3431 (0.3431)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.5305  data: 3.7014  max mem: 30335
[09:28:12.036777] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.4603 (0.4653)  acc1: 83.3333 (88.0073)  acc5: 100.0000 (98.3367)  time: 0.2670  data: 0.0015  max mem: 30335
[09:30:21.444705] Test:  [1000/2084]  eta: 0:04:44  loss: 0.4956 (0.5139)  acc1: 91.6667 (86.4094)  acc5: 95.8333 (98.0894)  time: 0.2555  data: 0.0002  max mem: 30335
[09:32:29.498052] Test:  [1500/2084]  eta: 0:02:31  loss: 0.4348 (0.5900)  acc1: 87.5000 (84.6380)  acc5: 95.8333 (97.3129)  time: 0.2557  data: 0.0002  max mem: 30335
[09:34:39.159700] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2092 (0.6243)  acc1: 95.8333 (83.7644)  acc5: 100.0000 (96.9682)  time: 0.2562  data: 0.0002  max mem: 30335
[09:35:00.268570] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4823 (0.6296)  acc1: 87.5000 (83.6320)  acc5: 100.0000 (96.9500)  time: 0.2482  data: 0.0002  max mem: 30335
[09:35:00.379549] Test: Total time: 0:09:01 (0.2597 s / it)
[09:35:14.696857] * Acc@1 83.626 Acc@5 96.939 loss 0.630
[09:35:14.697431] Accuracy of the network on the 50000 test images: 83.6%
[09:35:14.697480] Max accuracy: 83.64%
[09:35:15.240053] log_dir: ./output_dir_qkformer
[09:35:24.862603] Epoch: [170]  [   0/6672]  eta: 17:48:54  lr: 0.000078  loss: 1.9503 (1.9503)  time: 9.6124  data: 2.8721  max mem: 30335
[10:00:43.702485] Epoch: [170]  [2000/6672]  eta: 0:59:28  lr: 0.000077  loss: 1.9146 (1.9737)  time: 0.7218  data: 0.0002  max mem: 30335
[10:26:08.464152] Epoch: [170]  [4000/6672]  eta: 0:33:58  lr: 0.000075  loss: 1.9641 (1.9757)  time: 0.7544  data: 0.0003  max mem: 30335
[10:51:30.563287] Epoch: [170]  [6000/6672]  eta: 0:08:32  lr: 0.000074  loss: 1.9963 (1.9787)  time: 0.7221  data: 0.0003  max mem: 30335
[10:59:57.390660] Epoch: [170]  [6671/6672]  eta: 0:00:00  lr: 0.000073  loss: 1.9478 (1.9791)  time: 0.7205  data: 0.0006  max mem: 30335
[10:59:58.223828] Epoch: [170] Total time: 1:24:42 (0.7618 s / it)
[10:59:58.263719] Averaged stats: lr: 0.000073  loss: 1.9478 (1.9725)
[11:00:03.446626] Test:  [   0/2084]  eta: 2:59:51  loss: 0.3491 (0.3491)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.1781  data: 4.3949  max mem: 30335
[11:02:14.003605] Test:  [ 500/2084]  eta: 0:07:09  loss: 0.5585 (0.4748)  acc1: 83.3333 (87.8410)  acc5: 100.0000 (98.3034)  time: 0.2561  data: 0.0002  max mem: 30335
[11:04:22.247585] Test:  [1000/2084]  eta: 0:04:45  loss: 0.6047 (0.5252)  acc1: 91.6667 (86.2180)  acc5: 95.8333 (98.0436)  time: 0.2560  data: 0.0002  max mem: 30335
[11:06:32.501937] Test:  [1500/2084]  eta: 0:02:33  loss: 0.4797 (0.5954)  acc1: 87.5000 (84.6352)  acc5: 100.0000 (97.2990)  time: 0.2552  data: 0.0002  max mem: 30335
[11:08:41.171901] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2251 (0.6312)  acc1: 95.8333 (83.7748)  acc5: 100.0000 (96.9411)  time: 0.2551  data: 0.0002  max mem: 30335
[11:09:02.209732] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4226 (0.6350)  acc1: 91.6667 (83.6500)  acc5: 100.0000 (96.9300)  time: 0.2477  data: 0.0001  max mem: 30335
[11:09:02.319454] Test: Total time: 0:09:04 (0.2611 s / it)
[11:09:16.797674] * Acc@1 83.612 Acc@5 96.932 loss 0.635
[11:09:16.798038] Accuracy of the network on the 50000 test images: 83.6%
[11:09:16.798075] Max accuracy: 83.64%
[11:09:17.116432] log_dir: ./output_dir_qkformer
[11:09:30.492511] Epoch: [171]  [   0/6672]  eta: 1 day, 0:33:01  lr: 0.000073  loss: 2.0350 (2.0350)  time: 13.2466  data: 2.6280  max mem: 30335
[11:34:59.977678] Epoch: [171]  [2000/6672]  eta: 1:00:01  lr: 0.000072  loss: 1.8682 (1.9530)  time: 0.7279  data: 0.0002  max mem: 30335
[12:00:27.885650] Epoch: [171]  [4000/6672]  eta: 0:34:10  lr: 0.000070  loss: 1.8170 (1.9620)  time: 0.7253  data: 0.0002  max mem: 30335
[12:26:16.092627] Epoch: [171]  [6000/6672]  eta: 0:08:37  lr: 0.000069  loss: 1.8731 (1.9674)  time: 0.7237  data: 0.0003  max mem: 30335
[12:34:48.125306] Epoch: [171]  [6671/6672]  eta: 0:00:00  lr: 0.000068  loss: 1.7892 (1.9656)  time: 0.7178  data: 0.0009  max mem: 30335
[12:34:49.002232] Epoch: [171] Total time: 1:25:31 (0.7692 s / it)
[12:34:49.110046] Averaged stats: lr: 0.000068  loss: 1.7892 (1.9654)
[12:34:55.023996] Test:  [   0/2084]  eta: 3:25:14  loss: 0.3490 (0.3490)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.9093  data: 4.9954  max mem: 30335
[12:37:02.811101] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.5530 (0.4676)  acc1: 83.3333 (88.0572)  acc5: 100.0000 (98.3782)  time: 0.2597  data: 0.0002  max mem: 30335
[12:39:10.728505] Test:  [1000/2084]  eta: 0:04:43  loss: 0.4764 (0.5145)  acc1: 87.5000 (86.7133)  acc5: 95.8333 (98.0603)  time: 0.2554  data: 0.0002  max mem: 30335
[12:41:20.113082] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4101 (0.5869)  acc1: 91.6667 (84.8573)  acc5: 95.8333 (97.3656)  time: 0.2837  data: 0.0002  max mem: 30335
[12:43:27.985984] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2596 (0.6226)  acc1: 95.8333 (83.9976)  acc5: 100.0000 (97.0140)  time: 0.2552  data: 0.0002  max mem: 30335
[12:43:49.424385] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3713 (0.6260)  acc1: 91.6667 (83.9120)  acc5: 100.0000 (97.0200)  time: 0.2473  data: 0.0002  max mem: 30335
[12:43:49.557092] Test: Total time: 0:09:00 (0.2593 s / it)
[12:44:03.944285] * Acc@1 83.876 Acc@5 97.007 loss 0.626
[12:44:03.944834] Accuracy of the network on the 50000 test images: 83.9%
[12:44:03.944881] Max accuracy: 83.88%
[12:44:04.371336] log_dir: ./output_dir_qkformer
[12:44:11.192825] Epoch: [172]  [   0/6672]  eta: 12:36:17  lr: 0.000068  loss: 1.7867 (1.7867)  time: 6.8011  data: 3.1479  max mem: 30335
[13:09:52.675625] Epoch: [172]  [2000/6672]  eta: 1:00:14  lr: 0.000067  loss: 1.9496 (1.9594)  time: 0.7225  data: 0.0002  max mem: 30335
[13:35:17.314329] Epoch: [172]  [4000/6672]  eta: 0:34:11  lr: 0.000066  loss: 2.0569 (1.9574)  time: 0.7369  data: 0.0003  max mem: 30335
[14:01:38.308042] Epoch: [172]  [6000/6672]  eta: 0:08:40  lr: 0.000064  loss: 1.9812 (1.9577)  time: 0.7418  data: 0.0003  max mem: 30335
[14:10:22.365114] Epoch: [172]  [6671/6672]  eta: 0:00:00  lr: 0.000064  loss: 2.1077 (1.9581)  time: 0.7179  data: 0.0010  max mem: 30335
[14:10:23.209421] Epoch: [172] Total time: 1:26:18 (0.7762 s / it)
[14:10:23.244380] Averaged stats: lr: 0.000064  loss: 2.1077 (1.9567)
[14:10:27.896975] Test:  [   0/2084]  eta: 2:41:23  loss: 0.4019 (0.4019)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.6464  data: 3.9254  max mem: 30335
[14:12:36.464734] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.5353 (0.4644)  acc1: 87.5000 (87.9990)  acc5: 100.0000 (98.3616)  time: 0.2558  data: 0.0002  max mem: 30335
[14:14:44.964572] Test:  [1000/2084]  eta: 0:04:43  loss: 0.4549 (0.5166)  acc1: 87.5000 (86.4427)  acc5: 95.8333 (98.0353)  time: 0.2555  data: 0.0002  max mem: 30335
[14:16:52.856126] Test:  [1500/2084]  eta: 0:02:31  loss: 0.5010 (0.5870)  acc1: 83.3333 (84.6880)  acc5: 95.8333 (97.3240)  time: 0.2558  data: 0.0002  max mem: 30335
[14:19:04.502751] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2185 (0.6232)  acc1: 95.8333 (83.7352)  acc5: 100.0000 (96.9536)  time: 0.2551  data: 0.0002  max mem: 30335
[14:19:25.548524] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4534 (0.6280)  acc1: 91.6667 (83.6220)  acc5: 100.0000 (96.9440)  time: 0.2476  data: 0.0002  max mem: 30335
[14:19:25.651255] Test: Total time: 0:09:02 (0.2603 s / it)
[14:19:39.852846] * Acc@1 83.616 Acc@5 96.940 loss 0.628
[14:19:39.853143] Accuracy of the network on the 50000 test images: 83.6%
[14:19:39.853179] Max accuracy: 83.88%
[14:19:40.476697] log_dir: ./output_dir_qkformer
[14:19:52.254967] Epoch: [173]  [   0/6672]  eta: 21:48:24  lr: 0.000064  loss: 2.0186 (2.0186)  time: 11.7662  data: 3.3073  max mem: 30335
[14:45:31.225363] Epoch: [173]  [2000/6672]  eta: 1:00:19  lr: 0.000062  loss: 1.9147 (1.9458)  time: 0.7498  data: 0.0003  max mem: 30335
[15:11:09.701364] Epoch: [173]  [4000/6672]  eta: 0:34:22  lr: 0.000061  loss: 1.7952 (1.9480)  time: 0.7212  data: 0.0003  max mem: 30335
[15:36:58.223273] Epoch: [173]  [6000/6672]  eta: 0:08:39  lr: 0.000060  loss: 1.8385 (1.9514)  time: 0.7221  data: 0.0002  max mem: 30335
[15:45:45.145909] Epoch: [173]  [6671/6672]  eta: 0:00:00  lr: 0.000059  loss: 2.0412 (1.9532)  time: 0.7179  data: 0.0009  max mem: 30335
[15:45:45.981615] Epoch: [173] Total time: 1:26:05 (0.7742 s / it)
[15:45:46.053311] Averaged stats: lr: 0.000059  loss: 2.0412 (1.9492)
[15:45:51.429223] Test:  [   0/2084]  eta: 3:06:33  loss: 0.2937 (0.2937)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.3711  data: 4.5116  max mem: 30335
[15:48:00.304137] Test:  [ 500/2084]  eta: 0:07:04  loss: 0.5681 (0.4727)  acc1: 83.3333 (87.7911)  acc5: 100.0000 (98.2701)  time: 0.2556  data: 0.0002  max mem: 30335
[15:50:09.152990] Test:  [1000/2084]  eta: 0:04:44  loss: 0.4025 (0.5220)  acc1: 87.5000 (86.3803)  acc5: 95.8333 (97.9562)  time: 0.2659  data: 0.0003  max mem: 30335
[15:52:18.198062] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4773 (0.5882)  acc1: 87.5000 (84.6686)  acc5: 95.8333 (97.2935)  time: 0.2553  data: 0.0002  max mem: 30335
[15:54:26.837265] Test:  [2000/2084]  eta: 0:00:21  loss: 0.1906 (0.6250)  acc1: 95.8333 (83.7789)  acc5: 100.0000 (96.9494)  time: 0.2554  data: 0.0002  max mem: 30335
[15:54:47.895513] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3150 (0.6295)  acc1: 91.6667 (83.6660)  acc5: 100.0000 (96.9380)  time: 0.2479  data: 0.0002  max mem: 30335
[15:54:48.041782] Test: Total time: 0:09:01 (0.2601 s / it)
[15:55:02.787588] * Acc@1 83.692 Acc@5 96.945 loss 0.630
[15:55:02.787863] Accuracy of the network on the 50000 test images: 83.7%
[15:55:02.787913] Max accuracy: 83.88%
[15:55:03.527381] log_dir: ./output_dir_qkformer
[15:55:15.626316] Epoch: [174]  [   0/6672]  eta: 22:20:12  lr: 0.000059  loss: 2.1007 (2.1007)  time: 12.0522  data: 2.4710  max mem: 30335
[16:20:55.282541] Epoch: [174]  [2000/6672]  eta: 1:00:22  lr: 0.000058  loss: 1.9197 (1.9282)  time: 0.7216  data: 0.0002  max mem: 30335
[16:46:25.058385] Epoch: [174]  [4000/6672]  eta: 0:34:17  lr: 0.000057  loss: 1.8517 (1.9365)  time: 0.7201  data: 0.0002  max mem: 30335
[17:11:54.640281] Epoch: [174]  [6000/6672]  eta: 0:08:36  lr: 0.000055  loss: 1.9423 (1.9405)  time: 0.7223  data: 0.0003  max mem: 30335
[17:20:17.883092] Epoch: [174]  [6671/6672]  eta: 0:00:00  lr: 0.000055  loss: 1.7425 (1.9420)  time: 0.7203  data: 0.0006  max mem: 30335
[17:20:18.808092] Epoch: [174] Total time: 1:25:15 (0.7667 s / it)
[17:20:18.875899] Averaged stats: lr: 0.000055  loss: 1.7425 (1.9417)
[17:20:23.916452] Test:  [   0/2084]  eta: 2:54:52  loss: 0.3392 (0.3392)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.0348  data: 4.4112  max mem: 30335
[17:22:32.254214] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.5760 (0.4600)  acc1: 83.3333 (87.9907)  acc5: 100.0000 (98.3616)  time: 0.2553  data: 0.0002  max mem: 30335
[17:24:40.577450] Test:  [1000/2084]  eta: 0:04:43  loss: 0.4003 (0.5103)  acc1: 87.5000 (86.6384)  acc5: 95.8333 (98.0728)  time: 0.2552  data: 0.0002  max mem: 30335
[17:26:48.686535] Test:  [1500/2084]  eta: 0:02:31  loss: 0.5111 (0.5834)  acc1: 83.3333 (84.9017)  acc5: 95.8333 (97.3823)  time: 0.2554  data: 0.0004  max mem: 30335
[17:28:56.769461] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2486 (0.6201)  acc1: 95.8333 (83.9643)  acc5: 100.0000 (97.0098)  time: 0.2557  data: 0.0002  max mem: 30335
[17:29:17.801352] Test:  [2083/2084]  eta: 0:00:00  loss: 0.4177 (0.6245)  acc1: 91.6667 (83.8420)  acc5: 100.0000 (97.0120)  time: 0.2474  data: 0.0002  max mem: 30335
[17:29:17.943721] Test: Total time: 0:08:59 (0.2587 s / it)
[17:29:32.294084] * Acc@1 83.853 Acc@5 97.011 loss 0.625
[17:29:32.294577] Accuracy of the network on the 50000 test images: 83.9%
[17:29:32.294615] Max accuracy: 83.88%
[17:29:32.741171] log_dir: ./output_dir_qkformer
[17:29:41.289595] Epoch: [175]  [   0/6672]  eta: 15:50:30  lr: 0.000055  loss: 1.9811 (1.9811)  time: 8.5477  data: 2.7777  max mem: 30335
[17:55:06.738218] Epoch: [175]  [2000/6672]  eta: 0:59:40  lr: 0.000054  loss: 1.7862 (1.9227)  time: 0.7226  data: 0.0002  max mem: 30335
[18:20:35.721464] Epoch: [175]  [4000/6672]  eta: 0:34:04  lr: 0.000052  loss: 1.8757 (1.9263)  time: 0.7195  data: 0.0003  max mem: 30335
[18:46:44.942203] Epoch: [175]  [6000/6672]  eta: 0:08:38  lr: 0.000051  loss: 1.9707 (1.9285)  time: 1.0548  data: 0.0002  max mem: 30335
[18:55:24.333480] Epoch: [175]  [6671/6672]  eta: 0:00:00  lr: 0.000051  loss: 1.8292 (1.9291)  time: 0.7189  data: 0.0006  max mem: 30335
[18:55:25.252941] Epoch: [175] Total time: 1:25:52 (0.7723 s / it)
[18:55:25.303166] Averaged stats: lr: 0.000051  loss: 1.8292 (1.9324)
[18:55:30.940301] Test:  [   0/2084]  eta: 3:15:35  loss: 0.3432 (0.3432)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 5.6314  data: 4.8118  max mem: 30335
[18:57:43.085320] Test:  [ 500/2084]  eta: 0:07:15  loss: 0.4958 (0.4685)  acc1: 83.3333 (87.9990)  acc5: 100.0000 (98.3782)  time: 0.2555  data: 0.0002  max mem: 30335
[18:59:50.980948] Test:  [1000/2084]  eta: 0:04:47  loss: 0.5101 (0.5170)  acc1: 87.5000 (86.4302)  acc5: 95.8333 (98.0395)  time: 0.2552  data: 0.0002  max mem: 30335
[19:01:59.126619] Test:  [1500/2084]  eta: 0:02:33  loss: 0.4474 (0.5849)  acc1: 87.5000 (84.7213)  acc5: 95.8333 (97.4461)  time: 0.2550  data: 0.0002  max mem: 30335
[19:04:07.715631] Test:  [2000/2084]  eta: 0:00:21  loss: 0.1999 (0.6214)  acc1: 95.8333 (83.8622)  acc5: 100.0000 (96.9973)  time: 0.2547  data: 0.0002  max mem: 30335
[19:04:28.748019] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3397 (0.6245)  acc1: 91.6667 (83.7860)  acc5: 100.0000 (96.9820)  time: 0.2476  data: 0.0001  max mem: 30335
[19:04:28.865788] Test: Total time: 0:09:03 (0.2608 s / it)
[19:04:42.978205] * Acc@1 83.809 Acc@5 96.987 loss 0.625
[19:04:42.978444] Accuracy of the network on the 50000 test images: 83.8%
[19:04:42.978477] Max accuracy: 83.88%
[19:04:43.145502] log_dir: ./output_dir_qkformer
[19:04:58.494199] Epoch: [176]  [   0/6672]  eta: 1 day, 4:22:39  lr: 0.000051  loss: 2.5191 (2.5191)  time: 15.3116  data: 5.4517  max mem: 30335
[19:30:39.641473] Epoch: [176]  [2000/6672]  eta: 1:00:33  lr: 0.000050  loss: 1.8785 (1.9266)  time: 0.7724  data: 0.0003  max mem: 30335
[19:56:08.050003] Epoch: [176]  [4000/6672]  eta: 0:34:19  lr: 0.000048  loss: 1.8697 (1.9202)  time: 0.7670  data: 0.0002  max mem: 30335
[20:21:35.108736] Epoch: [176]  [6000/6672]  eta: 0:08:36  lr: 0.000047  loss: 1.9876 (1.9199)  time: 0.7236  data: 0.0003  max mem: 30335
[20:30:08.539980] Epoch: [176]  [6671/6672]  eta: 0:00:00  lr: 0.000047  loss: 1.7969 (1.9198)  time: 0.7219  data: 0.0010  max mem: 30335
[20:30:09.285193] Epoch: [176] Total time: 1:25:26 (0.7683 s / it)
[20:30:09.333563] Averaged stats: lr: 0.000047  loss: 1.7969 (1.9230)
[20:30:13.764114] Test:  [   0/2084]  eta: 2:33:42  loss: 0.3366 (0.3366)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.4255  data: 3.7066  max mem: 30335
[20:32:22.594368] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.4854 (0.4695)  acc1: 83.3333 (88.2152)  acc5: 100.0000 (98.3283)  time: 0.2559  data: 0.0002  max mem: 30335
[20:34:31.713551] Test:  [1000/2084]  eta: 0:04:44  loss: 0.5124 (0.5150)  acc1: 87.5000 (86.6425)  acc5: 95.8333 (98.0936)  time: 0.2562  data: 0.0002  max mem: 30335
[20:36:39.830206] Test:  [1500/2084]  eta: 0:02:31  loss: 0.3672 (0.5830)  acc1: 87.5000 (85.0544)  acc5: 100.0000 (97.4184)  time: 0.2636  data: 0.0002  max mem: 30335
[20:38:48.423748] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2085 (0.6201)  acc1: 95.8333 (84.0830)  acc5: 100.0000 (97.0307)  time: 0.2544  data: 0.0002  max mem: 30335
[20:39:09.530007] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3840 (0.6249)  acc1: 91.6667 (83.9640)  acc5: 100.0000 (97.0220)  time: 0.2477  data: 0.0001  max mem: 30335
[20:39:09.633203] Test: Total time: 0:09:00 (0.2593 s / it)
[20:39:24.149792] * Acc@1 83.981 Acc@5 97.026 loss 0.625
[20:39:24.150057] Accuracy of the network on the 50000 test images: 84.0%
[20:39:24.150099] Max accuracy: 83.98%
[20:39:24.525469] log_dir: ./output_dir_qkformer
[20:39:37.727012] Epoch: [177]  [   0/6672]  eta: 1 day, 0:13:11  lr: 0.000047  loss: 1.6134 (1.6134)  time: 13.0683  data: 2.7725  max mem: 30335
[21:05:28.413387] Epoch: [177]  [2000/6672]  eta: 1:00:50  lr: 0.000046  loss: 2.0146 (1.9136)  time: 0.7385  data: 0.0003  max mem: 30335
[21:30:52.543318] Epoch: [177]  [4000/6672]  eta: 0:34:21  lr: 0.000044  loss: 1.8311 (1.9149)  time: 0.7230  data: 0.0002  max mem: 30335
[21:56:33.025993] Epoch: [177]  [6000/6672]  eta: 0:08:38  lr: 0.000043  loss: 1.9316 (1.9140)  time: 0.7670  data: 0.0003  max mem: 30335
[22:05:09.769681] Epoch: [177]  [6671/6672]  eta: 0:00:00  lr: 0.000043  loss: 1.9139 (1.9143)  time: 0.7192  data: 0.0010  max mem: 30335
[22:05:10.520924] Epoch: [177] Total time: 1:25:45 (0.7713 s / it)
[22:05:10.572962] Averaged stats: lr: 0.000043  loss: 1.9139 (1.9179)
[22:05:19.152593] Test:  [   0/2084]  eta: 4:57:19  loss: 0.3232 (0.3232)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 8.5601  data: 5.5331  max mem: 30335
[22:07:27.338243] Test:  [ 500/2084]  eta: 0:07:12  loss: 0.5139 (0.4632)  acc1: 83.3333 (88.2236)  acc5: 100.0000 (98.2784)  time: 0.2559  data: 0.0002  max mem: 30335
[22:09:37.063259] Test:  [1000/2084]  eta: 0:04:48  loss: 0.5890 (0.5135)  acc1: 83.3333 (86.7175)  acc5: 95.8333 (98.0811)  time: 0.2556  data: 0.0002  max mem: 30335
[22:11:45.751903] Test:  [1500/2084]  eta: 0:02:33  loss: 0.4152 (0.5799)  acc1: 91.6667 (85.0711)  acc5: 95.8333 (97.4795)  time: 0.2560  data: 0.0002  max mem: 30335
[22:13:56.320400] Test:  [2000/2084]  eta: 0:00:22  loss: 0.2236 (0.6182)  acc1: 91.6667 (84.1017)  acc5: 100.0000 (97.0744)  time: 0.2560  data: 0.0002  max mem: 30335
[22:14:17.381817] Test:  [2083/2084]  eta: 0:00:00  loss: 0.2278 (0.6219)  acc1: 95.8333 (84.0080)  acc5: 100.0000 (97.0660)  time: 0.2470  data: 0.0001  max mem: 30335
[22:14:17.492760] Test: Total time: 0:09:06 (0.2624 s / it)
[22:14:32.184841] * Acc@1 83.972 Acc@5 97.056 loss 0.622
[22:14:32.185185] Accuracy of the network on the 50000 test images: 84.0%
[22:14:32.185234] Max accuracy: 83.98%
[22:14:32.657749] log_dir: ./output_dir_qkformer
[22:14:42.726598] Epoch: [178]  [   0/6672]  eta: 18:39:01  lr: 0.000043  loss: 1.5874 (1.5874)  time: 10.0631  data: 2.7855  max mem: 30335
[22:40:24.776438] Epoch: [178]  [2000/6672]  eta: 1:00:23  lr: 0.000042  loss: 1.7797 (1.9094)  time: 0.7334  data: 0.0003  max mem: 30335
[23:05:41.767414] Epoch: [178]  [4000/6672]  eta: 0:34:09  lr: 0.000041  loss: 1.8918 (1.9150)  time: 0.7210  data: 0.0002  max mem: 30335
[23:31:29.158866] Epoch: [178]  [6000/6672]  eta: 0:08:36  lr: 0.000040  loss: 1.9221 (1.9118)  time: 0.7357  data: 0.0003  max mem: 30335
[23:39:57.340804] Epoch: [178]  [6671/6672]  eta: 0:00:00  lr: 0.000039  loss: 1.8267 (1.9121)  time: 0.7246  data: 0.0012  max mem: 30335
[23:39:58.192241] Epoch: [178] Total time: 1:25:25 (0.7682 s / it)
[23:39:58.246619] Averaged stats: lr: 0.000039  loss: 1.8267 (1.9111)
[23:40:03.047001] Test:  [   0/2084]  eta: 2:46:34  loss: 0.4117 (0.4117)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.7958  data: 4.0996  max mem: 30335
[23:42:11.384630] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.5214 (0.4593)  acc1: 83.3333 (88.2818)  acc5: 100.0000 (98.3616)  time: 0.2566  data: 0.0002  max mem: 30335
[23:44:19.255663] Test:  [1000/2084]  eta: 0:04:42  loss: 0.4265 (0.5094)  acc1: 87.5000 (86.8090)  acc5: 95.8333 (98.0436)  time: 0.2553  data: 0.0002  max mem: 30335
[23:46:28.057474] Test:  [1500/2084]  eta: 0:02:31  loss: 0.4270 (0.5819)  acc1: 87.5000 (85.0544)  acc5: 100.0000 (97.4073)  time: 0.2564  data: 0.0002  max mem: 30335
[23:48:40.842780] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2566 (0.6201)  acc1: 95.8333 (84.1038)  acc5: 100.0000 (96.9911)  time: 0.2563  data: 0.0002  max mem: 30335
[23:49:01.900077] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3472 (0.6247)  acc1: 91.6667 (84.0100)  acc5: 100.0000 (96.9700)  time: 0.2479  data: 0.0002  max mem: 30335
[23:49:02.044779] Test: Total time: 0:09:03 (0.2609 s / it)
[23:49:16.480123] * Acc@1 83.996 Acc@5 96.965 loss 0.625
[23:49:16.480649] Accuracy of the network on the 50000 test images: 84.0%
[23:49:16.480704] Max accuracy: 84.00%
[23:49:16.915269] log_dir: ./output_dir_qkformer
[23:49:25.579734] Epoch: [179]  [   0/6672]  eta: 16:01:37  lr: 0.000039  loss: 1.5836 (1.5836)  time: 8.6477  data: 3.2826  max mem: 30335
[00:14:48.264678] Epoch: [179]  [2000/6672]  eta: 0:59:34  lr: 0.000038  loss: 1.8825 (1.9137)  time: 0.7344  data: 0.0002  max mem: 30335
[00:40:24.935030] Epoch: [179]  [4000/6672]  eta: 0:34:08  lr: 0.000037  loss: 1.9438 (1.9134)  time: 0.7229  data: 0.0002  max mem: 30335
[01:06:14.274450] Epoch: [179]  [6000/6672]  eta: 0:08:36  lr: 0.000036  loss: 1.7504 (1.9088)  time: 0.7528  data: 0.0002  max mem: 30335
[01:14:45.841041] Epoch: [179]  [6671/6672]  eta: 0:00:00  lr: 0.000036  loss: 1.7977 (1.9108)  time: 0.7233  data: 0.0011  max mem: 30335
[01:14:46.652039] Epoch: [179] Total time: 1:25:29 (0.7688 s / it)
[01:14:46.720772] Averaged stats: lr: 0.000036  loss: 1.7977 (1.9058)
[01:14:51.157584] Test:  [   0/2084]  eta: 2:33:57  loss: 0.3366 (0.3366)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.4324  data: 3.6217  max mem: 30335
[01:17:00.263795] Test:  [ 500/2084]  eta: 0:07:02  loss: 0.5076 (0.4557)  acc1: 87.5000 (88.2818)  acc5: 100.0000 (98.4448)  time: 0.2562  data: 0.0002  max mem: 30335
[01:19:09.986611] Test:  [1000/2084]  eta: 0:04:45  loss: 0.5112 (0.5099)  acc1: 87.5000 (86.6842)  acc5: 95.8333 (98.1019)  time: 0.2552  data: 0.0002  max mem: 30335
[01:21:18.456596] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4423 (0.5780)  acc1: 87.5000 (85.0044)  acc5: 100.0000 (97.4545)  time: 0.2568  data: 0.0002  max mem: 30335
[01:23:27.120235] Test:  [2000/2084]  eta: 0:00:21  loss: 0.1865 (0.6121)  acc1: 95.8333 (84.1350)  acc5: 100.0000 (97.0890)  time: 0.2556  data: 0.0002  max mem: 30335
[01:23:48.144288] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3029 (0.6173)  acc1: 95.8333 (84.0400)  acc5: 100.0000 (97.0620)  time: 0.2474  data: 0.0002  max mem: 30335
[01:23:48.247096] Test: Total time: 0:09:01 (0.2598 s / it)
[01:24:02.808769] * Acc@1 84.066 Acc@5 97.054 loss 0.617
[01:24:02.809108] Accuracy of the network on the 50000 test images: 84.1%
[01:24:02.809146] Max accuracy: 84.07%
[01:24:03.191844] log_dir: ./output_dir_qkformer
[01:24:14.426783] Epoch: [180]  [   0/6672]  eta: 20:47:42  lr: 0.000036  loss: 1.8280 (1.8280)  time: 11.2204  data: 3.5673  max mem: 30335
[01:49:35.423001] Epoch: [180]  [2000/6672]  eta: 0:59:36  lr: 0.000035  loss: 1.7130 (1.8983)  time: 0.7225  data: 0.0002  max mem: 30335
[02:15:01.353054] Epoch: [180]  [4000/6672]  eta: 0:34:01  lr: 0.000034  loss: 1.9254 (1.8972)  time: 0.7433  data: 0.0003  max mem: 30335
[02:40:46.334450] Epoch: [180]  [6000/6672]  eta: 0:08:35  lr: 0.000033  loss: 1.9653 (1.9005)  time: 0.7233  data: 0.0003  max mem: 30335
[02:49:22.750949] Epoch: [180]  [6671/6672]  eta: 0:00:00  lr: 0.000032  loss: 1.8755 (1.9010)  time: 0.7204  data: 0.0011  max mem: 30335
[02:49:23.507124] Epoch: [180] Total time: 1:25:20 (0.7674 s / it)
[02:49:23.587217] Averaged stats: lr: 0.000032  loss: 1.8755 (1.8997)
[02:49:28.153327] Test:  [   0/2084]  eta: 2:38:26  loss: 0.4133 (0.4133)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.5619  data: 3.6786  max mem: 30335
[02:51:36.728010] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.5059 (0.4550)  acc1: 83.3333 (88.2735)  acc5: 100.0000 (98.3533)  time: 0.2554  data: 0.0002  max mem: 30335
[02:53:47.122786] Test:  [1000/2084]  eta: 0:04:45  loss: 0.4385 (0.5068)  acc1: 87.5000 (86.6633)  acc5: 95.8333 (98.0811)  time: 0.2563  data: 0.0002  max mem: 30335
[02:55:55.428981] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4018 (0.5790)  acc1: 91.6667 (84.9850)  acc5: 100.0000 (97.3934)  time: 0.2548  data: 0.0002  max mem: 30335
[02:58:03.690651] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2370 (0.6162)  acc1: 91.6667 (84.1350)  acc5: 100.0000 (97.0265)  time: 0.2551  data: 0.0002  max mem: 30335
[02:58:24.720029] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3708 (0.6195)  acc1: 91.6667 (84.0320)  acc5: 100.0000 (97.0320)  time: 0.2467  data: 0.0001  max mem: 30335
[02:58:24.838310] Test: Total time: 0:09:01 (0.2597 s / it)
[02:58:39.263247] * Acc@1 84.022 Acc@5 97.036 loss 0.620
[02:58:39.263566] Accuracy of the network on the 50000 test images: 84.0%
[02:58:39.263606] Max accuracy: 84.07%
[02:58:39.495400] log_dir: ./output_dir_qkformer
[02:58:51.385655] Epoch: [181]  [   0/6672]  eta: 21:54:51  lr: 0.000032  loss: 1.4818 (1.4818)  time: 11.8243  data: 3.0006  max mem: 30335
[03:24:10.299262] Epoch: [181]  [2000/6672]  eta: 0:59:33  lr: 0.000031  loss: 1.9684 (1.9003)  time: 0.7245  data: 0.0002  max mem: 30335
[03:49:31.662654] Epoch: [181]  [4000/6672]  eta: 0:33:57  lr: 0.000030  loss: 1.8956 (1.8976)  time: 0.7911  data: 0.0003  max mem: 30335
[04:14:57.440022] Epoch: [181]  [6000/6672]  eta: 0:08:32  lr: 0.000029  loss: 1.7999 (1.8997)  time: 0.7217  data: 0.0002  max mem: 30335
[04:23:25.014148] Epoch: [181]  [6671/6672]  eta: 0:00:00  lr: 0.000029  loss: 1.8996 (1.9008)  time: 0.7208  data: 0.0006  max mem: 30335
[04:23:25.803977] Epoch: [181] Total time: 1:24:46 (0.7623 s / it)
[04:23:25.864801] Averaged stats: lr: 0.000029  loss: 1.8996 (1.8936)
[04:23:30.672478] Test:  [   0/2084]  eta: 2:46:51  loss: 0.3997 (0.3997)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.8038  data: 3.9113  max mem: 30335
[04:25:38.987850] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.5756 (0.4628)  acc1: 83.3333 (87.9325)  acc5: 100.0000 (98.4697)  time: 0.2700  data: 0.0002  max mem: 30335
[04:27:47.358350] Test:  [1000/2084]  eta: 0:04:43  loss: 0.4952 (0.5125)  acc1: 87.5000 (86.5052)  acc5: 95.8333 (98.1435)  time: 0.2559  data: 0.0002  max mem: 30335
[04:29:55.510855] Test:  [1500/2084]  eta: 0:02:31  loss: 0.4580 (0.5820)  acc1: 87.5000 (84.8740)  acc5: 100.0000 (97.4822)  time: 0.2557  data: 0.0002  max mem: 30335
[04:32:05.141095] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2112 (0.6195)  acc1: 95.8333 (84.0288)  acc5: 100.0000 (97.0744)  time: 0.2552  data: 0.0002  max mem: 30335
[04:32:26.223305] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3750 (0.6234)  acc1: 91.6667 (83.9400)  acc5: 100.0000 (97.0580)  time: 0.2485  data: 0.0001  max mem: 30335
[04:32:26.326259] Test: Total time: 0:09:00 (0.2593 s / it)
[04:32:40.946221] * Acc@1 83.948 Acc@5 97.046 loss 0.623
[04:32:40.946463] Accuracy of the network on the 50000 test images: 83.9%
[04:32:40.946498] Max accuracy: 84.07%
[04:32:41.658554] log_dir: ./output_dir_qkformer
[04:32:48.995951] Epoch: [182]  [   0/6672]  eta: 13:35:45  lr: 0.000029  loss: 1.6919 (1.6919)  time: 7.3360  data: 2.3089  max mem: 30335
[04:58:05.536224] Epoch: [182]  [2000/6672]  eta: 0:59:17  lr: 0.000028  loss: 1.8108 (1.8837)  time: 0.7255  data: 0.0002  max mem: 30335
[05:23:16.253918] Epoch: [182]  [4000/6672]  eta: 0:33:46  lr: 0.000027  loss: 1.9730 (1.8887)  time: 0.7219  data: 0.0002  max mem: 30335
[05:48:37.665392] Epoch: [182]  [6000/6672]  eta: 0:08:30  lr: 0.000026  loss: 1.7463 (1.8862)  time: 0.7227  data: 0.0002  max mem: 30335
[05:57:11.678497] Epoch: [182]  [6671/6672]  eta: 0:00:00  lr: 0.000026  loss: 1.8673 (1.8858)  time: 0.7180  data: 0.0010  max mem: 30335
[05:57:12.480647] Epoch: [182] Total time: 1:24:30 (0.7600 s / it)
[05:57:12.541416] Averaged stats: lr: 0.000026  loss: 1.8673 (1.8863)
[05:57:16.370189] Test:  [   0/2084]  eta: 2:12:47  loss: 0.3583 (0.3583)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 3.8233  data: 3.1959  max mem: 30335
[05:59:24.818968] Test:  [ 500/2084]  eta: 0:06:58  loss: 0.5180 (0.4567)  acc1: 83.3333 (88.1653)  acc5: 100.0000 (98.4697)  time: 0.2563  data: 0.0002  max mem: 30335
[06:01:32.630479] Test:  [1000/2084]  eta: 0:04:41  loss: 0.5325 (0.5075)  acc1: 91.6667 (86.6842)  acc5: 95.8333 (98.0894)  time: 0.2567  data: 0.0002  max mem: 30335
[06:03:41.258188] Test:  [1500/2084]  eta: 0:02:31  loss: 0.4254 (0.5797)  acc1: 87.5000 (85.0294)  acc5: 100.0000 (97.3656)  time: 0.2554  data: 0.0002  max mem: 30335
[06:05:49.207019] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2292 (0.6142)  acc1: 95.8333 (84.1892)  acc5: 100.0000 (97.0411)  time: 0.2556  data: 0.0002  max mem: 30335
[06:06:12.526640] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3800 (0.6196)  acc1: 91.6667 (84.0660)  acc5: 100.0000 (97.0200)  time: 0.2472  data: 0.0001  max mem: 30335
[06:06:12.644393] Test: Total time: 0:09:00 (0.2592 s / it)
[06:06:27.180402] * Acc@1 84.064 Acc@5 97.014 loss 0.620
[06:06:27.180699] Accuracy of the network on the 50000 test images: 84.1%
[06:06:27.180749] Max accuracy: 84.07%
[06:06:27.617021] log_dir: ./output_dir_qkformer
[06:06:31.477660] Epoch: [183]  [   0/6672]  eta: 7:09:09  lr: 0.000026  loss: 1.8127 (1.8127)  time: 3.8594  data: 2.8363  max mem: 30335
[06:32:07.664953] Epoch: [183]  [2000/6672]  eta: 0:59:55  lr: 0.000025  loss: 1.6900 (1.8756)  time: 0.7615  data: 0.0003  max mem: 30335
[06:57:31.823186] Epoch: [183]  [4000/6672]  eta: 0:34:05  lr: 0.000024  loss: 1.8136 (1.8695)  time: 0.8928  data: 0.0058  max mem: 30335
[07:22:58.347833] Epoch: [183]  [6000/6672]  eta: 0:08:33  lr: 0.000024  loss: 1.7556 (1.8734)  time: 0.7618  data: 0.0003  max mem: 30335
[07:31:23.878327] Epoch: [183]  [6671/6672]  eta: 0:00:00  lr: 0.000023  loss: 1.7734 (1.8749)  time: 0.7184  data: 0.0006  max mem: 30335
[07:31:24.667226] Epoch: [183] Total time: 1:24:57 (0.7639 s / it)
[07:31:24.688415] Averaged stats: lr: 0.000023  loss: 1.7734 (1.8832)
[07:31:29.309747] Test:  [   0/2084]  eta: 2:40:21  loss: 0.4360 (0.4360)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.6167  data: 3.7213  max mem: 30335
[07:33:37.717956] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.5502 (0.4577)  acc1: 83.3333 (88.5479)  acc5: 100.0000 (98.3533)  time: 0.2559  data: 0.0002  max mem: 30335
[07:35:46.826232] Test:  [1000/2084]  eta: 0:04:43  loss: 0.4511 (0.5086)  acc1: 91.6667 (86.8923)  acc5: 95.8333 (98.0311)  time: 0.2559  data: 0.0002  max mem: 30335
[07:37:56.082730] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4272 (0.5799)  acc1: 87.5000 (85.0877)  acc5: 95.8333 (97.4073)  time: 0.2559  data: 0.0002  max mem: 30335
[07:40:04.138792] Test:  [2000/2084]  eta: 0:00:21  loss: 0.1843 (0.6160)  acc1: 95.8333 (84.1288)  acc5: 100.0000 (97.0556)  time: 0.2547  data: 0.0002  max mem: 30335
[07:40:25.207606] Test:  [2083/2084]  eta: 0:00:00  loss: 0.2536 (0.6208)  acc1: 91.6667 (84.0000)  acc5: 100.0000 (97.0420)  time: 0.2473  data: 0.0001  max mem: 30335
[07:40:25.330168] Test: Total time: 0:09:00 (0.2594 s / it)
[07:40:39.953904] * Acc@1 84.026 Acc@5 97.034 loss 0.621
[07:40:39.954225] Accuracy of the network on the 50000 test images: 84.0%
[07:40:39.954280] Max accuracy: 84.07%
[07:40:40.369033] log_dir: ./output_dir_qkformer
[07:40:47.489676] Epoch: [184]  [   0/6672]  eta: 13:10:41  lr: 0.000023  loss: 1.7851 (1.7851)  time: 7.1106  data: 2.6127  max mem: 30335
[08:06:17.634222] Epoch: [184]  [2000/6672]  eta: 0:59:48  lr: 0.000022  loss: 1.8510 (1.8738)  time: 0.7237  data: 0.0002  max mem: 30335
[08:31:56.762011] Epoch: [184]  [4000/6672]  eta: 0:34:14  lr: 0.000022  loss: 1.9095 (1.8790)  time: 0.8034  data: 0.0003  max mem: 30335
[08:57:21.946281] Epoch: [184]  [6000/6672]  eta: 0:08:35  lr: 0.000021  loss: 1.8212 (1.8796)  time: 0.7245  data: 0.0003  max mem: 30335
[09:05:56.017459] Epoch: [184]  [6671/6672]  eta: 0:00:00  lr: 0.000021  loss: 1.7734 (1.8816)  time: 0.7172  data: 0.0007  max mem: 30335
[09:05:56.863330] Epoch: [184] Total time: 1:25:16 (0.7669 s / it)
[09:05:56.939209] Averaged stats: lr: 0.000021  loss: 1.7734 (1.8760)
[09:06:00.944304] Test:  [   0/2084]  eta: 2:18:51  loss: 0.3396 (0.3396)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 3.9978  data: 3.4047  max mem: 30335
[09:08:10.775269] Test:  [ 500/2084]  eta: 0:07:03  loss: 0.5004 (0.4490)  acc1: 83.3333 (88.5978)  acc5: 100.0000 (98.4780)  time: 0.2559  data: 0.0002  max mem: 30335
[09:10:18.806008] Test:  [1000/2084]  eta: 0:04:43  loss: 0.4728 (0.5021)  acc1: 91.6667 (86.9922)  acc5: 95.8333 (98.1144)  time: 0.2555  data: 0.0002  max mem: 30335
[09:12:29.367737] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4221 (0.5754)  acc1: 87.5000 (85.1682)  acc5: 100.0000 (97.4517)  time: 0.2550  data: 0.0002  max mem: 30335
[09:14:39.250181] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2313 (0.6123)  acc1: 95.8333 (84.2995)  acc5: 100.0000 (97.0827)  time: 0.2554  data: 0.0002  max mem: 30335
[09:15:00.290872] Test:  [2083/2084]  eta: 0:00:00  loss: 0.2756 (0.6173)  acc1: 91.6667 (84.1940)  acc5: 100.0000 (97.0640)  time: 0.2472  data: 0.0001  max mem: 30335
[09:15:00.411772] Test: Total time: 0:09:03 (0.2608 s / it)
[09:15:14.733671] * Acc@1 84.170 Acc@5 97.066 loss 0.618
[09:15:14.733973] Accuracy of the network on the 50000 test images: 84.2%
[09:15:16.845304] Max accuracy: 84.17%
[09:15:17.178350] log_dir: ./output_dir_qkformer
[09:15:25.441764] Epoch: [185]  [   0/6672]  eta: 15:17:53  lr: 0.000021  loss: 1.8196 (1.8196)  time: 8.2545  data: 2.9977  max mem: 30335
[09:41:01.010995] Epoch: [185]  [2000/6672]  eta: 1:00:03  lr: 0.000020  loss: 1.8989 (1.8645)  time: 0.7215  data: 0.0003  max mem: 30335
[10:06:50.867452] Epoch: [185]  [4000/6672]  eta: 0:34:25  lr: 0.000019  loss: 1.8480 (1.8660)  time: 0.7193  data: 0.0003  max mem: 30335
[10:32:16.848499] Epoch: [185]  [6000/6672]  eta: 0:08:37  lr: 0.000018  loss: 1.8529 (1.8696)  time: 0.8574  data: 0.0047  max mem: 30335
[10:40:51.017358] Epoch: [185]  [6671/6672]  eta: 0:00:00  lr: 0.000018  loss: 1.9330 (1.8696)  time: 0.7238  data: 0.0008  max mem: 30335
[10:40:51.874863] Epoch: [185] Total time: 1:25:34 (0.7696 s / it)
[10:40:51.968216] Averaged stats: lr: 0.000018  loss: 1.9330 (1.8734)
[10:40:56.266632] Test:  [   0/2084]  eta: 2:28:59  loss: 0.3387 (0.3387)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.2894  data: 3.3858  max mem: 30335
[10:43:04.831440] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.5220 (0.4532)  acc1: 83.3333 (88.5230)  acc5: 100.0000 (98.5446)  time: 0.2918  data: 0.0002  max mem: 30335
[10:45:13.313752] Test:  [1000/2084]  eta: 0:04:42  loss: 0.5481 (0.5063)  acc1: 87.5000 (87.0213)  acc5: 95.8333 (98.1643)  time: 0.2558  data: 0.0002  max mem: 30335
[10:47:21.219632] Test:  [1500/2084]  eta: 0:02:31  loss: 0.4455 (0.5797)  acc1: 87.5000 (85.2848)  acc5: 95.8333 (97.4350)  time: 0.2556  data: 0.0002  max mem: 30335
[10:49:31.295410] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2153 (0.6144)  acc1: 91.6667 (84.4078)  acc5: 100.0000 (97.0702)  time: 0.2556  data: 0.0002  max mem: 30335
[10:49:52.319110] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3220 (0.6196)  acc1: 91.6667 (84.2820)  acc5: 100.0000 (97.0580)  time: 0.2478  data: 0.0001  max mem: 30335
[10:49:52.430604] Test: Total time: 0:09:00 (0.2593 s / it)
[10:50:06.736886] * Acc@1 84.257 Acc@5 97.053 loss 0.620
[10:50:06.737139] Accuracy of the network on the 50000 test images: 84.3%
[10:50:08.991937] Max accuracy: 84.26%
[10:50:09.448134] log_dir: ./output_dir_qkformer
[10:50:14.634935] Epoch: [186]  [   0/6672]  eta: 9:26:48  lr: 0.000018  loss: 1.6393 (1.6393)  time: 5.0972  data: 1.9484  max mem: 30335
[11:15:42.839548] Epoch: [186]  [2000/6672]  eta: 0:59:39  lr: 0.000017  loss: 1.8318 (1.8731)  time: 0.7329  data: 0.0003  max mem: 30335
[11:41:24.377905] Epoch: [186]  [4000/6672]  eta: 0:34:13  lr: 0.000017  loss: 1.9589 (1.8693)  time: 0.7200  data: 0.0002  max mem: 30335
[12:07:22.633469] Epoch: [186]  [6000/6672]  eta: 0:08:38  lr: 0.000016  loss: 1.8457 (1.8736)  time: 0.7250  data: 0.0003  max mem: 30335
[12:15:48.925509] Epoch: [186]  [6671/6672]  eta: 0:00:00  lr: 0.000016  loss: 1.8362 (1.8734)  time: 0.7191  data: 0.0010  max mem: 30335
[12:15:49.601350] Epoch: [186] Total time: 1:25:40 (0.7704 s / it)
[12:15:49.629435] Averaged stats: lr: 0.000016  loss: 1.8362 (1.8704)
[12:15:54.242793] Test:  [   0/2084]  eta: 2:39:14  loss: 0.4021 (0.4021)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.5845  data: 3.8096  max mem: 30335
[12:18:02.473222] Test:  [ 500/2084]  eta: 0:06:59  loss: 0.5432 (0.4570)  acc1: 83.3333 (88.3649)  acc5: 100.0000 (98.3699)  time: 0.2561  data: 0.0002  max mem: 30335
[12:20:12.029744] Test:  [1000/2084]  eta: 0:04:44  loss: 0.4898 (0.5067)  acc1: 87.5000 (86.7924)  acc5: 95.8333 (98.1227)  time: 0.2562  data: 0.0002  max mem: 30335
[12:22:19.957577] Test:  [1500/2084]  eta: 0:02:31  loss: 0.4322 (0.5785)  acc1: 87.5000 (85.0516)  acc5: 100.0000 (97.4239)  time: 0.2550  data: 0.0002  max mem: 30335
[12:24:28.631177] Test:  [2000/2084]  eta: 0:00:21  loss: 0.1948 (0.6160)  acc1: 91.6667 (84.1725)  acc5: 100.0000 (97.0556)  time: 0.2559  data: 0.0002  max mem: 30335
[12:24:49.684047] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3119 (0.6199)  acc1: 91.6667 (84.0800)  acc5: 100.0000 (97.0500)  time: 0.2480  data: 0.0001  max mem: 30335
[12:24:49.796263] Test: Total time: 0:09:00 (0.2592 s / it)
[12:25:04.189961] * Acc@1 84.086 Acc@5 97.059 loss 0.620
[12:25:04.190275] Accuracy of the network on the 50000 test images: 84.1%
[12:25:04.190318] Max accuracy: 84.26%
[12:25:04.559990] log_dir: ./output_dir_qkformer
[12:25:14.234921] Epoch: [187]  [   0/6672]  eta: 17:47:49  lr: 0.000016  loss: 1.6011 (1.6011)  time: 9.6028  data: 2.1201  max mem: 30335
[12:50:49.372166] Epoch: [187]  [2000/6672]  eta: 1:00:05  lr: 0.000015  loss: 1.8448 (1.8627)  time: 0.9684  data: 0.0003  max mem: 30335
[13:16:25.276271] Epoch: [187]  [4000/6672]  eta: 0:34:16  lr: 0.000014  loss: 1.8296 (1.8652)  time: 0.7226  data: 0.0003  max mem: 30335
[13:41:58.163765] Epoch: [187]  [6000/6672]  eta: 0:08:36  lr: 0.000014  loss: 2.0403 (1.8631)  time: 0.7474  data: 0.0003  max mem: 30335
[13:50:33.068267] Epoch: [187]  [6671/6672]  eta: 0:00:00  lr: 0.000014  loss: 1.7571 (1.8629)  time: 0.7252  data: 0.0006  max mem: 30335
[13:50:33.881742] Epoch: [187] Total time: 1:25:29 (0.7688 s / it)
[13:50:33.966888] Averaged stats: lr: 0.000014  loss: 1.7571 (1.8639)
[13:50:38.470903] Test:  [   0/2084]  eta: 2:36:16  loss: 0.3765 (0.3765)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.4992  data: 3.5717  max mem: 30335
[13:52:47.008263] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.5127 (0.4521)  acc1: 83.3333 (88.5645)  acc5: 100.0000 (98.4697)  time: 0.2548  data: 0.0002  max mem: 30335
[13:54:56.326057] Test:  [1000/2084]  eta: 0:04:44  loss: 0.4727 (0.5035)  acc1: 87.5000 (86.8423)  acc5: 95.8333 (98.2018)  time: 0.2561  data: 0.0002  max mem: 30335
[13:57:04.544230] Test:  [1500/2084]  eta: 0:02:31  loss: 0.4144 (0.5735)  acc1: 87.5000 (85.1460)  acc5: 100.0000 (97.5322)  time: 0.2560  data: 0.0002  max mem: 30335
[13:59:14.469064] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2519 (0.6110)  acc1: 95.8333 (84.3308)  acc5: 100.0000 (97.0973)  time: 0.2564  data: 0.0002  max mem: 30335
[13:59:35.542167] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3369 (0.6162)  acc1: 91.6667 (84.2100)  acc5: 100.0000 (97.0720)  time: 0.2479  data: 0.0002  max mem: 30335
[13:59:35.668633] Test: Total time: 0:09:01 (0.2599 s / it)
[13:59:50.027378] * Acc@1 84.205 Acc@5 97.080 loss 0.616
[13:59:50.027907] Accuracy of the network on the 50000 test images: 84.2%
[13:59:50.028018] Max accuracy: 84.26%
[13:59:50.377428] log_dir: ./output_dir_qkformer
[13:59:59.138283] Epoch: [188]  [   0/6672]  eta: 16:04:39  lr: 0.000014  loss: 1.9126 (1.9126)  time: 8.6750  data: 2.5941  max mem: 30335
[14:25:36.811899] Epoch: [188]  [2000/6672]  eta: 1:00:09  lr: 0.000013  loss: 1.8023 (1.8540)  time: 0.7277  data: 0.0003  max mem: 30335
[14:51:16.426401] Epoch: [188]  [4000/6672]  eta: 0:34:20  lr: 0.000012  loss: 1.8042 (1.8577)  time: 0.7209  data: 0.0003  max mem: 30335
[15:16:26.532183] Epoch: [188]  [6000/6672]  eta: 0:08:34  lr: 0.000012  loss: 1.7806 (1.8581)  time: 0.7891  data: 0.0047  max mem: 30335
[15:24:50.651327] Epoch: [188]  [6671/6672]  eta: 0:00:00  lr: 0.000012  loss: 1.8978 (1.8578)  time: 0.7181  data: 0.0011  max mem: 30335
[15:24:51.515505] Epoch: [188] Total time: 1:25:01 (0.7646 s / it)
[15:24:51.561615] Averaged stats: lr: 0.000012  loss: 1.8978 (1.8609)
[15:24:56.321935] Test:  [   0/2084]  eta: 2:45:11  loss: 0.4255 (0.4255)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.7561  data: 3.7053  max mem: 30335
[15:27:04.862062] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.5507 (0.4506)  acc1: 83.3333 (88.6560)  acc5: 100.0000 (98.4697)  time: 0.2566  data: 0.0002  max mem: 30335
[15:29:14.851326] Test:  [1000/2084]  eta: 0:04:45  loss: 0.5311 (0.5065)  acc1: 87.5000 (86.9714)  acc5: 95.8333 (98.1102)  time: 0.2554  data: 0.0002  max mem: 30335
[15:31:24.981424] Test:  [1500/2084]  eta: 0:02:33  loss: 0.4429 (0.5787)  acc1: 87.5000 (85.1793)  acc5: 100.0000 (97.4350)  time: 0.2553  data: 0.0002  max mem: 30335
[15:33:32.867684] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2169 (0.6146)  acc1: 95.8333 (84.3453)  acc5: 100.0000 (97.0577)  time: 0.2558  data: 0.0002  max mem: 30335
[15:33:53.919352] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3801 (0.6191)  acc1: 91.6667 (84.2100)  acc5: 100.0000 (97.0400)  time: 0.2472  data: 0.0001  max mem: 30335
[15:33:54.041418] Test: Total time: 0:09:02 (0.2603 s / it)
[15:34:08.783810] * Acc@1 84.209 Acc@5 97.030 loss 0.619
[15:34:08.784306] Accuracy of the network on the 50000 test images: 84.2%
[15:34:08.784356] Max accuracy: 84.26%
[15:34:09.335346] log_dir: ./output_dir_qkformer
[15:34:20.566914] Epoch: [189]  [   0/6672]  eta: 20:47:40  lr: 0.000012  loss: 1.5511 (1.5511)  time: 11.2200  data: 3.0094  max mem: 30335
[15:59:33.362091] Epoch: [189]  [2000/6672]  eta: 0:59:17  lr: 0.000011  loss: 1.8425 (1.8392)  time: 0.7240  data: 0.0002  max mem: 30335
[16:24:50.319226] Epoch: [189]  [4000/6672]  eta: 0:33:50  lr: 0.000010  loss: 1.8638 (1.8476)  time: 0.8228  data: 0.0029  max mem: 30335
[16:50:00.430126] Epoch: [189]  [6000/6672]  eta: 0:08:29  lr: 0.000010  loss: 1.8953 (1.8497)  time: 0.7243  data: 0.0003  max mem: 30335
[16:58:22.531463] Epoch: [189]  [6671/6672]  eta: 0:00:00  lr: 0.000010  loss: 1.7898 (1.8516)  time: 0.7242  data: 0.0011  max mem: 30335
[16:58:23.396395] Epoch: [189] Total time: 1:24:14 (0.7575 s / it)
[16:58:23.441559] Averaged stats: lr: 0.000010  loss: 1.7898 (1.8571)
[16:58:27.853520] Test:  [   0/2084]  eta: 2:33:01  loss: 0.3687 (0.3687)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.4059  data: 3.5293  max mem: 30335
[17:00:36.695243] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.5373 (0.4540)  acc1: 87.5000 (88.4980)  acc5: 100.0000 (98.4531)  time: 0.2554  data: 0.0002  max mem: 30335
[17:02:45.207063] Test:  [1000/2084]  eta: 0:04:43  loss: 0.5160 (0.5051)  acc1: 87.5000 (86.9506)  acc5: 95.8333 (98.1935)  time: 0.2574  data: 0.0002  max mem: 30335
[17:04:54.701377] Test:  [1500/2084]  eta: 0:02:32  loss: 0.3895 (0.5763)  acc1: 91.6667 (85.1877)  acc5: 100.0000 (97.5405)  time: 0.2549  data: 0.0002  max mem: 30335
[17:07:03.004619] Test:  [2000/2084]  eta: 0:00:21  loss: 0.1999 (0.6153)  acc1: 91.6667 (84.2246)  acc5: 100.0000 (97.1556)  time: 0.2551  data: 0.0002  max mem: 30335
[17:07:24.020632] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3670 (0.6203)  acc1: 91.6667 (84.1120)  acc5: 100.0000 (97.1280)  time: 0.2471  data: 0.0001  max mem: 30335
[17:07:24.134241] Test: Total time: 0:09:00 (0.2594 s / it)
[17:07:39.414170] * Acc@1 84.121 Acc@5 97.127 loss 0.620
[17:07:39.414624] Accuracy of the network on the 50000 test images: 84.1%
[17:07:39.414668] Max accuracy: 84.26%
[17:07:39.878863] log_dir: ./output_dir_qkformer
[17:07:51.439814] Epoch: [190]  [   0/6672]  eta: 21:24:31  lr: 0.000010  loss: 2.1854 (2.1854)  time: 11.5514  data: 2.4440  max mem: 30335
[17:33:04.198873] Epoch: [190]  [2000/6672]  eta: 0:59:18  lr: 0.000009  loss: 1.8104 (1.8518)  time: 0.7265  data: 0.0002  max mem: 30335
[17:58:27.246838] Epoch: [190]  [4000/6672]  eta: 0:33:54  lr: 0.000009  loss: 1.8145 (1.8550)  time: 0.7407  data: 0.0007  max mem: 30335
[18:23:30.657608] Epoch: [190]  [6000/6672]  eta: 0:08:29  lr: 0.000008  loss: 1.8419 (1.8546)  time: 0.7242  data: 0.0002  max mem: 30335
[18:31:54.174284] Epoch: [190]  [6671/6672]  eta: 0:00:00  lr: 0.000008  loss: 1.7971 (1.8554)  time: 0.7203  data: 0.0012  max mem: 30335
[18:31:54.893642] Epoch: [190] Total time: 1:24:15 (0.7576 s / it)
[18:31:54.951742] Averaged stats: lr: 0.000008  loss: 1.7971 (1.8557)
[18:31:59.563614] Test:  [   0/2084]  eta: 2:39:59  loss: 0.2986 (0.2986)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.6061  data: 3.8014  max mem: 30335
[18:34:07.719294] Test:  [ 500/2084]  eta: 0:06:59  loss: 0.6043 (0.4534)  acc1: 83.3333 (88.5479)  acc5: 100.0000 (98.4614)  time: 0.2554  data: 0.0002  max mem: 30335
[18:36:15.992269] Test:  [1000/2084]  eta: 0:04:42  loss: 0.5100 (0.5039)  acc1: 87.5000 (86.9630)  acc5: 95.8333 (98.1727)  time: 0.2563  data: 0.0002  max mem: 30335
[18:38:24.053961] Test:  [1500/2084]  eta: 0:02:31  loss: 0.4710 (0.5771)  acc1: 87.5000 (85.1349)  acc5: 100.0000 (97.4350)  time: 0.2552  data: 0.0002  max mem: 30335
[18:40:32.566967] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2181 (0.6121)  acc1: 91.6667 (84.3037)  acc5: 100.0000 (97.0869)  time: 0.2561  data: 0.0002  max mem: 30335
[18:40:53.582783] Test:  [2083/2084]  eta: 0:00:00  loss: 0.2925 (0.6169)  acc1: 91.6667 (84.2040)  acc5: 100.0000 (97.0760)  time: 0.2469  data: 0.0001  max mem: 30335
[18:40:53.691130] Test: Total time: 0:08:58 (0.2585 s / it)
[18:41:08.662066] * Acc@1 84.220 Acc@5 97.065 loss 0.617
[18:41:08.662401] Accuracy of the network on the 50000 test images: 84.2%
[18:41:08.662440] Max accuracy: 84.26%
[18:41:09.062349] log_dir: ./output_dir_qkformer
[18:41:13.365208] Epoch: [191]  [   0/6672]  eta: 7:58:22  lr: 0.000008  loss: 2.4326 (2.4326)  time: 4.3020  data: 2.3672  max mem: 30335
[19:06:40.605122] Epoch: [191]  [2000/6672]  eta: 0:59:35  lr: 0.000008  loss: 1.8301 (1.8459)  time: 0.7253  data: 0.0003  max mem: 30335
[19:31:42.655802] Epoch: [191]  [4000/6672]  eta: 0:33:45  lr: 0.000007  loss: 1.8024 (1.8484)  time: 0.7722  data: 0.0002  max mem: 30335
[19:56:57.977003] Epoch: [191]  [6000/6672]  eta: 0:08:29  lr: 0.000007  loss: 1.8386 (1.8530)  time: 0.7217  data: 0.0003  max mem: 30335
[20:05:23.036612] Epoch: [191]  [6671/6672]  eta: 0:00:00  lr: 0.000007  loss: 1.7187 (1.8520)  time: 0.7185  data: 0.0011  max mem: 30335
[20:05:23.784345] Epoch: [191] Total time: 1:24:14 (0.7576 s / it)
[20:05:23.848723] Averaged stats: lr: 0.000007  loss: 1.7187 (1.8521)
[20:05:28.333174] Test:  [   0/2084]  eta: 2:35:33  loss: 0.3731 (0.3731)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.4787  data: 3.4324  max mem: 30335
[20:07:37.295411] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.5623 (0.4489)  acc1: 87.5000 (88.6311)  acc5: 100.0000 (98.5113)  time: 0.2551  data: 0.0002  max mem: 30335
[20:09:46.025472] Test:  [1000/2084]  eta: 0:04:43  loss: 0.4610 (0.5006)  acc1: 91.6667 (87.0172)  acc5: 95.8333 (98.2143)  time: 0.2563  data: 0.0002  max mem: 30335
[20:11:54.952296] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4077 (0.5712)  acc1: 87.5000 (85.3126)  acc5: 100.0000 (97.5044)  time: 0.2561  data: 0.0002  max mem: 30335
[20:14:03.947178] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2301 (0.6081)  acc1: 95.8333 (84.4120)  acc5: 100.0000 (97.1368)  time: 0.2559  data: 0.0002  max mem: 30335
[20:14:26.194389] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3418 (0.6129)  acc1: 91.6667 (84.3060)  acc5: 100.0000 (97.1120)  time: 0.2475  data: 0.0001  max mem: 30335
[20:14:26.311440] Test: Total time: 0:09:02 (0.2603 s / it)
[20:14:40.551878] * Acc@1 84.293 Acc@5 97.107 loss 0.613
[20:14:40.552370] Accuracy of the network on the 50000 test images: 84.3%
[20:14:42.624282] Max accuracy: 84.29%
[20:14:42.960888] log_dir: ./output_dir_qkformer
[20:14:50.325938] Epoch: [192]  [   0/6672]  eta: 13:24:40  lr: 0.000007  loss: 1.9811 (1.9811)  time: 7.2363  data: 1.9122  max mem: 30335
[20:40:00.143294] Epoch: [192]  [2000/6672]  eta: 0:59:01  lr: 0.000006  loss: 1.7818 (1.8467)  time: 0.7864  data: 0.0002  max mem: 30335
[21:05:19.619431] Epoch: [192]  [4000/6672]  eta: 0:33:47  lr: 0.000006  loss: 1.8931 (1.8431)  time: 0.7311  data: 0.0003  max mem: 30335
[21:30:36.931245] Epoch: [192]  [6000/6672]  eta: 0:08:29  lr: 0.000005  loss: 1.7809 (1.8439)  time: 0.7732  data: 0.0002  max mem: 30335
[21:39:03.208616] Epoch: [192]  [6671/6672]  eta: 0:00:00  lr: 0.000005  loss: 1.9029 (1.8467)  time: 0.7188  data: 0.0010  max mem: 30335
[21:39:03.972074] Epoch: [192] Total time: 1:24:21 (0.7585 s / it)
[21:39:04.023059] Averaged stats: lr: 0.000005  loss: 1.9029 (1.8493)
[21:39:08.405221] Test:  [   0/2084]  eta: 2:32:03  loss: 0.2780 (0.2780)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.3781  data: 3.6216  max mem: 30335
[21:41:17.183517] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.5387 (0.4570)  acc1: 83.3333 (88.3483)  acc5: 100.0000 (98.4697)  time: 0.2557  data: 0.0002  max mem: 30335
[21:43:26.346168] Test:  [1000/2084]  eta: 0:04:44  loss: 0.5198 (0.5106)  acc1: 87.5000 (86.7924)  acc5: 95.8333 (98.1477)  time: 0.2550  data: 0.0002  max mem: 30335
[21:45:34.472142] Test:  [1500/2084]  eta: 0:02:31  loss: 0.4190 (0.5822)  acc1: 87.5000 (85.0572)  acc5: 100.0000 (97.4684)  time: 0.2552  data: 0.0002  max mem: 30335
[21:47:42.312748] Test:  [2000/2084]  eta: 0:00:21  loss: 0.1599 (0.6176)  acc1: 95.8333 (84.2870)  acc5: 100.0000 (97.1160)  time: 0.2555  data: 0.0002  max mem: 30335
[21:48:03.368527] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3756 (0.6220)  acc1: 91.6667 (84.1960)  acc5: 100.0000 (97.0900)  time: 0.2479  data: 0.0002  max mem: 30335
[21:48:03.482832] Test: Total time: 0:08:59 (0.2589 s / it)
[21:48:18.241396] * Acc@1 84.172 Acc@5 97.095 loss 0.622
[21:48:18.241618] Accuracy of the network on the 50000 test images: 84.2%
[21:48:18.241664] Max accuracy: 84.29%
[21:48:18.354673] log_dir: ./output_dir_qkformer
[21:48:29.362587] Epoch: [193]  [   0/6672]  eta: 20:22:25  lr: 0.000005  loss: 1.6968 (1.6968)  time: 10.9930  data: 2.7972  max mem: 30335
[22:13:39.169191] Epoch: [193]  [2000/6672]  eta: 0:59:09  lr: 0.000005  loss: 1.7752 (1.8461)  time: 0.7277  data: 0.0003  max mem: 30335
[22:38:59.283732] Epoch: [193]  [4000/6672]  eta: 0:33:50  lr: 0.000005  loss: 1.7373 (1.8477)  time: 0.8291  data: 0.0073  max mem: 30335
[23:04:09.528103] Epoch: [193]  [6000/6672]  eta: 0:08:29  lr: 0.000004  loss: 1.8676 (1.8498)  time: 0.8579  data: 0.0045  max mem: 30335
[23:12:29.314796] Epoch: [193]  [6671/6672]  eta: 0:00:00  lr: 0.000004  loss: 1.8146 (1.8481)  time: 0.7193  data: 0.0006  max mem: 30335
[23:12:30.194673] Epoch: [193] Total time: 1:24:11 (0.7572 s / it)
[23:12:30.236028] Averaged stats: lr: 0.000004  loss: 1.8146 (1.8474)
[23:12:34.997803] Test:  [   0/2084]  eta: 2:45:14  loss: 0.3645 (0.3645)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.7575  data: 4.1304  max mem: 30335
[23:14:43.064363] Test:  [ 500/2084]  eta: 0:06:59  loss: 0.5417 (0.4556)  acc1: 83.3333 (88.8057)  acc5: 100.0000 (98.4032)  time: 0.2562  data: 0.0002  max mem: 30335
[23:16:52.066707] Test:  [1000/2084]  eta: 0:04:43  loss: 0.5376 (0.5061)  acc1: 87.5000 (86.9922)  acc5: 95.8333 (98.1019)  time: 0.2556  data: 0.0002  max mem: 30335
[23:19:00.435868] Test:  [1500/2084]  eta: 0:02:31  loss: 0.4842 (0.5763)  acc1: 87.5000 (85.2293)  acc5: 100.0000 (97.4295)  time: 0.2555  data: 0.0002  max mem: 30335
[23:21:08.665063] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2298 (0.6109)  acc1: 95.8333 (84.3974)  acc5: 100.0000 (97.1389)  time: 0.2542  data: 0.0002  max mem: 30335
[23:21:29.706388] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3486 (0.6154)  acc1: 91.6667 (84.2880)  acc5: 100.0000 (97.1160)  time: 0.2476  data: 0.0001  max mem: 30335
[23:21:29.834481] Test: Total time: 0:08:59 (0.2589 s / it)
[23:21:44.154046] * Acc@1 84.262 Acc@5 97.115 loss 0.616
[23:21:44.154310] Accuracy of the network on the 50000 test images: 84.3%
[23:21:44.154352] Max accuracy: 84.29%
[23:21:44.449232] log_dir: ./output_dir_qkformer
[23:21:56.851137] Epoch: [194]  [   0/6672]  eta: 22:58:59  lr: 0.000004  loss: 2.0116 (2.0116)  time: 12.4010  data: 2.5514  max mem: 30335
[23:47:10.878924] Epoch: [194]  [2000/6672]  eta: 0:59:23  lr: 0.000004  loss: 1.9746 (1.8560)  time: 0.7246  data: 0.0003  max mem: 30335
[00:12:16.655368] Epoch: [194]  [4000/6672]  eta: 0:33:44  lr: 0.000004  loss: 1.8222 (1.8550)  time: 0.7572  data: 0.0003  max mem: 30335
[00:37:25.614083] Epoch: [194]  [6000/6672]  eta: 0:08:28  lr: 0.000003  loss: 1.7382 (1.8506)  time: 0.7871  data: 0.0003  max mem: 30335
[00:45:49.066895] Epoch: [194]  [6671/6672]  eta: 0:00:00  lr: 0.000003  loss: 1.7053 (1.8499)  time: 0.7188  data: 0.0006  max mem: 30335
[00:45:49.890449] Epoch: [194] Total time: 1:24:05 (0.7562 s / it)
[00:45:49.926841] Averaged stats: lr: 0.000003  loss: 1.7053 (1.8465)
[00:45:53.899313] Test:  [   0/2084]  eta: 2:17:47  loss: 0.3535 (0.3535)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 3.9671  data: 3.2126  max mem: 30335
[00:48:02.785287] Test:  [ 500/2084]  eta: 0:06:59  loss: 0.5564 (0.4502)  acc1: 79.1667 (88.5146)  acc5: 100.0000 (98.4531)  time: 0.2560  data: 0.0002  max mem: 30335
[00:50:11.598438] Test:  [1000/2084]  eta: 0:04:43  loss: 0.4794 (0.5017)  acc1: 87.5000 (87.0296)  acc5: 95.8333 (98.2351)  time: 0.2564  data: 0.0002  max mem: 30335
[00:52:21.113432] Test:  [1500/2084]  eta: 0:02:32  loss: 0.3907 (0.5748)  acc1: 87.5000 (85.2626)  acc5: 100.0000 (97.4989)  time: 0.2556  data: 0.0002  max mem: 30335
[00:54:28.919321] Test:  [2000/2084]  eta: 0:00:21  loss: 0.1785 (0.6119)  acc1: 95.8333 (84.3786)  acc5: 100.0000 (97.1015)  time: 0.2566  data: 0.0002  max mem: 30335
[00:54:50.547023] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3018 (0.6166)  acc1: 91.6667 (84.2420)  acc5: 100.0000 (97.0840)  time: 0.2469  data: 0.0001  max mem: 30335
[00:54:50.659518] Test: Total time: 0:09:00 (0.2595 s / it)
[00:55:04.905653] * Acc@1 84.253 Acc@5 97.072 loss 0.617
[00:55:04.905885] Accuracy of the network on the 50000 test images: 84.3%
[00:55:04.905921] Max accuracy: 84.29%
[00:55:05.200567] log_dir: ./output_dir_qkformer
[00:55:13.739493] Epoch: [195]  [   0/6672]  eta: 15:47:52  lr: 0.000003  loss: 1.8177 (1.8177)  time: 8.5240  data: 2.1717  max mem: 30335
[01:20:37.897414] Epoch: [195]  [2000/6672]  eta: 0:59:37  lr: 0.000003  loss: 1.7828 (1.8265)  time: 0.7433  data: 0.0003  max mem: 30335
[01:45:40.343645] Epoch: [195]  [4000/6672]  eta: 0:33:46  lr: 0.000003  loss: 1.9009 (1.8408)  time: 0.7224  data: 0.0003  max mem: 30335
[02:10:55.888660] Epoch: [195]  [6000/6672]  eta: 0:08:29  lr: 0.000002  loss: 1.9118 (1.8401)  time: 0.7286  data: 0.0003  max mem: 30335
[02:19:24.837944] Epoch: [195]  [6671/6672]  eta: 0:00:00  lr: 0.000002  loss: 1.6580 (1.8407)  time: 0.7201  data: 0.0010  max mem: 30335
[02:19:25.615305] Epoch: [195] Total time: 1:24:20 (0.7585 s / it)
[02:19:25.680291] Averaged stats: lr: 0.000002  loss: 1.6580 (1.8445)
[02:19:29.874479] Test:  [   0/2084]  eta: 2:25:30  loss: 0.3837 (0.3837)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.1891  data: 3.1410  max mem: 30335
[02:21:38.938364] Test:  [ 500/2084]  eta: 0:07:01  loss: 0.5334 (0.4516)  acc1: 83.3333 (88.6144)  acc5: 100.0000 (98.4864)  time: 0.2560  data: 0.0002  max mem: 30335
[02:23:47.330521] Test:  [1000/2084]  eta: 0:04:43  loss: 0.5386 (0.5030)  acc1: 83.3333 (86.9214)  acc5: 95.8333 (98.2018)  time: 0.2564  data: 0.0003  max mem: 30335
[02:25:55.471168] Test:  [1500/2084]  eta: 0:02:31  loss: 0.4646 (0.5734)  acc1: 87.5000 (85.1571)  acc5: 95.8333 (97.4906)  time: 0.2559  data: 0.0002  max mem: 30335
[02:28:04.214278] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2008 (0.6094)  acc1: 91.6667 (84.2808)  acc5: 100.0000 (97.1119)  time: 0.2555  data: 0.0002  max mem: 30335
[02:28:25.290488] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3318 (0.6142)  acc1: 91.6667 (84.1740)  acc5: 100.0000 (97.0960)  time: 0.2480  data: 0.0001  max mem: 30335
[02:28:25.420392] Test: Total time: 0:08:59 (0.2590 s / it)
[02:28:39.603431] * Acc@1 84.188 Acc@5 97.103 loss 0.614
[02:28:39.603910] Accuracy of the network on the 50000 test images: 84.2%
[02:28:39.603955] Max accuracy: 84.29%
[02:28:39.961866] log_dir: ./output_dir_qkformer
[02:28:49.531923] Epoch: [196]  [   0/6672]  eta: 17:43:51  lr: 0.000002  loss: 1.6016 (1.6016)  time: 9.5671  data: 2.2276  max mem: 30335
[02:53:54.005596] Epoch: [196]  [2000/6672]  eta: 0:58:54  lr: 0.000002  loss: 1.6287 (1.8416)  time: 0.7260  data: 0.0002  max mem: 30335
[03:18:56.501821] Epoch: [196]  [4000/6672]  eta: 0:33:34  lr: 0.000002  loss: 1.6981 (1.8403)  time: 0.7237  data: 0.0003  max mem: 30335
[03:44:01.696338] Epoch: [196]  [6000/6672]  eta: 0:08:26  lr: 0.000002  loss: 1.7847 (1.8406)  time: 0.7243  data: 0.0003  max mem: 30335
[03:52:33.748115] Epoch: [196]  [6671/6672]  eta: 0:00:00  lr: 0.000002  loss: 1.7516 (1.8420)  time: 0.7233  data: 0.0010  max mem: 30335
[03:52:34.559400] Epoch: [196] Total time: 1:23:54 (0.7546 s / it)
[03:52:34.612026] Averaged stats: lr: 0.000002  loss: 1.7516 (1.8415)
[03:52:38.532027] Test:  [   0/2084]  eta: 2:15:58  loss: 0.4084 (0.4084)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 3.9151  data: 3.2346  max mem: 30335
[03:54:47.074865] Test:  [ 500/2084]  eta: 0:06:58  loss: 0.5383 (0.4496)  acc1: 79.1667 (88.6394)  acc5: 100.0000 (98.4614)  time: 0.2559  data: 0.0002  max mem: 30335
[03:56:54.936898] Test:  [1000/2084]  eta: 0:04:41  loss: 0.4354 (0.5018)  acc1: 87.5000 (86.8423)  acc5: 95.8333 (98.2268)  time: 0.2564  data: 0.0002  max mem: 30335
[03:59:03.789124] Test:  [1500/2084]  eta: 0:02:31  loss: 0.3906 (0.5737)  acc1: 91.6667 (85.2237)  acc5: 100.0000 (97.5849)  time: 0.2599  data: 0.0002  max mem: 30335
[04:01:15.291273] Test:  [2000/2084]  eta: 0:00:21  loss: 0.1745 (0.6097)  acc1: 95.8333 (84.3162)  acc5: 100.0000 (97.1868)  time: 0.3595  data: 0.0002  max mem: 30335
[04:01:37.614920] Test:  [2083/2084]  eta: 0:00:00  loss: 0.2761 (0.6138)  acc1: 91.6667 (84.2180)  acc5: 100.0000 (97.1740)  time: 0.2474  data: 0.0002  max mem: 30335
[04:01:37.733498] Test: Total time: 0:09:03 (0.2606 s / it)
[04:01:52.115697] * Acc@1 84.232 Acc@5 97.169 loss 0.614
[04:01:52.116217] Accuracy of the network on the 50000 test images: 84.2%
[04:01:52.116288] Max accuracy: 84.29%
[04:01:52.369274] log_dir: ./output_dir_qkformer
[04:02:02.822835] Epoch: [197]  [   0/6672]  eta: 19:20:52  lr: 0.000002  loss: 2.0942 (2.0942)  time: 10.4396  data: 2.9628  max mem: 30335
[04:27:02.006263] Epoch: [197]  [2000/6672]  eta: 0:58:44  lr: 0.000002  loss: 1.8972 (1.8439)  time: 1.0960  data: 0.0014  max mem: 30335
[04:51:54.829443] Epoch: [197]  [4000/6672]  eta: 0:33:24  lr: 0.000002  loss: 1.7677 (1.8498)  time: 0.7789  data: 0.0002  max mem: 30335
[05:17:03.623927] Epoch: [197]  [6000/6672]  eta: 0:08:25  lr: 0.000001  loss: 1.7296 (1.8489)  time: 0.7919  data: 0.0002  max mem: 30335
[05:25:29.526461] Epoch: [197]  [6671/6672]  eta: 0:00:00  lr: 0.000001  loss: 1.7640 (1.8472)  time: 0.7196  data: 0.0011  max mem: 30335
[05:25:30.266850] Epoch: [197] Total time: 1:23:37 (0.7521 s / it)
[05:25:30.350499] Averaged stats: lr: 0.000001  loss: 1.7640 (1.8436)
[05:25:35.182588] Test:  [   0/2084]  eta: 2:47:32  loss: 0.3473 (0.3473)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.8235  data: 3.6847  max mem: 30335
[05:27:44.219368] Test:  [ 500/2084]  eta: 0:07:03  loss: 0.5243 (0.4473)  acc1: 83.3333 (88.6394)  acc5: 100.0000 (98.4198)  time: 0.2559  data: 0.0002  max mem: 30335
[05:29:52.150608] Test:  [1000/2084]  eta: 0:04:43  loss: 0.4378 (0.4998)  acc1: 87.5000 (86.8798)  acc5: 95.8333 (98.1477)  time: 0.2553  data: 0.0002  max mem: 30335
[05:32:02.232371] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4653 (0.5733)  acc1: 87.5000 (85.1654)  acc5: 100.0000 (97.5017)  time: 0.2554  data: 0.0002  max mem: 30335
[05:34:10.877645] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2578 (0.6108)  acc1: 95.8333 (84.2683)  acc5: 100.0000 (97.1015)  time: 0.2547  data: 0.0002  max mem: 30335
[05:34:31.950125] Test:  [2083/2084]  eta: 0:00:00  loss: 0.2692 (0.6144)  acc1: 91.6667 (84.1740)  acc5: 100.0000 (97.0940)  time: 0.2479  data: 0.0001  max mem: 30335
[05:34:32.069315] Test: Total time: 0:09:01 (0.2599 s / it)
[05:34:46.463258] * Acc@1 84.192 Acc@5 97.104 loss 0.614
[05:34:46.463628] Accuracy of the network on the 50000 test images: 84.2%
[05:34:46.463672] Max accuracy: 84.29%
[05:34:46.622408] log_dir: ./output_dir_qkformer
[05:34:55.463254] Epoch: [198]  [   0/6672]  eta: 16:18:29  lr: 0.000001  loss: 1.5654 (1.5654)  time: 8.7994  data: 2.3027  max mem: 30335
[06:00:08.573189] Epoch: [198]  [2000/6672]  eta: 0:59:12  lr: 0.000001  loss: 1.7647 (1.8379)  time: 0.7248  data: 0.0002  max mem: 30335
[06:25:03.662067] Epoch: [198]  [4000/6672]  eta: 0:33:34  lr: 0.000001  loss: 1.8297 (1.8419)  time: 0.7232  data: 0.0003  max mem: 30335
[06:50:08.375833] Epoch: [198]  [6000/6672]  eta: 0:08:26  lr: 0.000001  loss: 1.9467 (1.8453)  time: 0.7467  data: 0.0002  max mem: 30335
[06:58:27.977245] Epoch: [198]  [6671/6672]  eta: 0:00:00  lr: 0.000001  loss: 1.8647 (1.8436)  time: 0.7196  data: 0.0009  max mem: 30335
[06:58:28.698094] Epoch: [198] Total time: 1:23:42 (0.7527 s / it)
[06:58:28.747647] Averaged stats: lr: 0.000001  loss: 1.8647 (1.8414)
[06:58:33.596300] Test:  [   0/2084]  eta: 2:48:12  loss: 0.3583 (0.3583)  acc1: 95.8333 (95.8333)  acc5: 95.8333 (95.8333)  time: 4.8428  data: 4.0272  max mem: 30335
[07:00:41.759072] Test:  [ 500/2084]  eta: 0:07:00  loss: 0.6107 (0.4595)  acc1: 83.3333 (88.3649)  acc5: 100.0000 (98.4281)  time: 0.2552  data: 0.0002  max mem: 30335
[07:02:51.050880] Test:  [1000/2084]  eta: 0:04:44  loss: 0.6038 (0.5074)  acc1: 87.5000 (86.9048)  acc5: 95.8333 (98.1893)  time: 0.2556  data: 0.0002  max mem: 30335
[07:05:00.353699] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4081 (0.5794)  acc1: 91.6667 (85.0877)  acc5: 100.0000 (97.5155)  time: 0.2554  data: 0.0002  max mem: 30335
[07:07:08.994751] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2310 (0.6165)  acc1: 95.8333 (84.2058)  acc5: 100.0000 (97.1348)  time: 0.2557  data: 0.0002  max mem: 30335
[07:07:30.012848] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3642 (0.6209)  acc1: 91.6667 (84.0980)  acc5: 100.0000 (97.1200)  time: 0.2473  data: 0.0001  max mem: 30335
[07:07:30.125558] Test: Total time: 0:09:01 (0.2598 s / it)
[07:07:44.709653] * Acc@1 84.093 Acc@5 97.101 loss 0.621
[07:07:44.709881] Accuracy of the network on the 50000 test images: 84.1%
[07:07:44.709936] Max accuracy: 84.29%
[07:07:45.100829] log_dir: ./output_dir_qkformer
[07:07:54.408249] Epoch: [199]  [   0/6672]  eta: 17:14:50  lr: 0.000001  loss: 1.6402 (1.6402)  time: 9.3062  data: 3.3657  max mem: 30335
[07:32:59.142876] Epoch: [199]  [2000/6672]  eta: 0:58:54  lr: 0.000001  loss: 1.8381 (1.8447)  time: 0.7272  data: 0.0003  max mem: 30335
[07:58:06.729530] Epoch: [199]  [4000/6672]  eta: 0:33:37  lr: 0.000001  loss: 1.7551 (1.8403)  time: 0.7307  data: 0.0003  max mem: 30335
[08:23:11.677189] Epoch: [199]  [6000/6672]  eta: 0:08:26  lr: 0.000001  loss: 1.8319 (1.8410)  time: 0.7464  data: 0.0003  max mem: 30335
[08:31:38.551127] Epoch: [199]  [6671/6672]  eta: 0:00:00  lr: 0.000001  loss: 1.7052 (1.8405)  time: 0.7191  data: 0.0010  max mem: 30335
[08:31:39.351805] Epoch: [199] Total time: 1:23:54 (0.7545 s / it)
[08:31:39.435148] Averaged stats: lr: 0.000001  loss: 1.7052 (1.8424)
[08:31:43.829552] Test:  [   0/2084]  eta: 2:32:24  loss: 0.4108 (0.4108)  acc1: 91.6667 (91.6667)  acc5: 95.8333 (95.8333)  time: 4.3881  data: 3.3956  max mem: 30335
[08:33:51.707059] Test:  [ 500/2084]  eta: 0:06:58  loss: 0.5189 (0.4514)  acc1: 83.3333 (88.6228)  acc5: 100.0000 (98.4448)  time: 0.2563  data: 0.0002  max mem: 30335
[08:36:01.551860] Test:  [1000/2084]  eta: 0:04:43  loss: 0.4349 (0.5029)  acc1: 87.5000 (87.0005)  acc5: 95.8333 (98.1477)  time: 0.2557  data: 0.0002  max mem: 30335
[08:38:10.920095] Test:  [1500/2084]  eta: 0:02:32  loss: 0.4776 (0.5724)  acc1: 87.5000 (85.3181)  acc5: 100.0000 (97.5044)  time: 0.2555  data: 0.0002  max mem: 30335
[08:40:21.430292] Test:  [2000/2084]  eta: 0:00:21  loss: 0.2653 (0.6082)  acc1: 95.8333 (84.4349)  acc5: 100.0000 (97.1556)  time: 0.2555  data: 0.0002  max mem: 30335
[08:40:42.503194] Test:  [2083/2084]  eta: 0:00:00  loss: 0.3121 (0.6132)  acc1: 91.6667 (84.2960)  acc5: 100.0000 (97.1300)  time: 0.2483  data: 0.0001  max mem: 30335
[08:40:42.633168] Test: Total time: 0:09:03 (0.2606 s / it)
[08:40:56.788395] * Acc@1 84.266 Acc@5 97.134 loss 0.614
[08:40:56.788876] Accuracy of the network on the 50000 test images: 84.3%
[08:40:56.788932] Max accuracy: 84.29%
[08:40:57.269496] Training time 12 days, 23:07:59
