# InfiniteBench

```bash
cd eval
pip install -r requirements.txt
```

Run the `kv_retrieval` task:
```bash
python run_infinitebench.py \
    --task kv_retrieval \
    --model_name_or_path LargeWorldModel/LWM-Text-Chat-1M \
    --data_dir ./data \
    --output_dir ./results \
    --max_seq_length 100000 \
    --rewrite \
    --num_eval_examples 100 \
    --attn_type streaming  # choose from `vllm`, `streaming`, `hf`

# Another model for --model_name_or_path: 01-ai/Yi-9B-200k
```
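To compare the attention backends, the same command can be built from Python. This is a minimal sketch. It assumes the flags shown above are the whole interface and that it runs from inside `eval/`. `build_infinitebench_cmd` is a hypothetical helper and is not part of the repository.

```python
import shlex

# Backends listed in the `--attn_type` comment of the command above.
ATTN_TYPES = ("vllm", "streaming", "hf")


def build_infinitebench_cmd(task, model, attn_type="streaming",
                            max_seq_length=100000, num_eval_examples=100,
                            rewrite=True):
    """Hypothetical helper: return the argv list for run_infinitebench.py."""
    if attn_type not in ATTN_TYPES:
        raise ValueError(f"attn_type must be one of {ATTN_TYPES}, got {attn_type!r}")
    cmd = [
        "python", "run_infinitebench.py",
        "--task", task,
        "--model_name_or_path", model,
        "--data_dir", "./data",
        "--output_dir", "./results",
        "--max_seq_length", str(max_seq_length),
        "--num_eval_examples", str(num_eval_examples),
        "--attn_type", attn_type,
    ]
    if rewrite:  # overwrite cached predictions, as in the shell example
        cmd.append("--rewrite")
    return cmd


# Print one shell command per backend for the same task and model.
for attn in ATTN_TYPES:
    print(shlex.join(build_infinitebench_cmd(
        "kv_retrieval", "LargeWorldModel/LWM-Text-Chat-1M", attn)))
```

To launch each run in sequence, pass each returned list to `subprocess.run(cmd, check=True)`.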

Run several tasks at once by passing a comma-separated list to `--task`:
```bash
python run_infinitebench.py \
    --task code_debug,code_run,kv_retrieval,longbook_choice_eng,longbook_qa_chn,longbook_qa_eng,longbook_sum_eng,longdialogue_qa_eng,math_calc,math_find,number_string,passkey \
    --model_name_or_path LargeWorldModel/LWM-Text-Chat-1M \
    --data_dir ./data \
    --output_dir ./results \
    --max_seq_length 100000 \
    --rewrite \
    --num_eval_examples 100
```

All available tasks:
```python
datasets = [
    "code_debug", "code_run", "kv_retrieval", "longbook_choice_eng",
    "longbook_qa_chn", "longbook_qa_eng", "longbook_sum_eng",
    "longdialogue_qa_eng", "math_calc", "math_find", "number_string", "passkey"
]
```
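You can generate the comma-separated `--task` value from this list instead of typing it by hand. This minimal sketch assumes that `run_infinitebench.py` splits `--task` on commas, as the multi-task example above suggests. The `retrieval_tasks` subset is only an illustration.

```python
datasets = [
    "code_debug", "code_run", "kv_retrieval", "longbook_choice_eng",
    "longbook_qa_chn", "longbook_qa_eng", "longbook_sum_eng",
    "longdialogue_qa_eng", "math_calc", "math_find", "number_string", "passkey"
]

# Comma-separated value for --task; no spaces, so the shell passes it as one argument.
task_arg = ",".join(datasets)

# Illustrative subset: only the retrieval tasks, kept in list order.
retrieval_tasks = ",".join(
    t for t in datasets if t in {"kv_retrieval", "number_string", "passkey"}
)

print(f"--task {task_arg}")
print(f"--task {retrieval_tasks}")
```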

# LongBench

```bash
cd eval
pip install -r requirements.txt
```

Run the `qasper` task:
```bash
# --rewrite overwrites cached predictions; delete that line to reuse them.
# --num_eval_examples sets the number of examples used for evaluation.
python run_longbech.py \
    --task qasper \
    --model_name_or_path LargeWorldModel/LWM-Text-Chat-1M \
    --data_dir ./data \
    --output_dir ./results \
    --max_seq_length 100000 \
    --rewrite \
    --num_eval_examples 100
```

All available tasks:
```python
datasets = [
    "narrativeqa", "qasper", "multifieldqa_en", "multifieldqa_zh", "hotpotqa", "2wikimqa", "musique",
    "dureader", "gov_report", "qmsum", "multi_news", "vcsum", "trec", "triviaqa", "samsum", "lsht", 
    "passage_count", "passage_retrieval_en", "passage_retrieval_zh", "lcc", "repobench-p"
]
```
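This section shows LongBench with only one task. It does not say whether `run_longbech.py` accepts a comma-separated `--task` list, so the sketch below takes the cautious route and builds one run per task. It also turns the `--rewrite` toggle into a boolean so you don't have to edit the command by hand. `longbench_commands` is a hypothetical helper, and the other argument values are copied from the `qasper` example.

```python
import shlex

LONGBENCH_TASKS = [
    "narrativeqa", "qasper", "multifieldqa_en", "multifieldqa_zh", "hotpotqa", "2wikimqa", "musique",
    "dureader", "gov_report", "qmsum", "multi_news", "vcsum", "trec", "triviaqa", "samsum", "lsht",
    "passage_count", "passage_retrieval_en", "passage_retrieval_zh", "lcc", "repobench-p",
]


def longbench_commands(tasks, model="LargeWorldModel/LWM-Text-Chat-1M",
                       num_eval_examples=100, rewrite=True):
    """Hypothetical helper: build one run_longbech.py argv list per task."""
    unknown = set(tasks) - set(LONGBENCH_TASKS)
    if unknown:
        raise ValueError(f"unknown LongBench tasks: {sorted(unknown)}")
    commands = []
    for task in tasks:
        cmd = [
            "python", "run_longbech.py",
            "--task", task,
            "--model_name_or_path", model,
            "--data_dir", "./data",
            "--output_dir", "./results",
            "--max_seq_length", "100000",
            "--num_eval_examples", str(num_eval_examples),
        ]
        if rewrite:  # same effect as keeping the `--rewrite` line in the shell example
            cmd.append("--rewrite")
        commands.append(cmd)
    return commands


# Reuse cached predictions for two tasks.
for cmd in longbench_commands(["qasper", "hotpotqa"], rewrite=False):
    print(shlex.join(cmd))
```

To run the tasks one after another, call `subprocess.run(cmd, check=True)` on each list from inside `eval/`.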

# Needle test

1. Download the data.
2. Set `context_length`, `seed`, and `round` in `eval/needle_test.py`.
3. Run the test:

```bash
python eval/needle_test.py \
    --model_name <model> \
    --run_name <run_name> \
    --attn_type streaming # choose from `vllm`, `streaming`, `hf`
```

The result plot will be saved as `needle_viz_Yi_1K_100K.png` in the current directory.
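For orientation, a needle-in-a-haystack test hides a short fact (the needle) at a chosen depth inside filler text of a target length. It then asks the model to retrieve that fact. The sketch below shows only the placement step, and it uses whitespace-separated words in place of tokenizer tokens. It is a generic illustration, not the logic of `eval/needle_test.py`. `insert_needle` and `depth_percent` are hypothetical names.

```python
def insert_needle(haystack: str, needle: str, context_length: int, depth_percent: float) -> str:
    """Hypothetical sketch: place `needle` at `depth_percent` of a context trimmed to
    at most `context_length` words (words stand in for tokens here)."""
    if not 0 <= depth_percent <= 100:
        raise ValueError("depth_percent must be in [0, 100]")
    needle_words = needle.split()
    # Reserve room for the needle so the result never exceeds context_length words.
    budget = context_length - len(needle_words)
    if budget < 0:
        raise ValueError("context_length is shorter than the needle")
    words = haystack.split()[:budget]
    # Insertion index, measured from the start of the trimmed haystack.
    pos = int(len(words) * depth_percent / 100)
    return " ".join(words[:pos] + needle_words + words[pos:])


filler = " ".join(f"w{i}" for i in range(1000))
ctx = insert_needle(filler, "The magic number is 42.", context_length=100, depth_percent=50)
print(len(ctx.split()))  # 100
```

A full test repeats this over a grid of context lengths and depths and scores each answer.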