# InferBand

## Installation
```
pip install -e .
```

## Run Experiments
```
cd inferband
python run_exp.py
python plot.py
```
These commands generate `.tex` files containing the result tables.

## Simulator Usage
Run the simulator in debug mode on a naive request pattern:
```
python -m inferband.simulator --debug
```

## Real Inference on Datasets
```
cd real_run
```
Test the accuracy of Flan-T5 models of different sizes on the [Open Assistant dataset](https://huggingface.co/datasets/OpenAssistant/oasst1):

1. Generate 100 queries
```
python oasst.py --gen-query 100
```
2. Run inference for flan-t5-large
```
python oasst.py --device [CUSTOMIZED_DEVICE_NAME] --model flan-t5-large --query-path exp/oasst/prompts-100.json
```
3. Run evaluation
```
python oasst.py --eval-path exp/oasst/[CUSTOMIZED_DEVICE_NAME]/flan-t5-large_prompts-100_output.json
```
4. See evaluation stats
```
python stats.py --exp-dir exp/oasst/[CUSTOMIZED_DEVICE_NAME] --model flan-t5 vicuna --num-query 100
```
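
The four steps above can be scripted. The following is a minimal sketch, not part of the repository: `oasst_commands` is a hypothetical helper that only assembles the command lines shown above, and the output file name it builds is an assumption inferred from the example paths in this README.

```python
import shlex


def oasst_commands(device, model="flan-t5-large", num_query=100):
    """Build the four oasst.py / stats.py commands from the steps above."""
    query_path = f"exp/oasst/prompts-{num_query}.json"
    # Assumed output naming, inferred from the example paths in this README.
    eval_path = f"exp/oasst/{device}/{model}_prompts-{num_query}_output.json"
    return [
        ["python", "oasst.py", "--gen-query", str(num_query)],
        ["python", "oasst.py", "--device", device, "--model", model,
         "--query-path", query_path],
        ["python", "oasst.py", "--eval-path", eval_path],
        ["python", "stats.py", "--exp-dir", f"exp/oasst/{device}",
         "--model", "flan-t5", "vicuna", "--num-query", str(num_query)],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them.
    # "my-gpu" is a placeholder for [CUSTOMIZED_DEVICE_NAME].
    for cmd in oasst_commands("my-gpu"):
        print(shlex.join(cmd))
```

To execute the steps rather than print them, each list could be passed to `subprocess.run(cmd, check=True)` from inside `real_run`.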

Test the accuracy of different models on XSum:

1. Generate 100 queries (known bug: the prompt length cannot exceed a threshold)
```
python xsum.py --gen-query 100
```
2. Run inference for llama-7b
```
python xsum.py --device [CUSTOMIZED_DEVICE_NAME] --model [MODEL_PATH] --query-path exp/xsum/prompts-100.json
```
3. Run evaluation (assuming `MODEL_PATH=~/llama-7b`)
```
python xsum.py --eval-path exp/xsum/[CUSTOMIZED_DEVICE_NAME]/llama-7b_prompts-100_output.json
```
4. See GPT-4 evaluation stats
```
python stats.py --exp-dir exp/xsum/[CUSTOMIZED_DEVICE_NAME] --model llama --num-query 100
```
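
The `--eval-path` in step 3 appears to follow from the model path and query file used in step 2. The sketch below shows that mapping under the assumption that the output file is named `<model directory name>_<query file stem>_output.json`. This is inferred from the examples in this README, not from the scripts themselves, and `expected_output_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def expected_output_path(dataset, device, model_path, query_path):
    """Guess where the inference output is written (assumed naming scheme)."""
    # Last path component of the model path, e.g. "~/llama-7b" -> "llama-7b".
    model_name = PurePosixPath(model_path).name
    # Query file name without its extension, e.g. "prompts-100".
    query_stem = PurePosixPath(query_path).stem
    out = PurePosixPath("exp") / dataset / device / f"{model_name}_{query_stem}_output.json"
    return str(out)


if __name__ == "__main__":
    # "my-gpu" is a placeholder for [CUSTOMIZED_DEVICE_NAME].
    print(expected_output_path("xsum", "my-gpu", "~/llama-7b",
                               "exp/xsum/prompts-100.json"))
```

If the actual file name differs, check the directory `exp/xsum/[CUSTOMIZED_DEVICE_NAME]` after the inference step and pass the real path to `--eval-path`.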


