# Sequential Preference Ranking for Efficient Reinforcement Learning from Human Feedback
## NeurIPS submission 236
Official implementation of the paper "Sequential Preference Ranking for Efficient Reinforcement Learning from Human Feedback". Our implementation is based on the official codebases of [B-Pref](https://github.com/rll-research/BPref) and [Meta-Reward-Net](https://github.com/RyanLiu112/MRN).

# Installation
## Install on Ubuntu 20.04 with Python 3.8
```
conda create -n pbrl python=3.8
conda activate pbrl
conda install cudatoolkit=11.3
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
pip install git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld
```

## Install MuJoCo
If MuJoCo is not already installed, install it as follows.
```
wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz
mkdir ~/.mujoco && cd ~/.mujoco
tar -zxvf <mujoco210 downloaded path>
sudo apt install libglew-dev libgl-dev
echo LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGL.so:/usr/lib/x86_64-linux-gnu/libGLEW.so
# copy the last output
```

Add the following lines to the end of `~/.bashrc`:
```
export LD_LIBRARY_PATH=/home/<user_name>/.mujoco/mujoco210/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia
export PATH="$LD_LIBRARY_PATH:$PATH"
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so # paste the copied output
```

Then install the system dependencies and build mujoco-py from source:

```
sudo apt update
sudo apt-get install patchelf
sudo apt-get install python3-dev build-essential libssl-dev libffi-dev libxml2-dev
sudo apt-get install libxslt1-dev zlib1g-dev libglew1.5 libglew-dev python3-pip

git clone https://github.com/openai/mujoco-py
cd mujoco-py
pip install -r requirements.txt
pip install -r requirements.dev.txt

pip3 install -e . --no-cache
pip3 install mxnet-mkl==1.6.0 numpy==1.23.1
```

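Install the bundled `custom_dmcontrol` and `custom_dmc2gym` packages, then PyBullet:
```
# Run from the root of this repository.
# Assumption: custom_dmcontrol and custom_dmc2gym are sibling directories, as in B-Pref.
cd custom_dmcontrol
pip install -e .
cd ../custom_dmc2gym
pip install -e .
cd ..
pip install pybullet
```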

## Modify the TensorBoard SummaryWriter `make_histogram` function
In `~/anaconda3/envs/conda_env_name/lib/python3.8/site-packages/torch/utils/tensorboard/summary.py`, change the `cum_counts` line in `make_histogram` so that the `dtype` argument is commented out:

    cum_counts = np.cumsum(np.greater(counts, 0))  # , dtype=np.int32))

## Add custom camera for real human experiments
The default cameras make it hard to see how far the agent has moved, so add a custom camera.
Edit `~/anaconda3/envs/conda_env_name/lib/python3.8/site-packages/dm_control/suite/cheetah.xml`.

Change
```
      <light name="light" pos="0 0 2" mode="trackcom"/>
      <camera name="side" pos="0 -3 0" quat="0.707 0.707 0 0" mode="trackcom"/>
      <camera name="back" pos="-1.8 -1.3 0.8" xyaxes="0.45 -0.9 0 0.3 0.15 0.94" mode="trackcom"/>
```
to 
```
      <light name="light" pos="0 0 2" mode="trackcom"/>
      <camera name="custom_cam" pos="0 -3 1" quat="0.707 0.5 0 0" mode="trackcom"/>
      <camera name="side" pos="0 -3 0" quat="0.707 0.707 0 0" mode="trackcom"/>
      <camera name="back" pos="-1.8 -1.3 0.8" xyaxes="0.45 -0.9 0 0.3 0.15 0.94" mode="trackcom"/>
```

## Gym dependencies
Edit line 50 of `~/anaconda3/envs/conda_env_name/lib/python3.8/site-packages/gym/wrappers/time_limit.py`.
The example below shows the setting for DMControl experiments.
To run both DMControl and Meta-World tasks, create two conda environments and apply the matching variant in each.
```
# dmcontrol
observation, reward, terminated, truncated, info = self.env.step(action)
# metaworld
# observation, reward, terminated, info = self.env.step(action)
# truncated = False
```
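
The two variants differ only in how many values `self.env.step(action)` returns: five in the DMControl setup and four in the Meta-World setup. The sketch below illustrates that difference. `normalize_step` is a hypothetical helper that is not part of this repository or of Gym; it treats a four-value result as never truncated, mirroring the `truncated = False` line above.

```python
def normalize_step(result):
    """Return (observation, reward, terminated, truncated, info).

    Accepts either a 5-value step result (DMControl setup) or a
    4-value step result (Meta-World setup).
    """
    if len(result) == 5:
        return tuple(result)
    if len(result) == 4:
        observation, reward, terminated, info = result
        # The 4-value form carries no truncation flag, so assume False.
        return observation, reward, terminated, False, info
    raise ValueError(f"expected 4 or 5 values from env.step, got {len(result)}")
```

This only clarifies what the edit changes; the instructions above still apply to each environment.
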
## How to run

### DeepMind Control Suite (Walker)
#### Train SeqRank
Set `OMP_NUM_THREADS` to a value between 1 and 3 and set `CUDA_VISIBLE_DEVICES` to the index of the GPU you want to use. We recommend 3 threads on a machine with at least 100 GB of RAM.
```
OMP_NUM_THREADS=3 CUDA_VISIBLE_DEVICES=0 ./scripts/walker_walk/400/oracle/run_SeqRank.sh
```

To enable wandb logging, set `wandb.use=true` in `run_SeqRank.sh`; set `wandb.use=false` to disable it.

To change the trajectory comparison method, set `reward.model` to one of `pairwise`, `sequential`, or `root`:
```
python <train script>.py reward.model=pairwise ...
```
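
The `reward.model=pairwise` argument has the shape of a dotted `key=value` override, where the dots select a nested config entry. The sketch below illustrates that mapping and checks the value against the three documented methods. `apply_override` and the plain-dict config are hypothetical stand-ins, not the training script's actual config handling.

```python
VALID_METHODS = ("pairwise", "sequential", "root")  # the options listed above


def apply_override(config, override):
    """Apply one dotted key=value override to a nested dict, in place."""
    key, sep, value = override.partition("=")
    if not sep:
        raise ValueError(f"override must look like key=value, got {override!r}")
    if key == "reward.model" and value not in VALID_METHODS:
        raise ValueError(f"reward.model must be one of {VALID_METHODS}, got {value!r}")
    *parents, leaf = key.split(".")
    node = config
    for name in parents:
        # Walk down the nesting, creating intermediate dicts as needed.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


config = apply_override({"reward": {"model": "sequential"}}, "reward.model=pairwise")
print(config)  # {'reward': {'model': 'pairwise'}}
```

Values are kept as strings here for simplicity; a real config system would typically also convert types.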

To change the wandb username, edit the `wandb.username` value in `config/train_<>.yaml`.

#### Eval SeqRank & Render Videos
```
OMP_NUM_THREADS=3 CUDA_VISIBLE_DEVICES=0 ./scripts/evaluation/run_seqrank.sh
```

### Troubleshooting
#### Rendering Error in Meta-World
If you hit an OpenGL error while rendering in Meta-World, try unsetting `LD_PRELOAD`:
```
unset LD_PRELOAD
```
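
`LD_PRELOAD` is read by the dynamic loader when a process starts, so it has to be cleared before the Python process is launched. If a rendering job is started from another Python script, one option is to give the child process a copy of the environment without that variable. This is a minimal sketch under that assumption; `env_without_preload` is a hypothetical helper, and the commented launch command is a placeholder.

```python
import os


def env_without_preload(env=None):
    """Return a copy of the environment with LD_PRELOAD removed."""
    child_env = dict(os.environ if env is None else env)
    child_env.pop("LD_PRELOAD", None)  # no error if it was never set
    return child_env


# Example launch (placeholder command, shown for illustration only):
# subprocess.run(["python", "<train script>.py"], env=env_without_preload(), check=True)
```

The caller's own environment is left unchanged, so other jobs that rely on `LD_PRELOAD` keep working.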