Benchmarks Say Claude 3 Is Better than GPT-4, But Is It?