
CLEVER: A Curated Benchmark for Formally Verified Code Generation
Jul 8, 2025 · TL;DR: We introduce CLEVER, a hand-curated benchmark for verified code generation in Lean. It requires full formal specs and proofs. No few-shot method solves all stages, making it a …
Contrastive Learning Via Equivariant Representation - OpenReview
Sep 25, 2024 · In this paper, we revisit the roles of augmentation strategies and equivariance in improving CL's efficacy. We propose CLeVER (Contrastive Learning Via Equivariant …
We introduce CLEVER, the first curated benchmark for evaluating the generation of specifications and formally verified code in Lean. The benchmark comprises 161 programming problems; it evaluates …
STAIR: Improving Safety Alignment with Introspective Reasoning
May 1, 2025 · One common approach is training models to refuse unsafe queries, but this strategy can be vulnerable to clever prompts, often referred to as jailbreak attacks, which can trick the AI into …
Submissions | OpenReview
Jan 22, 2025 · Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers. Lorenzo Pacchiardi, Marko Tesic, Lucy G Cheke, Jose Hernandez-Orallo. 27 Sept 2024 …
KnowTrace: Explicit Knowledge Tracing for Structured...
Sep 13, 2024 · TL;DR: We introduce a structured RAG paradigm (KnowTrace) that seamlessly integrates knowledge structuring and multi-step reasoning for improved MHQA performance.
LongWriter: Unleashing 10,000+ Word Generation from Long Context …
Jan 22, 2025 · The work includes a new benchmark (LongBench-Write) for evaluating ultra-long generation. Reviewers highlighted the paper's clear identification of the problem, the clever and …
In comparison, multi-view MPP is aimed at effectively integrating information from multiple views through clever design and strategy to capture a broader range of contextual information [13, 15, 36].
In this paper, we have proposed a novel counterfactual framework CLEVER for debiasing fact-checking models. Unlike existing works, CLEVER is augmentation-free and mitigates biases on infer- …
Reasoning of Large Language Models over Knowledge Graphs with...
Jan 22, 2025 · While large language models (LLMs) have made significant progress in processing and reasoning over knowledge graphs, current methods suffer from a high non-retrieval rate. This …