Personal details
| Title | Benchmarking jailbreaking for GenAI |
| Description | This Bachelor’s thesis benchmarks jailbreaking attacks against Generative AI systems across large language models (LLMs), vision-language models (VLMs), and, optionally, vision-language-action models (VLAs) used in robotics. The student will (i) review and categorize jailbreak techniques across these modalities, (ii) design a reproducible benchmark with realistic constraints (e.g., short prompts and limited retries, such as at most 3 attempts before an “alert” is raised), (iii) define practical KPIs (success rate under constraints, trials-to-success, severity/impact), and (iv) run experiments and report the results and key failure modes. Top grades are awarded for a clear research contribution: a novel jailbreak method, a novel metric, or results that meaningfully extend existing knowledge. |
| Home institution | Department of Computing Science |
| Associated institutions | |
| Type of work | practical / application-focused |
| Type of thesis | Bachelor's |
| Author | Prof. Dr. Chih-Hong Cheng |
| Status | reserved |
| Problem statement | Current jailbreak evaluations are often not comparable, not deployment-realistic, and largely text-centric, whereas modern GenAI systems increasingly integrate language, vision, tools, and actions. As a result, it is unclear how vulnerable LLM/VLM/VLA systems are under operational constraints (short interactions, limited trials, monitoring/alerts), and which attack families generalize across modalities. This thesis addresses that gap by creating a benchmark and KPIs that capture not only “does it jailbreak,” but “how quickly, under what constraints, and with what practical risk,” and then using it to produce actionable findings for safety evaluation. |
| Requirement | Students should be comfortable with Python and empirical experimentation (running models, logging, basic statistics), and able to write a structured literature review and a clear technical report. They should have an ambitious research mindset: proactively exploring ideas, iterating on experimental design, being rigorous about reproducibility, and being willing to pursue a novel contribution (new attack, new metric, or strong comparative results). Familiarity with ML/LLMs and basic security/safety concepts is a plus; experience with multimodal models or robotics is beneficial but not required. |
| Created | 04/03/26 |
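
To make the KPIs named in the description more concrete, the following is a minimal sketch of how "success rate under constraints" and "trials-to-success" might be computed from logged attack runs. It is an illustration under stated assumptions, not the benchmark's prescribed design: the `AttackRun` record, its field names, and both function names are hypothetical, and the default budget of 3 trials simply mirrors the "≤3 attempts before an alert" example above. Severity/impact is left out because the topic description does not define a scale for it.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class AttackRun:
    """One attack sequence against one target behavior (hypothetical record layout)."""
    attack: str
    # 1-based index of the first trial that jailbroke the model,
    # or None if no trial in the logged sequence succeeded.
    first_success_trial: Optional[int]


def _within_budget(run: AttackRun, max_trials: int) -> bool:
    """True if the run succeeded before the assumed alert threshold was exceeded."""
    return run.first_success_trial is not None and run.first_success_trial <= max_trials


def success_rate_under_budget(runs: List[AttackRun], max_trials: int = 3) -> float:
    """Fraction of runs that succeed within max_trials attempts."""
    if not runs:
        return 0.0
    return sum(_within_budget(r, max_trials) for r in runs) / len(runs)


def mean_trials_to_success(runs: List[AttackRun], max_trials: int = 3) -> Optional[float]:
    """Average number of trials needed, over runs that succeed within the budget."""
    trials = [r.first_success_trial for r in runs if _within_budget(r, max_trials)]
    return mean(trials) if trials else None


# Made-up example data, for illustration only (not experimental results).
example_runs = [
    AttackRun("role_play", 1),
    AttackRun("role_play", 3),
    AttackRun("role_play", None),  # never succeeded
    AttackRun("role_play", 5),     # succeeded, but only after the alert threshold
]
rate = success_rate_under_budget(example_runs, max_trials=3)
avg_trials = mean_trials_to_success(example_runs, max_trials=3)
print(rate, avg_trials)
```

Counting a success on trial 5 as a failure under a budget of 3 is the design choice that separates this kind of constrained metric from an unconstrained attack success rate; whether late successes should instead be reported separately is a question the thesis itself would need to settle.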