Using LLMs to Extract Data from IPNO Wrongful Conviction Files

HRDAG has published a tech note describing a collaborative research effort between the Innocence Project New Orleans (IPNO) and the Human Rights Data Analysis Group (HRDAG). The goal is to make unstructured wrongful conviction case documents more searchable and useful by extracting structured information about law enforcement personnel and their roles. Because these exoneration records are voluminous and unstructured, the team developed a multi-stage process of metadata compilation, document classification, and data extraction, and compared traditional methods such as regular expressions with approaches based on large language models (LLMs) and similarity search. The workflow uses LangChain, FAISS, and GPT-based models to identify relevant text chunks and extract the names, ranks, and roles of officers mentioned across documents, with evaluation metrics guiding optimization. The extracted results are cross-referenced with the Louisiana Law Enforcement Accountability Database to uncover broader patterns in police involvement in wrongful convictions.
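To make the regular-expression baseline and the role of evaluation metrics more concrete, here is a minimal sketch in plain Python. It is not the team's code: the rank list, the sample text, the expected records, and every function name are hypothetical, and a simple keyword filter stands in for the similarity search that the tech note performs with FAISS.

```python
import re

# Hypothetical rank vocabulary; the project's actual list is not given here.
RANKS = ["Detective", "Sergeant", "Lieutenant", "Officer", "Captain"]

# A capitalized rank followed by one or two capitalized name tokens.
OFFICER_RE = re.compile(
    r"\b(" + "|".join(RANKS) + r")\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)"
)


def relevant_chunks(text):
    """Keep paragraphs that mention a rank keyword.

    This is a crude stand-in for embedding-based similarity search.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [p for p in paragraphs if any(rank in p for rank in RANKS)]


def extract_officers(text):
    """Return a set of (name, rank) pairs found by the regex baseline."""
    found = set()
    for chunk in relevant_chunks(text):
        for rank, name in OFFICER_RE.findall(chunk):
            found.add((name, rank))
    return found


def precision_recall(predicted, expected):
    """Compare extracted pairs against hand-labeled pairs."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Invented sample document and labels, for illustration only.
SAMPLE = (
    "The report was filed on the evening of the arrest.\n\n"
    "Detective John Smith interviewed the witness, and Sergeant Mary Jones "
    "signed the evidence log.\n\n"
    "Alan Reed, a patrol officer, secured the scene."
)
EXPECTED = {
    ("John Smith", "Detective"),
    ("Mary Jones", "Sergeant"),
    ("Alan Reed", "Officer"),
}

if __name__ == "__main__":
    officers = extract_officers(SAMPLE)
    print(sorted(officers))
    print(precision_recall(officers, EXPECTED))
```

In this invented sample the pattern finds the two officers whose rank directly precedes their name but misses the one described as "a patrol officer" after the name. Gaps of that kind are one plausible reason to compare a regex baseline against LLM-based extraction, with precision and recall measured against hand-labeled records.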
