Wednesday, July 14, 2021

Text Analytics, Court Stats, and Privacy

 


A couple of weeks ago I shared some of “my problems with pending case statistics”.  Before that, I posted another note regarding an alternative for analyzing criminal justice data.  I generally try not to complain about things without having a solution in mind.  In this article, I will share the idea of using text analytics to work with a court’s largest data source: case documents and reports.



 ---

One might say: yes, it would be great if my computer could read and count things in the court documents. But the reality has generally been that court documents are not in a format that computers can read, categorize, and count.  My hope was that e-filing could reduce this problem, and it has. But now, thanks to AI, there are other possible solutions.

First, I have been monitoring work being done by the “legal technology” community on “e-discovery” systems as part of a possible solution.  With that, I recently stumbled across an article by a lawyer, Craig Ball, who wrote about possibly using the “Google Pinpoint” service for “e-discovery”.   He wrote:

“(A) glimmer of hope crept over the transom today as I dragged and dropped a container file holding 50,000 e-mail messages into a free Google tool called Pinpoint.

Within minutes, Google converted the emails to PDFs and ran optical character recognition (OCR) against embedded imagery.  I quickly realized that Pinpoint hadn’t processed email attachments, so I grabbed the native attachments and pointed Pinpoint to them.  The attachments uploaded, images were OCR’ed and audio files were transcribed!  Even handwritten items were converted to searchable text!” 

What? WHAT!  We can get documents, typed and handwritten, converted to searchable text, along with audio files transcribed?  That is a huge barrier overcome, which is not surprising, since consuming data is what Google does.

Microsoft of course also has a cool tool for this kind of transformation called Computer Vision - https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/   And some of you thought there wasn’t much benefit to AI?

Second, how can we count things in documents? It is called text analytics or text mining.  And unsurprisingly, thought has already been given to this approach in criminal justice case matters.

A paper titled “Text Mining on Criminal Documents” was published in the International Journal of Advances in Electronics and Computer Science in 2016.  This paper describes the concepts and some examples of how text mining can be used to count items (one could look at the case caption text, for example) and capture decision-related data. I would also want to find and count relationship data between documents and between different or similar cases.

Many court documents have standard elements, such as the case title and case caption, that alone can be used to identify and count case events and actions.  Court documents also include statutory and case references in specific formats that legal research systems have used for literally decades.
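To make the idea concrete, here is a minimal sketch of counting statutory references and pulling a case number out of document text with regular expressions. The two patterns, the function names, and the sample text are my own simplified assumptions for illustration; real caption and citation formats vary by jurisdiction and would need patterns tuned to local rules.

```python
import re
from collections import Counter
from typing import List

# Assumed, simplified patterns for illustration only.
# Real citation and case-number formats differ by court and jurisdiction.
USC_CITATION = re.compile(r"\b(\d+)\s+U\.S\.C\.\s+§\s*(\d+[a-z]?)")
CASE_NUMBER = re.compile(r"\bCase No\.\s*([\w:-]+)")


def count_citations(text: str) -> Counter:
    """Count how often each federal statute citation appears in the text."""
    return Counter(
        f"{title} U.S.C. § {section}"
        for title, section in USC_CITATION.findall(text)
    )


def find_case_numbers(text: str) -> List[str]:
    """Return every case number that follows the literal label 'Case No.'."""
    return CASE_NUMBER.findall(text)


# Made-up sample text, not taken from any real filing.
sample = (
    "Case No. 21-CR-0042\n"
    "STATE v. DOE\n"
    "Defendant is charged under 18 U.S.C. § 1343 and 18 U.S.C. § 1349. "
    "The count under 18 U.S.C. § 1343 is discussed first."
)

if __name__ == "__main__":
    print(find_case_numbers(sample))
    for citation, n in count_citations(sample).items():
        print(citation, n)
```

Run across a whole collection of filings, counts like these could be summed by case type or time period, which is the kind of statistic a case management database often cannot provide.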

Another thought leader in this space is Dr. Uwe Ewald, Director of the International Justice Analysis Forum.  Dr. Ewald has, for example, applied text analytics to cases at the UN Criminal Tribunal in The Hague. His current work uses the Provalis Research text analytics software from Canada.  While this is a proprietary software application, it provides a good example of what is possible.

Third, it is always helpful to count the same things in the same manner (at least for a specific jurisdiction or research program).  To that end, we have some thoughts from the erudite Margaret Hagan of Stanford University.  She has written about the legal taxonomy they have been developing, known as LIST.   Might we use this with text analytics/mining for privacy protection?  If we can find the data or text that should be protected, it could help in this area.

She explains:

“LIST is a taxonomy of legal issues, needs, and situations that people may face. It matches people’s life situations to standard legal terms and codes. Stanford Legal Design Lab maintains LIST.

LIST provides standard codes to use in your civic and legal technology projects. It also maps to other legal dictionaries and problem code taxonomies.

App & bot developers can use LIST codes to encode people’s inputs and their responses.

If you build bots, conversational agents, and other apps that go back-and-forth with users, then the LIST taxonomy codes can help you tag what people are asking for help with. And you can similarly encode the resources and links you’re offering to your users.”
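As a rough sketch of what tagging text with taxonomy codes could look like, the snippet below matches keywords to codes. The codes and keyword lists here are hypothetical placeholders that I made up; they are not real LIST codes, and a real project would load the codes and terms from the published LIST taxonomy and likely use something smarter than keyword matching.

```python
import re

# Hypothetical placeholder codes and keywords, NOT actual LIST codes.
KEYWORDS_BY_CODE = {
    "EXAMPLE-HOUSING": {"eviction", "landlord", "lease"},
    "EXAMPLE-FAMILY": {"custody", "divorce"},
}


def tag_text(text):
    """Return the sorted placeholder codes whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        code for code, keywords in KEYWORDS_BY_CODE.items() if words & keywords
    )


if __name__ == "__main__":
    print(tag_text("My landlord filed an eviction notice."))
    print(tag_text("We need help with custody after the eviction."))
```

The same tagging function could be pointed at document text instead of user questions, so that documents and requests for help end up counted under the same categories.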

Last, how does this subject relate to privacy?  The transformation and text mining of the documents will allow courts to find the documents, and the data within them, and then create systems that control what information can be made accessible, when, and how.  The “Best Practices for Court Privacy Policy Formation” report can provide guidance on how the text mining tools can potentially be configured and applied to the need.
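Here is a minimal sketch of that idea: find text that looks like protected data and mask it before a document is made accessible. The two patterns (a Social Security number shape and a labeled date of birth) and the sample sentence are my own assumptions for illustration. A real configuration would follow the court's privacy policy and would need far more than two regular expressions.

```python
import re

# Assumed example patterns; a court's privacy policy would define the real list.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "date_of_birth": re.compile(r"\bDOB:?\s*\d{1,2}/\d{1,2}/\d{4}\b"),
}


def find_sensitive(text):
    """Return (label, matched text) pairs for every assumed pattern hit."""
    hits = []
    for label, pattern in SENSITIVE_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(0)))
    return hits


def redact(text):
    """Replace every pattern hit with a labeled placeholder."""
    for label, pattern in SENSITIVE_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text


# Made-up sample sentence with fictional values.
sample = "Defendant John Doe, DOB: 01/02/1980, SSN 123-45-6789, appeared."

if __name__ == "__main__":
    print(find_sensitive(sample))
    print(redact(sample))
```

Keeping the list of hits separate from the redaction step means a clerk could review what was flagged before anything is released.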

In conclusion, we can get the data into a format to which we can apply text mining tools.  This rich set of data goes far beyond what is possible in our court case management system databases.  Therefore, we should add this to our toolboxes for legal, policy, and sociological analytics.

--

Notes:

A list of Text Analysis tools is available at: https://monkeylearn.com/blog/text-analysis-tools/

Prof. Erich Schweighofer of the University of Vienna has been thinking and writing about this for several decades.  Luckily much of his work is now available via Google Scholar at: https://scholar.google.com/citations?user=GuNftZsAAAAJ&hl=en


1 comment:

  1. Excellent information Jim. I hope these approaches can be embraced by courts and they can carve out a bit of R&D to mine such data (which leads to information, then knowledge, results and then review). Unfortunately for many in government though, we don't always find this leading/bleeding edge appetite to fund such positions.
