With the challenge concluded, all data sets are now available for download at the CMC Resource Catalog.
The goal is to challenge the international Natural Language Processing (NLP) research community to create and train computational intelligence algorithms that automate the assignment of ICD-9-CM codes to clinical free text.
It is surprisingly hard for computers to handle free text as smoothly and effectively as humans do. So far, the results of the numerous efforts to achieve this have been mixed. Indeed, at times it has appeared that the complexities of free text are such as to render the effort futile. Not so; in fact, successive attempts to address the problem of converting free text into actionable knowledge have advanced the science of natural language processing and led to demand for software that simulates and complements what people are able to do.
We are sponsoring an international challenge task on the automated processing of clinical free text. Even with advances in structured vocabularies, many hospitals continue to store some patient data electronically as free text. This practice produces terabytes of information that, beyond the clinical visit, has limited utility because of its volume and limited accessibility. Natural language processing can potentially uncover implicit structure in this data, rendering it accessible to targeted search engines as well as special-purpose systems dedicated to billing, quality assurance and discovery. This challenge offers participants an opportunity to evaluate new algorithms or apply existing ones. Additionally, the Challenge provides full access to a carefully anonymised body of clinical data suitable for training and testing.
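As a rough illustration of the task, the sketch below assigns ICD-9-CM codes to a snippet of free text using simple keyword rules. It is only a toy baseline under stated assumptions: the rule table, the function name `assign_codes` and the sample sentence are hypothetical and are not drawn from the challenge data or from any participant's method.

```python
import re

# Hypothetical keyword rules mapping a text pattern to an ICD-9-CM code.
# The patterns are illustrative assumptions, not part of the challenge data.
KEYWORD_RULES = {
    r"\bcough\b": "786.2",                     # Cough
    r"\bfever\b": "780.6",                     # Fever
    r"\bpneumonia\b": "486",                   # Pneumonia, organism unspecified
    r"\burinary tract infection\b": "599.0",   # Urinary tract infection, site not specified
}


def assign_codes(text):
    """Return the sorted list of codes whose pattern occurs in the text."""
    lowered = text.lower()
    matched = {
        code
        for pattern, code in KEYWORD_RULES.items()
        if re.search(pattern, lowered)
    }
    return sorted(matched)


if __name__ == "__main__":
    print(assign_codes("Three-day history of cough and fever."))
```

A rule table like this ignores negation ("no evidence of pneumonia" would still match), uncertainty and context, which is part of why the task is hard and why trained algorithms are of interest.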
All participants will be required to register. On 1 Feb 2007, participants will be given access to a training data set, which they will use to develop their algorithms.
The test data set will be made available on 1 Mar 2007. Participants will use their algorithms to process the test data, and will submit their results in XML format, along with a brief description of their methods. For complete details about the data formats and evaluation process, download the Challenge Details document.
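The submission step could be sketched as follows, assuming predictions are held as a mapping from document identifier to a list of codes. The element and attribute names (`results`, `doc`, `id`, `code`) and the document identifier are hypothetical placeholders; the actual submission schema is the one defined in the Challenge Details document.

```python
import xml.etree.ElementTree as ET


def build_submission(predictions):
    """Serialize {document id: [codes]} to an XML string.

    The element names used here are assumptions for illustration only;
    the real format is specified in the Challenge Details document.
    """
    root = ET.Element("results")
    # Sort by document id so the output is deterministic.
    for doc_id, codes in sorted(predictions.items()):
        doc = ET.SubElement(root, "doc", id=doc_id)
        for code in codes:
            ET.SubElement(doc, "code").text = code
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical document id and predicted codes.
    print(build_submission({"doc-001": ["786.2", "780.6"]}))
```

Building the tree with a library rather than concatenating strings keeps the output well-formed even if an identifier contains characters that need escaping.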
Competition results will be announced on 1 Apr 2007 and posted to the results page.
The competition provides an international opportunity for research groups to demonstrate the applicability of their natural language processing and artificial intelligence research in the medical domain. Results will also be published; the publication venues are not yet finalized, but will most likely include conference proceedings, journal publications and possibly a book. All publications related to the challenge will include the appropriate participant(s) as co-authors.
If you have questions that are not answered on this web site or in the Challenge Details document, contact the Medical NLP Challenge administrator.