Document ID: CCS-BLS20 | Last Updated: Feb. 28, 2020
The Bureau of Labor Statistics (BLS) is known for its measurements of unemployment, inflation, and wages and compensation. One of its activities is to quantify the millions of American workers who are injured on the job. BLS collects information on non-fatal injuries mainly through its Survey of Occupational Injuries and Illnesses (SOII), an annual survey of U.S. establishments that gathers approximately 300,000 injury descriptions each year. These descriptions are short text narratives written on Occupational Safety and Health Administration (OSHA) logs and workers' compensation records. Historically, BLS relied entirely on staff to read and code these narratives. Collecting and analyzing this information is challenging: manual coding is labor- and time-intensive and prone to human error. As a result, BLS sought an automated, reliable approach to improve accuracy, coding speed, and overall efficiency. BLS calls this automated solution “autocoding”.
In 2014, BLS began using machine learning, built on free and open-source software, to code a subset of cases. By 2018, it was using neural networks to autocode five primary coding tasks (occupation, nature, part, source, and event), with the model assigning approximately 81% of these codes. As of the date of this case study, 85% of the survey data is autocoded. Coding accuracy has also improved significantly compared with manual coding, demonstrating the value and efficacy of machine learning-based autocoding.
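To make the idea of autocoding concrete, the sketch below treats one coding task as text classification: a model reads a short injury narrative and proposes a code. It is only an illustration under stated assumptions, not the BLS system. It swaps in a small multinomial naive Bayes classifier instead of the neural networks described above, the narratives and the two "event" labels are invented for the example, and the confidence threshold that leaves uncertain cases for staff is an assumption about how a partial autocoding rate might arise.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical narratives and made-up "event" labels; not real SOII data or codes.
TRAINING = [
    ("slipped on wet floor and fell", "fall"),
    ("fell from ladder while painting", "fall"),
    ("tripped over cable and fell on knee", "fall"),
    ("strained back lifting heavy box", "overexertion"),
    ("pulled shoulder lifting patient", "overexertion"),
    ("hurt back while lifting boxes", "overexertion"),
]


def tokenize(text):
    """Lowercase the narrative and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count label frequencies and per-label word frequencies."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return (best_label, probability) using naive Bayes with add-one smoothing."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        log_scores[label] = score
    # Turn log scores into normalized probabilities.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def autocode(model, text, threshold=0.8):
    """Assign a code only when the model is confident; otherwise return None
    so the case can go to a human coder (the threshold is an assumption)."""
    label, confidence = predict(model, text)
    return label if confidence >= threshold else None


model = train(TRAINING)
print(autocode(model, "worker fell off ladder"))
print(autocode(model, "strained back lifting crate"))
print(autocode(model, "injured hand"))
```

With this toy data, the first two narratives are assigned a code, while the third shares no words with the training set, falls below the threshold, and is left uncoded. A real system would be trained on far more narratives and would need one such model, or one model output, per coding task.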