Financial Crime World

Model Performance Improves with Sample Data Set Redesign and Algorithm Adjustments

Detecting Financial Reporting Fraud in Listed Companies in China

A recent study on detecting financial reporting fraud in listed companies in China reports markedly better results after the researchers redesigned the sample data set and adjusted the algorithm.

Initial Model Performance


The initial study used RUSBoost, an ensemble learning algorithm commonly applied to imbalanced samples. However, the model’s average AUC was only 0.60, well below the 0.725 reported for a similar model in previous research. Its sensitivity and precision also fell short.
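As a rough illustration of the two ideas in this section, the sketch below shows the random undersampling step that gives RUSBoost its name, along with a direct way to compute AUC. It is not the study's code: the function names `random_undersample` and `auc` and the toy labels and scores are assumptions made for the example. Full RUSBoost repeats the sampling inside every boosting round, which is omitted here.

```python
import random

def random_undersample(rows, labels, seed=0):
    """Keep every fraud row plus an equally sized random draw of non-fraud rows.

    Assumes fraud (label 1) is the minority class.
    """
    rng = random.Random(seed)
    fraud = [i for i, y in enumerate(labels) if y == 1]
    clean = [i for i, y in enumerate(labels) if y == 0]
    kept = fraud + rng.sample(clean, len(fraud))
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]

def auc(labels, scores):
    """Share of fraud/non-fraud pairs in which the fraud case scores higher (ties count half)."""
    fraud = [s for y, s in zip(labels, scores) if y == 1]
    clean = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if f > c else 0.5 if f == c else 0.0 for f in fraud for c in clean)
    return wins / (len(fraud) * len(clean))

# Toy data: 1 = fraudulent report, 0 = non-fraudulent report.
labels = [1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.5, 0.1]

auc_value = auc(labels, scores)  # 9 of the 12 pairs are ranked correctly
balanced_rows, balanced_labels = random_undersample([[s] for s in scores], labels)
print(auc_value, balanced_labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a sense of scale for the 0.60 and 0.725 figures quoted above.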

Redesigning the Sample Data Set


To improve the model’s performance, the researchers redesigned the sample data set so that it was representative of China’s capital market. This step is crucial for achieving more accurate results.

Algorithm Adjustments


The researchers also adjusted the algorithm to include stacking, an ensemble technique in which a meta-classifier learns to combine the predictions of several base classifiers into a more robust classifier. This change markedly improved the model’s performance.
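The stacking idea can be sketched in a few dozen lines. The example below is a minimal illustration under stated assumptions, not the study's model: the base learners are simple one-feature thresholds, the meta-learner is a small logistic regression, and the data are synthetic. All names (`fit_stump`, `fit_logistic`, `fit_stacking`) are hypothetical. The key step is that the meta-learner is trained on out-of-fold predictions, so it never sees a base learner's output on rows that base learner was fitted on.

```python
import math
import random

def fit_stump(rows, labels, feature):
    """Base learner: score a row by its distance from the midpoint of the two class means."""
    fraud = [r[feature] for r, y in zip(rows, labels) if y == 1]
    clean = [r[feature] for r, y in zip(rows, labels) if y == 0]
    midpoint = (sum(fraud) / len(fraud) + sum(clean) / len(clean)) / 2
    return lambda row: row[feature] - midpoint

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, epochs=300, rate=0.1):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for row, y in zip(rows, labels):
            error = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) - y
            grad_w = [g + error * x for g, x in zip(grad_w, row)]
            grad_b += error
        weights = [w - rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b / len(rows)
    return lambda row: sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)

def fit_stacking(rows, labels, n_folds=5):
    """Train the meta-learner on out-of-fold base predictions, then refit the base learners."""
    n_features = len(rows[0])
    meta_rows = [None] * len(rows)
    for fold in range(n_folds):
        train = [i for i in range(len(rows)) if i % n_folds != fold]
        stumps = [fit_stump([rows[i] for i in train], [labels[i] for i in train], f)
                  for f in range(n_features)]
        for i in range(fold, len(rows), n_folds):  # rows held out of this fold
            meta_rows[i] = [stump(rows[i]) for stump in stumps]
    meta = fit_logistic(meta_rows, labels)
    stumps = [fit_stump(rows, labels, f) for f in range(n_features)]
    return lambda row: meta([stump(row) for stump in stumps])

# Synthetic data (an assumption): 25% fraud, two features shifted upward for fraud cases.
rng = random.Random(0)
labels = [1 if i % 4 == 0 else 0 for i in range(200)]
rows = [[rng.gauss(1.5 * y, 1.0), rng.gauss(1.0 * y, 1.0)] for y in labels]

model = fit_stacking(rows, labels)
fraud_scores = [model(r) for r, y in zip(rows, labels) if y == 1]
clean_scores = [model(r) for r, y in zip(rows, labels) if y == 0]
mean_fraud = sum(fraud_scores) / len(fraud_scores)
mean_clean = sum(clean_scores) / len(clean_scores)
print(mean_fraud, mean_clean)
```

In practice, a library implementation such as scikit-learn's `StackingClassifier` would normally replace hand-written code like this, with stronger base learners than single-feature thresholds.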

Revised Model Performance


The revised model, based on the stacking algorithm, achieved an AUC of 0.742, outperforming other machine learning methods, including logistic regression and SVM. Sensitivity rose to 76.5%, meaning the model correctly identified roughly three in four fraudulent financial reports. Precision also improved to 76.5%, meaning roughly three in four of the reports the model flagged were in fact fraudulent.
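For readers less familiar with the two rates, the short sketch below computes them from predicted and actual labels. The function name and the toy labels are assumptions for illustration only and do not reproduce the study's data.

```python
def sensitivity_and_precision(labels, predictions):
    """Return (sensitivity, precision) for binary labels where 1 means fraud."""
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
    false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)
    false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
    sensitivity = true_pos / (true_pos + false_neg)  # share of fraud cases that were caught
    precision = true_pos / (true_pos + false_pos)    # share of flagged cases that were fraud
    return sensitivity, precision

# Toy example: 4 fraudulent reports, 6 non-fraudulent reports.
labels      = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

sensitivity, precision = sensitivity_and_precision(labels, predictions)
print(sensitivity, precision)
```

In this toy case one fraud case is missed and one clean report is wrongly flagged, so both rates come out the same. The two rates are equal whenever the number of false negatives matches the number of false positives.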

Stability Evaluation


The stacking algorithm’s stability was evaluated by repeating the selection process 20 times and calculating the standard deviation of the results. The stacking model proved more stable, maintaining consistent detection performance across the repeated tests.
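A stability check of this kind might look like the sketch below, which repeats an evaluation 20 times and summarizes the spread of the resulting AUC values. The function `evaluate_once` is a hypothetical stand-in that scores freshly simulated data on each run. In a real check it would redraw the sample, retrain the model and score a held-out set. The choice of the sample standard deviation (`statistics.stdev`) is also an assumption.

```python
import random
import statistics

def auc(labels, scores):
    """Share of fraud/non-fraud pairs in which the fraud case scores higher (ties count half)."""
    fraud = [s for y, s in zip(labels, scores) if y == 1]
    clean = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if f > c else 0.5 if f == c else 0.0 for f in fraud for c in clean)
    return wins / (len(fraud) * len(clean))

def evaluate_once(seed):
    """Hypothetical stand-in for one selection, training and testing cycle."""
    rng = random.Random(seed)
    labels = [1 if i % 4 == 0 else 0 for i in range(200)]
    # Simulated model scores: fraud cases tend to score higher than clean ones.
    scores = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]
    return auc(labels, scores)

aucs = [evaluate_once(seed) for seed in range(20)]  # 20 repetitions, as in the study
mean_auc = statistics.mean(aucs)
std_auc = statistics.stdev(aucs)  # a smaller value indicates more stable performance
print(round(mean_auc, 3), round(std_auc, 3))
```

Comparing this standard deviation across candidate models gives a simple way to rank them by stability as well as by average performance.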

Implications


These findings suggest that the stacking-based model proposed in this study is effective at detecting whether financial reporting fraud has occurred in listed companies in China. The model can help supervisors, investors, and others identify high-risk companies, reducing the labor and time costs of preliminary investigations and improving judgment efficiency.

The study highlights the importance of data set redesign and algorithm adjustments in improving the performance of machine learning models. As the Chinese capital market continues to evolve, developing more effective financial reporting fraud detection models will be crucial for maintaining market stability and investor confidence.