RESOURCE> Updates to Corpora for use with the TACL GUI

Dear colleagues,

In August last year, I wrote to this list announcing the release of the TACL GUI.

For use in conjunction with the GUI, I also released via Zenodo two corpora, and a database for one of the corpora. The corpora were (1) a "plain vanilla" Taishō-Xuzangjing corpus, as digitised by CBETA; and (2) a modified version of the Taishō only, which I had adjusted in various ways to reflect my understanding of the current state of scholarship on various facts in textual history. The database was a full TACL database for use with the second of these two corpora.

We subsequently discovered certain faults in the way our TACL code was handling the CBETA XML representation of the Taishō critical apparatus (footnotes). These faults produced some inaccuracies in the way TACL reconstructed the various witnesses for the texts (such as Song, Yuan, Ming, etc.). Those problems, and the solutions we applied to correct them, are described in this document.

Readers should note that most texts in the corpora were unaffected by the problems at issue, and that those texts that were affected were usually affected only in relatively minor ways. TACL tests using the old corpora will therefore usually have been largely, and often completely, accurate. Further, philologically rigorous work based on TACL results should ordinarily not depend blindly on raw TACL results. As far as I can determine, therefore, the arguments in prior publications based on TACL analysis (mainly my own publications) are not undermined by the discovery of this technical problem.

We have now released updated versions of both corpora, and the database, which correct for those problems. Those updates can be accessed here:

I urge all users of TACL, including users of the TACL GUI, to update their corpora and database. I apologise for any inconvenience caused by the inaccuracies in the earlier corpora, and by the need to implement these updates now. Any users who wish to discuss the exact nature of the problems, and the extent to which their own research might have been affected by them, are most welcome to contact me offlist.

As always, the TACL GUI and related resources are available, and explained in more detail, on the relevant page at dazangthings.


Michael Radich (Heidelberg University)