Publications - Working papers

Below are the working papers of our group; we currently list 42. The list contains only papers that have not yet been published. If you are looking for a preprint of a paper that has already been published, please see the "Published papers" section. If you have problems accessing electronic information, please let us know.

© NOTICE: All working papers are copyrighted by their authors. If you would like to use all or part of any paper, please contact the author.

Does a Phylogeny of Topics Recapitulate the History of Ideas and Institutions?

Retzlaff, Nancy and Niekler, Andreas and Heyer, Gerhard and Kleine, Christoph and Stadler, Peter F.

Abstract


Computational workflows have been devised in a variety of research areas in the humanities, in particular linguistics and the historical sciences, to make use of the rapidly increasing amount of data that has become available in machine-readable form. Here we use the Chinese Electronic Tripitaka Collection, a digitized collection of Buddhist canonical texts spanning 2500 years, to ask whether historical relationships are reflected in the texts in such a way that they can be reconstructed using methods adapted from phylogenetics. More specifically, we ask whether the presence and abundance of high-level concepts in the writings of Buddhist schools of thought behave like characters in biological evolution and thus make it possible to infer the schools' relationships of descent from this type of data alone. We use topic modelling to describe the contents of documents in an unsupervised and unbiased manner. To this end, we first had to train the Stanford Word Segmenter and POS-Tagger for use on the Chinese Electronic Tripitaka Collection, which does not conform to the available models for Standard Chinese. The annotation of single words, word types, and a parse tree, as well as the topic annotation of the entire corpus, constitute a unique resource that we make available as an intermediate result of the project.