TLA Text Corpus

Introduction

The text corpus is one of the two core parts of the Thesaurus Linguae Aegyptiae (TLA), the other one being the lemma lists. The corpus contains a constantly increasing number of Ancient Egyptian texts written in the hieroglyphic/hieratic or Demotic scripts, currently ranging from ca. 3,000 BCE to ca. 300 CE. (Coptic texts will be added later in the project.)

The Ancient Egyptian text world was a remarkable one, culturally and historically. Messages of varying length and complexity were written on immensely diverse objects from very different spheres of life. There are texts on portable objects such as papyri, ostraca (i.e., stone flakes or potsherds), and (complete) vessels, as well as texts on immovable objects such as the walls of temples and tombs, obelisks, statues, etc. These different kinds of support, with their material, formal, and functional features, contribute additional significance and connote the meaning of the written texts.

Given this close interrelation and semantic interaction between written texts and their material support, an enhanced understanding of the Ancient Egyptian world view via texts needs to take textual as well as material features of text objects systematically into account. Consequently, all texts in the text corpus are annotated with a wide array of metadata relating to the texts themselves as well as to their material support (German: Textträger). It has always been a goal of the Academies’ project to develop a more or less balanced, diverse corpus, i.e., to provide a representative range of textual and chronological variation in the corpus. As of now, the text corpus comprises about 1.69 million lemma tokens (hieroglyphic/hieratic: ca. 1,355,000; Demotic: ca. 332,000).

Furthermore, the texts in the TLA are not primarily conceptualized as abstract texts (e.g., Sinuhe) but as a (semantically coherent) textual string on a concrete support (e.g., papyrus Berlin ÄM P 3022).

Details

Every written text and sentence as well as every text object (Textträger) in the corpus has its own unique stable ID number, e.g., “MORHQGR3SNBI3KHAF6YOW5WLL4”. The basic level of a text is its Egyptological transliteration. A growing part of the subcorpus of hieroglyphic/hieratic texts is also annotated with a digital hieroglyphic transcription in JSesh-specific Manuel de Codage and, to the extent possible, Unicode. All texts also come with a translation into a modern language (mostly German, sometimes English or French, depending on the author’s language proficiency). Texts may also contain commenting notes.

Text metadata and text object metadata

Texts and text objects systematically come with additional metadata that are not inherent in the text or text object itself. To enhance data retrieval, the possible metadata values are maintained in controlled vocabularies (thesauri). Categories of data and metadata relating to texts and text objects are shown in the following table:

Text data and metadata	Text object metadata
Egyptological transliteration
(Digital) hieroglyphic transcription
Translation (German, English, or French)
Script (Hieroglyphic, Hieratic, Demotic)
Language (phase) (Old Egyptian, Middle Egyptian, etc.)
Dating of the text witness	Dating of the text production
Text category/type	Type of text object
	Component
	Agent of a social action
	Material
	Dimension
	Condition
	Technique
	Archaeological Context
	Cultural Context
	Finding place
	Current location
Bibliographical references	Bibliographical references
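
The two columns of the table can be read as two linked record types. The following is only a minimal sketch of that reading, not the TLA's actual schema: the class names, the field names, and the placeholder IDs are assumptions made for illustration, and only a selection of the categories is modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TextObject:
    """Material support (Textträger); field names are hypothetical."""
    id: str
    object_type: Optional[str] = None
    material: Optional[str] = None
    finding_place: Optional[str] = None
    current_location: Optional[str] = None
    bibliography: List[str] = field(default_factory=list)


@dataclass
class Text:
    """A textual string on a concrete support; field names are hypothetical."""
    id: str
    transliteration: str
    text_object: TextObject          # every text is tied to its support
    script: Optional[str] = None     # Hieroglyphic, Hieratic, or Demotic
    language_phase: Optional[str] = None
    translation: Optional[str] = None
    bibliography: List[str] = field(default_factory=list)


# Placeholder IDs and content; real TLA IDs look like the example given in "Details".
papyrus = TextObject(id="EXAMPLE-OBJECT-ID", material="papyrus")
text = Text(id="EXAMPLE-TEXT-ID", transliteration="...", text_object=papyrus)
```

Keeping the text object as a separate record mirrors the point made in the introduction: a text is described as a string on a concrete support, so material metadata is stored once per object and reached through the text.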

“Texts” and “sub-texts” in the TLA

A ‘text’ in the broader sense as conceptualized in the TLA is an entity marked as an independent textual unit by clear text delimiters (beginning and end). An individual text may either consist of writing only, or it may be a multimodal composition of writing and illustrations. Examples of multimodal texts are offering scenes on the walls of Egyptian temples that display the king vis-à-vis a deity, the two interacting with each other. Written labels (short phrases or sentences) identify the depicted entities or give information about their interactions. Such short textual units, although distinct entities, are part of the larger unit of the scene and are therefore conceptualized not as independent “texts,” but as dependent “sub-texts” in the TLA. One characteristic of sub-texts vs. texts in the TLA is that a fixed reading sequence of sub-texts cannot normally be established. Another characteristic is that, when interpreting a sub-text, it is necessary to take the accompanying sub-texts, e.g., the scene as a whole, into account.

General principles of text editing in the TLA

As mentioned above, Egyptian texts in the TLA are primarily conceptualized as strings in Egyptological transliteration. Line/column counting generally follows the original source (tag “lc”, for ‘line[/column] count’). Conventional line counts of standard publications of (abstract) texts can be referred to in addition (tag “para”). Texts are divided into units of simple or complex sentences. Each sentence has a unique stable ID number by which it should be quoted, e.g., “TLA sentence IBUBd1NUc4LHaUPIlW0V9mCZyNQ”.

Each word token (or sometimes a sequence of words) is lemmatized, i.e., it is linked to an entry (‘lemma’) in one of the TLA’s lemma lists. Moreover, the lemma tokens in many texts are also annotated with grammatical codes. These encode morphological inflection, mainly inflection that is overtly marked in the script (e.g., gender and number of nouns), but sometimes also inflection that is covert in the purely consonantal script and can be reconstructed contextually from the syntax (e.g., the genus verbi, i.e., voice, of an unmarked sḏm(=f), or the number of a relative sḏm.t.n(=f)). To keep grammatical annotation to a certain degree independent of continually debated theoretical premises, the tagging of tense/aspect/mood (TAM) features of inflected verbs is strictly limited to overt inflection, i.e., morphological features visible in the written form. For example, a morphologically unmarked nḥm(=f) is simply annotated as an instance of an (active or passive) “suffix conjugation” form without TAM specification. The lemma tokens of an increasing part of the texts also come with their original hieroglyphic spelling (or, in the case of hieratic, a hieroglyphic transcription). Authors are also encouraged to specify the particular sense of a lemma in context, either by picking one of a set of translations from the lemma list or by entering another specific sense themselves. In addition to these standard annotations, editors may add further annotations, such as semantic features (e.g., type of speech act, metaphorical domains), layout features (e.g., rubra, verse points, split columns, lists), etc.
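
The annotation layers described in this section (sentence ID, lemma link, grammatical codes limited to overt marking, optional hieroglyphic spelling and contextual sense) could be sketched roughly as follows. This is an illustration under stated assumptions, not the TLA's data format: the class names, field names, and placeholder IDs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Token:
    """One lemmatized word token; all field names are hypothetical."""
    transliteration: str
    lemma_id: str                      # link to an entry in a lemma list
    inflection: Optional[str] = None   # grammatical code, if annotated
    tam: Optional[str] = None          # set only when overtly marked in writing
    hieroglyphs: Optional[str] = None  # original spelling, if available
    sense: Optional[str] = None        # contextual translation, if specified


@dataclass
class Sentence:
    id: str
    tokens: List[Token] = field(default_factory=list)

    def citation(self) -> str:
        # Sentences are quoted by their stable ID, as described above.
        return f"TLA sentence {self.id}"


# A morphologically unmarked form is tagged as suffix conjugation, with no TAM value.
# Both IDs are placeholders, not real TLA identifiers.
unmarked = Token(
    transliteration="nḥm",
    lemma_id="EXAMPLE-LEMMA-ID",
    inflection="suffix conjugation",
)
sentence = Sentence(id="EXAMPLE-SENTENCE-ID", tokens=[unmarked])
```

Leaving `tam` empty by default reflects the stated policy: a value would be recorded only where the written form itself shows the inflection.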

The content of the text corpus

A complete overview of the TLA's text object tree can also be found here.

In the following you will find a manually compiled list, subdivided according to text dating. Sub-corpora with digital hieroglyphic transcription are marked with [H], those with grammatical annotation with [G].

Texts from the Early Dynastic Period
- Royal and non-royal texts [H; G] (M. Rathenow, J. Schneider, G. Sperveslage)
Texts from the Old Kingdom
- Archival texts [G] (S. Grunert, I. Hafemann, S. Seidlmayer)
- Historical-biographical texts [partly H; G] (A. Burkhardt, R. Díaz Hernández, S. Grunert, J. Stauder-Porchet)
- Letters [G] (I. Hafemann)
- Letters to the Dead [G] (I. Hafemann)
- Non-royal tombs [partly H; G] (A. Burkhardt, S. Grunert, E. Windus-Staginsky)
  - Expanded v20: Qubbet el-Hawa [G] (R. Díaz Hernández)
- Rock inscriptions [G] (I. Hafemann, G. Sperveslage)
- Pyramid Texts [partly H; G] (D. Topmann)
- Votive labels [G] (S.J. Seidlmayer)
Texts from the First Intermediate Period
- Letters [G] (I. Hafemann)
- Letters to the Dead [G] (I. Hafemann)
- Pyramid Texts [partly H; G] (D. Topmann)
Texts from the Middle Kingdom
- Block statues [H] (R. Díaz Hernández)
- Historical and biographical texts of royal and non-royal persons [partly H; G] (M. Brose, P. Dils, R. Landgráfová, L. Popko, A. Schütze)
  - Heqaib sanctuary [G] (I. Hafemann)
- Letters [G] (I. Hafemann)
- Literary texts [partly H; G] (P. Dils, R. Enmarch, F. Feder, H. Felber, V. Lepper, L. Popko)
- Magical texts [H; G] (A. Blöbaum, P. Dils, L. Popko, K. Stegbauer)
- Medical texts [H; G] (P. Dils, I. Köhler, L. Popko, G. Sperveslage)
- Private stelae [H; G] (S. Beck)
- Religious texts: hymns [H; G] (P. Dils, A. Schütze)
- New v20: Rock inscriptions [G] (I. Hafemann, G. Sperveslage)
Texts from the Second Intermediate Period
- Letters [G] (I. Hafemann)
- Historical and biographical texts of royal and non-royal persons [partly H; G] (M. Brose, P. Dils, R. Landgráfová, L. Popko, A. Schütze)
- Literary texts [H; G] (P. Dils, L. Popko)
- Private stelae [H; G] (S. Beck)
Texts from the New Kingdom
- Expanded v20: Archives: Ostraca of Senenmut [G] (A. Burkhardt, G. Sperveslage)
- Administrative texts from Deir el-Medina [H; G] (M. Goecke-Bauer, M. Landrino)
- Block statues [H] (R. Díaz Hernández)
- Book of the Dead [partly H] (B. Backes, J. Iskander)
- Letters [partly H; G] (I. Hafemann)
- Expanded v20: Literary texts [H; G] (M. Brose, P. Dils, F. Feder, H. Felber, H.-W. Fischer-Elfert, J. Jüngling, L. Popko)
- Expanded v20: Magical texts [H; G] (A. Blöbaum, M. Brose, P. Dils, L. Popko, J. Quack, K. Stegbauer)
- Medical texts [H; G] (A. Blöbaum, B. Böhm, M. Brose, C. Di Biase-Dyson, P. Dils, A. Herzberg, I. Köhler, L. Popko)
- Expanded v20: Netherworld Books [partly H; G] (E. Freier, D. Topmann, D.A. Werning)
- Private religious texts [H; G] (K. Dietze)
- Private stelae [H; G] (S. Beck)
- Expanded v20: Non-royal tombs [H; G] (A. Singer, P. Dils)
- Ritual of the Hours [H; G] (E. Graefe)
- Expanded v20: Historical-biographical texts of the 18th Dynasty up to Amenhotep III [partly H; G] (M. Brose, J. Iskander)
- New v20: Tomb robbery papyri [H; G] (B. Böhm)
- Expanded v20: Biographical texts from the Ramesside period [H; G] (P. Dils, E. Frood)
- Expanded v20: Graffiti and Dipinti [partly H; G] (H. Navratilova, U. Verhoeven)
- Expanded v20: Royal historical and rhetorical texts of the Ramesside Period [partly H; G] (S. Grallert, I. Hafemann, L. Popko, G. Sperveslage)
- Texts from the Amarna period [partly H; G] (D. Ceballos Contreras, I. Hafemann, A. Hornung, G. Sperveslage)
- New v20: Various texts on ostraca [G] (W. Reineke, G. Sperveslage)
Texts from the Third Intermediate Period
- Book of the Dead [partly H] (B. Backes, A. Wüthrich)
- Historical-biographical texts [H; G] (R. Díaz Hernández, S. Grallert, G. Sperveslage)
- Letters [H; G] (I. Hafemann)
- Literary texts [H; G] (P. Dils, L. Popko)
- Magical texts [H; G] (A. Blöbaum, M. Brose, P. Dils, L. Popko, K. Stegbauer)
- Ritual of the Hours [H; G] (E. Graefe)
- New v20: Royal tombs in Tanis [H; G] (D. Topmann)
- Texts from non-royal coffins [H; G] (J. Schneider)
Texts from the Late Period
- Administrative texts (G. Vittmann)
- Literary texts [partly H; G] (P. Dils, L. Popko)
- Medical texts [H; G] (A. Blöbaum, B. Böhm, M. Brose, P. Dils, F. Feder, L. Popko, K. Stegbauer)
- Expanded v20: Magical texts [H; G] (A. Blöbaum, B. Böhm, M. Brose, P. Dils, L. Popko, F. Langermann, J. Quack, K. Stegbauer)
- Historical-biographical texts
  - Expanded v20: 25th-26th Dynasties [H; G] (A. Blöbaum, A. El-Shiaty, R. Díaz Hernández, S. Grallert, J. Schneider, G. Sperveslage)
  - Expanded v20: 27th-29th Dynasties [H; G] (R. Díaz Hernández, S. Grallert, G. Sperveslage)
  - Expanded v20: 30th Dynasty [H; G] (S. Grallert, A. Blöbaum, R. Birk, F. Hoffmann, L. Medini)
- Private stelae (G. Vittmann)
- Ritual of the Hours [H; G] (P. Dils, E. Graefe, K. Griffin)
- Expanded v20: Rock inscriptions (G. Vittmann, G. Sperveslage)
- Temple inscriptions [H; G] (S. Blaschta)
- Expanded v20: Texts from non-royal tombs [partly H; partly G] (A. Burkhardt, D. Topmann, G. Vittmann)
- Expanded v20: Texts from non-royal coffins/sarcophagi [partly H; partly G] (D. Topmann, M. Wagner, D.A. Werning)
- Texts from temple libraries [H; G] (F. Feder)
- Book of the Dead [H] (A. Wüthrich)
- Theological and religious texts (varia) [H; G] (N. Hartmann, D.A. Werning)
Texts from the Graeco-Roman Period
- Expanded v20: Administrative and documentary texts (G. Vittmann)
- Book of the Dead [H] (B. Backes, A. Wüthrich)
- Legal texts (G. Vittmann)
- Literary texts (G. Vittmann)
- Mathematical texts (G. Vittmann)
- Expanded v20: Magical texts [partly H; partly G] (A. Blöbaum, B. Böhm, M. Brose, P. Dils, A.-K. Gill, A. Pries, L. Popko, K. Stegbauer, G. Vittmann)
- Medical texts [partly H; partly G] (A. Blöbaum, B. Böhm, M. Brose, P. Dils, I. Köhler, L. Popko, G. Vittmann)
- Mortuary liturgies [partly H; G] (F. Feder, S. Töpfer)
- Expanded v20: Non-royal biographical texts [H; G] (R. Birk, P. Dils, D. Schäfer, J. Schneider, G. Sperveslage, G. Vittmann)
- Object inscriptions (G. Vittmann)
- Private stelae [partly H; partly G] (S. Beck, G. Vittmann)
- Religious texts (M. Moser, M. Stadler, G. Vittmann)
- Rock inscriptions (G. Vittmann)
- Royal texts [H; G] (R. Birk, D. Schäfer, J. Schneider, G. Sperveslage, G. Vittmann)
- Scientific texts (G. Vittmann)
- Temple inscriptions
  - Assuan, Bigge, Dakka, Deir el-Bahari, Deir el-Medina, Dendur, Opet [partly H; partly G] (P. Dils, M. Elebaut, A. Paulet, R. Preys)
  - New v20: Deir Chelouit [G] (Chr. Zivie-Coche)
  - Expanded v20: Dendara [partly H; partly G] (St. Baumann, P. Dils, A. Pries, A. Rickert, J. Tattko)
  - New v20: Edfu [partly H; G] (D. Budde, H. Wilde)
    - Edfu: Ritual of the Hours [H; G] (E. Graefe)
  - Esna (D. v. Recklinghausen)
- Texts from coffins/sarcophagi (G. Vittmann)
- Texts from temple libraries [H; G] (F. Feder)
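
Because the list above follows a regular pattern (optional "New v20:" or "Expanded v20:" prefix, title, optional [H]/[G] markers, optional authors in parentheses), its entries can be read mechanically. The following is a rough sketch for this list only; the function names and the returned dictionary keys are assumptions, and the pattern is not an official TLA format.

```python
import re
from typing import Dict, Optional

# Pattern for one bullet line of the overview list.
ENTRY = re.compile(
    r"^\s*-\s*"
    r"(?:(?P<status>New|Expanded)\s+(?P<version>v\d+):\s*)?"
    r"(?P<title>.*?)"
    r"(?:\s*\[(?P<marks>[^\]]*)\])?"
    r"(?:\s*\((?P<authors>[^()]*)\))?\s*$"
)


def parse_marks(marks: Optional[str]) -> Dict[str, str]:
    """Map each marker (H, G) to 'full' or 'partly'."""
    result: Dict[str, str] = {}
    if not marks:
        return result
    for part in re.split(r"[;,]", marks):
        words = part.split()
        if not words:
            continue
        partly = len(words) > 1 and words[0].lower() == "partly"
        result[words[-1]] = "partly" if partly else "full"
    return result


def parse_entry(line: str) -> Optional[dict]:
    """Parse one bullet line; period headings (no leading '-') return None."""
    m = ENTRY.match(line)
    if m is None:
        return None
    authors = m.group("authors")
    return {
        "status": m.group("status"),
        "version": m.group("version"),
        "title": m.group("title").strip(),
        "marks": parse_marks(m.group("marks")),
        "authors": [a for a in re.split(r"\s*[,;&]\s*", authors) if a] if authors else [],
    }


pyramid_texts = parse_entry("- Pyramid Texts [partly H; G] (D. Topmann)")
deir_chelouit = parse_entry("  - New v20: Deir Chelouit [G] (Chr. Zivie-Coche)")
```

A sub-corpus without brackets, such as "Legal texts (G. Vittmann)", yields an empty marker dictionary, which matches the convention that unmarked entries carry neither hieroglyphic transcription nor grammatical annotation.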

List of TLA authors

For a complete list of authors, see here.

History of the hieroglyphic/hieratic text corpus

The digital text corpus of the TLA was initiated as part of the Academy’s previous project “Altägyptisches Wörterbuch” (AAeW, 1992–2012) at the Berlin-Brandenburg Academy of Sciences and Humanities (funded by the Academies’ program of the Union of the German Academies of Sciences and Humanities). The idea was to create, in the age of corpus-based computational lexicography, a digital successor to A. Erman and H. Grapow’s Wörterbuch der aegyptischen Sprache (1926–1931; 1950, 1963), notably including the Belegstellen volumes (1935–1953). This successor was to consist of (i) a lemmatized, balanced digital corpus of Egyptian texts in hieroglyphic, hieratic, and Demotic script, on which (ii) a corpus-based ‘dictionary’ of the ancient Egyptian language is built.

To further complete the lemma list, additional texts were selected for inclusion in the TLA based on a set of criteria. Texts that had not been used for the original Wörterbuch project and texts that had been published or re-edited after that project had ended were favored for inclusion. Late Egyptian texts that were to be encoded in the Projet Ramsès (Liège), on the other hand, were disfavored. With the growth of the project team and increasing support from cooperating projects and individual researchers, a broader, more balanced, more diverse corpus is evolving.

Prospects for the future

Coptic text corpus

Coptic, the last phase of the ancient Egyptian language, is not yet represented in the TLA text corpus. Once the Coptic lemma list is implemented in the TLA, a sample corpus of texts from all Coptic dialects will be imported. This will come from the lemmatized digital text data generated by Wolf-Peter Funk over many decades. This legacy data was converted into a modern encoding format, i.e., Unicode, by Katrin John (cooperating project “Database and Dictionary of Greek Loanwords in Coptic,” FU Berlin) and will soon be processed for incorporation into the TLA.

Coffin texts

In collaboration with Wolfgang Schenkel, the project is preparing the transformation of his Coffin Text data (CTUrtext) so that the Coffin Texts can be incorporated into the TLA.