NSUBJ - UNIVERSAL DEPENDENCIES
nsubj: nominal subject. A nominal subject (nsubj) is a nominal which is the syntactic subject and the proto-agent of a clause. That is, it is in the position that passes typical grammatical test for subjecthood, and this argument is the more agentive, the do-er, or the proto-agent of the clause.
ADVCL - UNIVERSAL DEPENDENCIES
advcl: adverbial clause modifier. An adverbial clause modifier is a clause which modifies a verb or other predicate (adjective, etc.), as a modifier not as a core complement. This includes things such as a temporal clause, consequence, conditional clause, purpose clause, etc. The dependent must be clausal (or else it is an advmod) and the ...
UNIVERSAL POS TAGS
Universal POS tags. These tags mark the core part-of-speech categories. To distinguish additional lexical and grammatical properties of words, use the universal features. Open class words. Closed class words.
CSUBJ - UNIVERSAL DEPENDENCIES
csubj: clausal subject. A clausal subject is a clausal syntactic subject of a clause, i.e., the subject is itself a clause. The governor of this relation might not always be a verb: when the verb is a copular verb, the root of the clause is the complement of the copular verb.
ROOT - UNIVERSAL DEPENDENCIES
root: root. The root grammatical relation points to the root of the sentence. A fake node ROOT is used as the governor. The ROOT node is indexed with 0, since the indexing of real words in the sentence starts at 1. (The ROOT node is not represented explicitly in CoNLL-U.)
POLITE - UNIVERSAL DEPENDENCIES
Polite: politeness. Various languages have various means to express politeness or respect; some of the means are morphological. Three to four dimensions of politeness are distinguished in linguistic literature. The Polite feature currently covers (and mixes) two of them; a more elaborate system of feature values may be devised in future ...
PARATAXIS
parataxis: parataxis. The parataxis relation (from Greek for “place side by side”) is a relation between the main verb of a clause and other sentential elements, such as a sentential parenthetical, a clause after a “:” or a “;”, or two sentences placed side by side without any explicit coordination or subordination.
UNIVERSAL DEPENDENCIES
Universal Dependencies (UD) is a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across different human languages. UD is an open community effort with over 300 contributors producing nearly 200 treebanks in over 100 languages.
UNIVERSAL DEPENDENCY RELATIONS
Universal Dependency Relations. The following table lists the 37 universal syntactic relations used in UD v2. It is a revised version of the relations originally described in Universal Stanford Dependencies: A cross-linguistic typology (de Marneffe et al. 2014). The upper part of the table follows the main organizing principles of the UD taxonomy such that rows correspond to functional ...
UNIVERSAL DEPENDENCIES
UD Guidelines. Basic principles: Tokenization and word segmentation; Morphology; Syntax; Enhanced dependencies; CoNLL-U format and its extensions; Typos and other errors in underlying text
OBJ - UNIVERSAL DEPENDENCIES
obj: object. The object of a verb is the second most core argument of a verb after the subject. Typically, it is the noun phrase that denotes the entity acted upon or which undergoes a change of state or motion (the proto-patient). In languages distinguishing morphological cases, the object will often be marked by the accusative case. However ...
POS TAGS - UNIVERSAL DEPENDENCIES
Possessive marker: ’s or ’ (and non-standard forms s, -s). Predicate negation: not, n’t, nt. Infinitive marker: to (and non-standard forms ta, na, too, ot, 2, a). (This is a slightly motley list and we may still want to rethink this category for English.) This covers PTB tags POS and some (old PTB style) or all uses of TO, and the subset ...
CONLL-U FORMAT
CoNLL-U Format. We use a revised version of the CoNLL-X format called CoNLL-U. Annotations are encoded in plain text files (UTF-8, normalized to NFC, using only the LF character as line break, including an LF character at the end of file) with three types of lines: Word lines containing the annotation of a word/token in 10 fields ...
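The CoNLL-U word-line layout described above (10 fields per word/token) can be illustrated with a small reader. This is a minimal sketch, not an official UD tool: the excerpt is cut off before it names the other two line types or the individual fields, so the field names, the tab separator, the `#` comment prefix and the blank-line sentence separator are taken from the published CoNLL-U specification, and the function name `parse_conllu` and the sample sentence are made up for illustration.

```python
from typing import Dict, List

# The ten word-line fields, in order (names from the CoNLL-U specification,
# not from the excerpt above, which stops at "10 fields").
FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
          "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of field dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.split("\n"):      # LF is the only line break allowed
        if not line:                   # a blank line closes a sentence
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):     # comment line, skipped here
            continue
        else:                          # word line: 10 tab-separated fields
            cols = line.split("\t")
            if len(cols) != len(FIELDS):
                raise ValueError(f"expected 10 fields, got {len(cols)}")
            current.append(dict(zip(FIELDS, cols)))
    if current:                        # tolerate a missing final blank line
        sentences.append(current)
    return sentences


# Hypothetical sample sentence, hand-written for this sketch.
SAMPLE = (
    "# text = They buy books.\n"
    "1\tThey\tthey\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tbuy\tbuy\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tbooks\tbook\tNOUN\t_\t_\t2\tobj\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

sentences = parse_conllu(SAMPLE)
```

Every field is kept as a string so that the sketch does not have to interpret values such as `_`; a real reader would probably convert `ID` and `HEAD` and would need to handle more than plain integer IDs.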
STATISTICS OF ADV IN UD_ROMANIAN-ART
Treebank Statistics: UD_Romanian-ArT: POS Tags: ADV. There are 35 ADV lemmas (15%), 36 ADV types (11%) and 55 ADV tokens (10%). Out of 14 observed tags, the rank of ADV is: 3 in number of lemmas, 4 in number of types and 5 in number of tokens. The 10 most frequent ADV lemmas: iu, cum, tora, tut, cama, dip, diznou, ţe, acasă, acolo. The 10 most frequent ADV types: iu, cum, tora, tut, cama ...
STATISTICS OF MOOD IN UD_TURKISH-TOURISM
Treebank Statistics: UD_Turkish-Tourism: Features: Mood. This feature is universal. It occurs with 6 different values: Cnd, Des, Imp, Ind, Nec, Opt. 16569 tokens (18%) have a non-empty value of Mood. 1123 types (22%) occur at least once with a non-empty value of Mood. 475 lemmas (21%) occur at least once with a non-empty value of Mood. The feature is used with 2 part-of-speech tags: VERB (15473 ...
STATISTICS OF POSITION IN UD_ROMANIAN-ART
Treebank Statistics: UD_Romanian-ArT: Features: Position. This feature is language-specific. It occurs with 1 different values: Prenom. 2 tokens (0%) have a non-empty value of Position. 2 types (1%) occur at least once with a non-empty value of Position. 2 lemmas (1%) occur at least once with a non-empty value of Position. The feature is used with 1 part-of-speech tags: DET (2; 0% instances).
STATISTICS OF NUMBER IN UD_ROMANIAN-ART
Treebank Statistics: UD_Romanian-ArT: Features: Number. This feature is language-specific. It occurs with 1 different values: Sing. This is a layered feature with the following layers: Number, Number. 1 tokens (0%) have a non-empty value of Number. 1 types (0%) occur at least once with a non-empty value of Number. 1 lemmas (0%) occur at least once with a non-empty value ...
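The lemma, type and token counts quoted in the treebank statistics above can be reproduced in spirit with a few lines of code. The sketch below assumes that a "type" is a distinct word form (compared case-sensitively) and a "token" is one occurrence; the function name `tag_counts` and the toy data are invented for illustration and are not taken from any UD treebank or from the scripts that generate the official statistics.

```python
from typing import Dict, Iterable, Tuple


def tag_counts(words: Iterable[Tuple[str, str, str]], tag: str) -> Dict[str, int]:
    """Count lemmas, types and tokens carrying the given POS tag.

    `words` yields (form, lemma, upos) triples, one per token.
    Assumption: a type is a distinct surface form, compared case-sensitively.
    """
    forms = set()
    lemmas = set()
    tokens = 0
    for form, lemma, upos in words:
        if upos == tag:
            tokens += 1          # every occurrence is a token
            forms.add(form)      # distinct forms are types
            lemmas.add(lemma)    # distinct lemmas
    return {"lemmas": len(lemmas), "types": len(forms), "tokens": tokens}


# Toy data, made up for this sketch.
toy = [
    ("quickly", "quickly", "ADV"),
    ("Quickly", "quickly", "ADV"),
    ("here", "here", "ADV"),
    ("dog", "dog", "NOUN"),
    ("here", "here", "ADV"),
]

adv = tag_counts(toy, "ADV")
```

The percentages shown in the statistics would then be each count divided by the corresponding total over all tags.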
AKKADIAN UD
UD for Akkadian: Tokenization and Word Segmentation. In RIAO, sentence boundaries were arrived at by syntactically annotating the unsegmented corpus, and identifying words that are head words but are not themselves dependents of other words. The separate trees produced this way were considered to be separate sentences. Words are only exceptionally delimited by whitespace or punctuation in the ...
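The segmentation idea described above for RIAO (words that head other words but have no head themselves mark separate trees, and each tree counts as a sentence) can be sketched as follows. This is only an illustration of the principle under simple assumptions, not the tooling the RIAO authors used: tokens are numbered over the whole unsegmented text, `heads` maps each token to its head or to `None`, and the function name `split_by_roots` is hypothetical.

```python
from typing import Dict, List, Optional


def split_by_roots(heads: Dict[int, Optional[int]]) -> List[List[int]]:
    """Group tokens into sentences, one per headless (root) token.

    `heads` maps a token index to the index of its head, or to None
    when the token is not a dependent of any other word.
    """
    def root_of(token: int) -> int:
        seen = set()
        while heads[token] is not None:
            if token in seen:
                raise ValueError("cycle in dependency annotation")
            seen.add(token)
            token = heads[token]
        return token

    groups: Dict[int, List[int]] = {}
    for token in sorted(heads):
        groups.setdefault(root_of(token), []).append(token)
    # Order the sentences by the position of their first token.
    return sorted(groups.values(), key=lambda tokens: tokens[0])


# Toy annotation: tokens 2 and 5 have no head, so there are two trees.
toy_heads = {1: 2, 2: None, 3: 2, 4: 5, 5: None}
segments = split_by_roots(toy_heads)
```

The sketch assumes the trees do not interleave; it groups tokens by tree membership and does not check that each sentence is a contiguous span.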
UD FOR ENGLISH
UD for English. UD English contains data from multiple treebanks created by different teams at different times and often with different conversion tools (from gold constituent treebanks, such as the English Web Treebank for English-EWT, or from different gold dependency treebanks, such as English-GUM).
SYNTAX - UNIVERSAL DEPENDENCIES
Syntax: General Principles. Syntactic annotation in the UD scheme consists of typed dependency relations between words. The basic dependency representation forms a tree, where exactly one word is the head of the sentence, dependent on a notional ROOT, and all other words are dependent on another word in the sentence, as exemplified below (where we explicitly represent the root dependency which ...
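The tree condition stated above (exactly one word depends on the notional ROOT, every other word depends on another word of the sentence) can be checked mechanically. The following is a minimal sketch under the indexing convention quoted earlier on this page, where ROOT has index 0 and real words start at 1; the function name `is_valid_tree` and the list encoding of heads are assumptions made for this example, and the check is not the official UD validator.

```python
from typing import List


def is_valid_tree(heads: List[int]) -> bool:
    """Check that a head list encodes a single-rooted dependency tree.

    heads[i - 1] is the head of word i (words are numbered from 1);
    the value 0 stands for the notional ROOT.
    """
    n = len(heads)
    # Exactly one word may depend on ROOT.
    if sum(1 for h in heads if h == 0) != 1:
        return False
    # Every word must reach ROOT by following heads, without a cycle.
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            head = heads[node - 1]
            if node in seen or not 0 <= head <= n:
                return False
            seen.add(node)
            node = head
    return True


ok = is_valid_tree([2, 0, 2, 2])         # word 2 is the only root
two_roots = is_valid_tree([2, 0, 0, 2])  # words 2 and 3 both attach to ROOT
cycle = is_valid_tree([2, 1, 0])         # words 1 and 2 head each other
```

The loop re-walks the path for every word, which is quadratic in the worst case but keeps the sketch short.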
PARATAXIS
parataxis: parataxis. The parataxis relation (from Greek for “place side by side”) is a relation between a word (often the main predicate of a sentence) and other elements, such as a sentential parenthetical or a clause after a “:” or a “;”, placed side by side without any explicit coordination, subordination, or argument relation with the head word.
XCOMP - UNIVERSAL DEPENDENCIES
xcomp: open clausal complement. An open clausal complement (xcomp) of a verb or an adjective is a predicative or clausal complement without its own subject. The reference of the subject is necessarily determined by an argument external to the xcomp (normally by the object of the next higher clause, if there is one, or else by the subject of the next higher clause ...
SUD OR SURFACE-SYNTACTIC UNIVERSAL DEPENDENCIES
... be established by means of criteria of type B determining who, to or Mary, is the head of to Mary. At this point, UD parts with surface syntax criteria and applies the criterion of “content word ...
TENSE
The only Tense. Fut: Future. Usually realised in adverbs. Examples: oho putar “He/she/they will go”. Past is not a feature of the verb in Guajajara. Some evidential particles also carry tense, such as kakwez, which always appears in second position and indicates an attested event in a distant past. Past: Past. Usually realised in adverbs. Examples ...
STATISTICS OF PART IN UD_KANGRI-KDTB
Treebank Statistics: UD_Kangri-KDTB: POS Tags: PART. There are 1 PART lemmas (7%), 17 PART types (1%) and 95 PART tokens (4%). Out of 15 observed tags, the rank of PART is: 10 in number of lemmas, 11 in number of types and 8 in number of tokens. The 10 most frequent PART lemmas: _. The 10 most frequent PART types: ही, भी, क्या, तां, नी, कोई, लगभग ...
STATISTICS OF MARK IN UD_KANGRI-KDTB
Treebank Statistics: UD_Kangri-KDTB: Relations: mark. This relation is universal. 30 nodes (1%) are attached to their parents as mark. 16 instances of mark (53%) are right-to-left (child precedes parent). Average distance between parent and child is 2.9.
STATISTICS OF FLAT:RANGE IN UD_WESTERN_ARMENIAN-ARMTDP
Treebank Statistics: UD_Western_Armenian-ArmTDP: Relations: flat:range. This relation is a language-specific subtype of flat. There are also 2 other language-specific subtypes of flat: flat:dist, flat:name. 7 nodes (0%) are attached to their parents as flat:range. 7 instances of flat:range (100%) are left-to-right (parent precedes child). Average distance between parent and child is 1
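The relation statistics above report how many instances of a relation are right-to-left (child precedes parent) or left-to-right (parent precedes child), plus the average parent-child distance. A minimal sketch of those three numbers follows. It assumes that distance means the absolute difference of word indices within a sentence, which the excerpt does not spell out; the function name `relation_stats` and the toy arcs are invented for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple


def relation_stats(
    arcs: Iterable[Tuple[int, int, str]], relation: str
) -> Optional[Dict[str, float]]:
    """Summarise direction and distance for one dependency relation.

    `arcs` yields (child_index, parent_index, deprel) triples.
    Returns None when the relation does not occur.
    """
    pairs = [(child, parent) for child, parent, rel in arcs if rel == relation]
    if not pairs:
        return None
    # Right-to-left: the child precedes its parent in the sentence.
    right_to_left = sum(1 for child, parent in pairs if child < parent)
    # Assumed definition: distance is the absolute index difference.
    total_distance = sum(abs(child - parent) for child, parent in pairs)
    return {
        "nodes": len(pairs),
        "right_to_left": right_to_left,
        "left_to_right": len(pairs) - right_to_left,
        "avg_distance": total_distance / len(pairs),
    }


# Toy arcs, made up for this sketch.
toy_arcs = [(1, 3, "mark"), (5, 4, "mark"), (2, 3, "nsubj")]
mark = relation_stats(toy_arcs, "mark")
```

Arcs from different sentences can be mixed in one call as long as each child and parent index pair comes from the same sentence.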
This page pertains to UD version 2.
UNIVERSAL DEPENDENCIES
Universal Dependencies (UD) is a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across different human languages. UD is an open community effort with over 300 contributors producing more than 150 treebanks in 90 languages. If you’re new to UD, you should start by reading the first part of the Short Introduction and then browsing the annotation guidelines.
* Short introduction to UD
* UD annotation guidelines
* More information on UD:
  * How to contribute to UD
  * Tools for working with UD
  * Discussion on UD
  * UD-related events
* Query UD treebanks online:
  * SETS treebank search maintained by the University of Turku
  * PML Tree Query maintained by the Charles University in Prague
  * Kontext maintained by the Charles University in Prague
  * Grew-match maintained by Inria in Nancy
  * INESS maintained by the University of Bergen
* Download UD treebanks
If you want to receive news about Universal Dependencies, you can subscribe to the UD mailing list. If you want to discuss individual annotation questions, use the Github issue tracker.
CURRENT UD LANGUAGES
Information about language families (and genera for families with multiple branches) is mostly taken from WALS Online (IE = Indo-European).
Abaza 1 <1K Northwest Caucasian
ABAZA TREEBANKS
ATB <1K
UD_Abaza-ATB is a treebank based on (https://linghub.ru/spoken_abaza/).
* Contributors: Alexey Koshevoy
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation.
Afrikaans 1 49K IE, Germanic
AFRIKAANS TREEBANKS
AfriBooms 49K
UD Afrikaans-AfriBooms is a conversion of the AfriBooms Dependency Treebank, originally annotated with a simplified PoS set and dependency relations according to a subset of the Stanford tag set. The corpus consists of public government documents.
* Contributors: Peter Dirix, Liesbeth Augustinus, Daniel van Niekerk
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation.
Akkadian 2 23K Afro-Asiatic, Semitic
AKKADIAN TREEBANKS
RIAO 21K
162 royal inscriptions of four early Neo-Assyrian kings.
* Contributors: Mikko Luukko, Aleksi Sahala, Sam Hardwick, Krister Lindén
* Repository master
dev
* README
* Treebank hub page
* Download
PISANDUB 1K
A small set of sentences from Babylonian royal inscriptions.
* Contributors: Kamil Kopacewicz
* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Akkadian treebanks.
LANGUAGE DOCUMENTATION See the language documentation page.
Akuntsu 1 <1K Tupian, Tupari
AKUNTSU TREEBANKS
TuDeT <1K
UD_Akuntsu-TuDeT is a collection of annotated texts in Akuntsú. Together with (https://github.com/UniversalDependencies/UD_Tupinamba-TuDeT) and UD_Munduruku-TuDeT, UD_Akuntsu-TuDeT is part of the (https://tular.org) project.
* Contributors: Carolina Aragon, Fabrício Ferraz Gerardi
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page.
Albanian 1 <1K __ IE, Albanian
ALBANIAN TREEBANKS
TSA <1K __
The UD Albanian Treebank is a small treebank for Standard Albanian, developed within a project framework at Uppsala University. The data was extracted from Wikipedia. * Contributors: Marsida Toska* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page.
Amharic 1 10K __________ Afro-Asiatic, Semitic
AMHARIC TREEBANKS
ATT 10K __________
UD_Amharic-ATT is a manually developed treebank for Amharic. Sentences were collected from grammar books, fiction, biographies, religious texts and news.
* Contributors: Binyam Ephrem, Gashaw Arutie, Tsegay Woldemariam, Juan Ignacio Navarro Horñiacek
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation.
Ancient Greek 2 416K ______ IE, Greek
ANCIENT GREEK TREEBANKS
PROIEL 213K ____
UD_Ancient_Greek-PROIEL is converted from the Ancient Greek data in the PROIEL treebank, and consists of the New Testament plus selections from Herodotus.
* Contributors: Dag Haug
* Repository master
dev
* README
* Treebank hub page
* Download
Perseus 202K __
This Universal Dependencies Ancient Greek Treebank consists of an automatic conversion of a selection of passages from the Ancient Greek and Latin Dependency Treebank 2.1.
* Contributors: Giuseppe G. A. Celano, Daniel Zeman
* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Ancient Greek treebanks.
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation.
Apurina 1 <1K ____ Arawakan
APURINA TREEBANKS
UFPA <1K ____
This is an Apurinã treebank consisting of sentences from a grammatical description of the language by Marília Fernanda.
* Contributors: Marília Fernanda, Sidney Facundes, Jack Rueter, Niko Partanen
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page.
Arabic 3 1,042K ____ Afro-Asiatic, Semitic
ARABIC TREEBANKS
PADT 282K __
The Arabic-PADT UD treebank is based on the (http://ufal.mff.cuni.cz/padt/) (PADT), created at the Charles University in Prague. * Contributors: Daniel Zeman, Zdeněk Žabokrtský, Shadi Saleh* Repository master
dev
* README
* Treebank hub page
* Download
PUD 20K ____
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the (http://universaldependencies.org/conll17/).
* Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Luma Ateyah, Martin Popel, Daniel Zeman, Nizar Habash, Dima Taji
* Repository master
dev
* README
* Treebank hub page
* Download
NYUAD 738K __
The NYUAD Arabic UD treebank is based on the Penn Arabic Treebank (PATB), parts 1, 2, and 3, through conversion to CATiB dependency trees.
* Contributors: Nizar Habash, Dima Taji
* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Arabic treebanks.
LANGUAGE DOCUMENTATION See the language documentation page.
Armenian 1 52K ____________ IE, Armenian
ARMENIAN TREEBANKS
ArmTDP 52K ____________
Modern Eastern Armenian Universal Dependencies treebank, developed for UD originally by the ArmTDP team led by Marat M. Yavrumyan at the Yerevan State University.
* Contributors: Marat M. Yavrumyan
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page.
Assyrian 1 <1K ____ Afro-Asiatic, Semitic
ASSYRIAN TREEBANKS
AS <1K ____
The Uppsala Assyrian Treebank is a small treebank for Modern Standard Assyrian. The corpus is collected and annotated manually. The data was randomly collected from different textbooks and a short translation of The Merchant of Venice. * Contributors: Mary Yako* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page.
Bambara 1 13K ____ Mande
BAMBARA TREEBANKS
CRB 13K ____
The UD Bambara treebank is a section of the Corpus Référence du Bambara annotated natively with Universal Dependencies. * Contributors: Katya Aplonova, Francis Tyers* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation.
Basque 1 121K __ Basque
BASQUE TREEBANKS
BDT 121K __
The Basque UD treebank is based on an automatic conversion from part of the Basque Dependency Treebank (BDT), created at the University of the Basque Country by the IXA NLP research group. The treebank consists of 8,993 sentences (121,443 tokens) and covers mainly literary and journalistic texts.
* Contributors: Maria Jesus Aranzabe, Aitziber Atutxa, Kepa Bengoetxea, Arantza Diaz de Ilarraza, Iakes Goenaga, Koldo Gojenola, Larraitz Uria
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation.
Belarusian 1 275K ____________ IE, Slavic
BELARUSIAN TREEBANKS
HSE 275K ____________
The Belarusian UD treebank is based on a sample of the news texts included in the Belarusian-Russian parallel subcorpus of the Russian National Corpus, online search available at: http://ruscorpora.ru/search-para-be.html.
* Contributors: Olga Lyashevskaya, Angelika Peljak-Łapińska, Daria Petrova
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page.
Bhojpuri 2 6K ____ IE, Indic
BHOJPURI TREEBANKS
BHTB 6K ____
The (https://en.wikipedia.org/wiki/Bhojpuri_language) UD Treebank (BHTB) is a part of the (http://universaldependencies.org/) project. * Contributors: Atul Kr. Ojha, Daniel Zeman* Repository master
dev
* README
* Treebank hub page
* Download
BhEn - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Atul Kr. Ojha* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page.
Breton 1 10K ____________ IE, Celtic
BRETON TREEBANKS
KEB 10K ____________
UD Breton-KEB is a treebank of Breton that has been manually annotated according to the Universal Dependencies guidelines. The tokenisation guidelines and morphological annotation come from a finite-state morphological analyser of Breton released as part of the (http://www.apertium.org).
* Contributors: Francis Tyers
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page.
Bulgarian 1 156K ______ IE, Slavic
BULGARIAN TREEBANKS
BTB 156K ______
UD_Bulgarian-BTB is based on the HPSG-based BulTreeBank, created at the Institute of Information and Communication Technologies, Bulgarian Academy of Sciences. The original consists of 215,000 tokens (over 15,000 sentences). All the texts were processed automatically at the tokenization, morphological and chunk levels. Then the full syntactic analysis was performed manually by trained annotators.
* Contributors: Kiril Simov, Petya Osenova, Martin Popel
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page.
Buryat 1 10K ______ Mongolic
BURYAT TREEBANKS
BDT 10K ______
The UD Buryat treebank was annotated manually natively in UD and contains grammar book sentences, along with news and some fiction. * Contributors: Elena Badmaeva, Francis Tyers* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation.
Cantonese 1 13K __ Sino-Tibetan
CANTONESE TREEBANKS
HK 13K __
A Cantonese treebank (in Traditional Chinese characters) of film subtitles and of legislative proceedings of Hong Kong, parallel with the Chinese-HK treebank. * Contributors: Kim Gerdes, John Lee, Herman Leung, Tak-sum Wong* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation.
Catalan 1 531K __ IE, Romance
CATALAN TREEBANKS
AnCora 531K __
Catalan data from the AnCora corpus.
* Contributors: Héctor Martínez Alonso, Elena Pascual, Daniel Zeman
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page.
Chinese 5 285K ________ Sino-Tibetan
CHINESE TREEBANKS
GSDSimp 123K __
Simplified Chinese Universal Dependencies dataset converted from the GSD (traditional) dataset with manual corrections. * Contributors: Peng Qi, Koichi Yasuoka* Repository master
dev
* README
* Treebank hub page
* Download
GSD 123K __
Traditional Chinese Universal Dependencies Treebank annotated and converted by Google. * Contributors: Mo Shen, Ryan McDonald, Daniel Zeman, Peng Qi* Repository master
dev
* README
* Treebank hub page
* Download
PUD 21K ____
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the (http://universaldependencies.org/conll17/).
* Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Josie Li, Cheuk Ying Li, Martin Popel, Daniel Zeman, Herman Leung
* Repository master
dev
* README
* Treebank hub page
* Download
HK 9K __
A Traditional Chinese treebank of film subtitles and of legislative proceedings of Hong Kong, parallel with the Cantonese-HK treebank. * Contributors: Kim Gerdes, John Lee, Herman Leung, Tak-sum Wong* Repository master
dev
* README
* Treebank hub page
* Download
CFL 7K __
The Chinese-CFL UD treebank is manually annotated by Keying Li with minor manual revisions by Herman Leung and John Lee at City University of Hong Kong, based on essays written by learners of Mandarin Chinese as a foreign language. The data is in Simplified Chinese. * Contributors: John Lee, Herman Leung, Keying Li* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Chinese treebanks.
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation.
Chukchi 1 6K __ Chukotko-Kamchatkan
CHUKCHI TREEBANKS
HSE 6K __
This data is a manual annotation of the corpus from multimedia annotated corpus of the (http://chuklang.ru/) project, a dialectal corpus of the Amguema variant of Chukchi. * Contributors: Francis Tyers, Karina Mischenkova* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page.
Classical Chinese 1 233K __ Sino-Tibetan
CLASSICAL CHINESE TREEBANKS
Kyoto 233K __
Classical Chinese Universal Dependencies Treebank annotated and converted by Institute for Research in Humanities, Kyoto University. * Contributors: Koichi Yasuoka, Christian Wittern, Tomohiko Morioka, Takumi Ikeda, Naoki Yamazaki, Yoshihiro Nikaido, Shingo Suzuki, Shigeki Moro, Yuan Li, Hiroyuki Shirasu, Kazunori Fujita* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page.
Coptic 1 48K ______ Afro-Asiatic, Egyptian
COPTIC TREEBANKS
Scriptorium 48K ______
UD Coptic contains manually annotated Sahidic Coptic texts, including Biblical texts, sermons, letters, and hagiography.
* Contributors: Mitchell Abrams, Elizabeth Davidson, Amir Zeldes
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page.
Croatian 1 199K ______ IE, Slavic
CROATIAN TREEBANKS
SET 199K ______
The Croatian UD treebank is based on the extension of the SETimes-HR corpus, the (http://hdl.handle.net/11356/1183) corpus. * Contributors: Željko Agić, Nikola Ljubešić, Daniel Zeman* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page.
Czech 5 2,227K ______________ IE, Slavic
CZECH TREEBANKS
PDT 1,509K ______
The Czech-PDT UD treebank is based on the Prague Dependency Treebank 3.0 (PDT), created at the Charles University in Prague. * Contributors: Daniel Zeman, Jan Hajič* Repository master
dev
* README
* Treebank hub page
* Download
CAC 495K __________
The UD_Czech-CAC treebank is based on the Czech Academic Corpus 2.0 (CAC; Český akademický korpus; ČAK), created at Charles University in Prague.
* Contributors: Barbora Hladká, Daniel Zeman
* Repository master
dev
* README
* Treebank hub page
* Download
FicTree 167K __
FicTree is a treebank of Czech fiction, automatically converted into the UD format. The treebank was built at Charles University in Prague. * Contributors: Tomáš Jelínek, Daniel Zeman* Repository master
dev
* README
* Treebank hub page
* Download
CLTT 36K __
The UD_Czech-CLTT treebank is based on the Czech Legal Text Treebank 1.0, created at Charles University in Prague. * Contributors: Barbora Hladká, Daniel Zeman, Martin Popel* Repository master
dev
* README
* Treebank hub page
* Download
PUD 18K ____
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the (http://universaldependencies.org/conll17/). * Contributors: Václava Kettnerová, Jan Hajič jr., Silvie Cinková, Zdeňka Urešová, Milan Straka, Jan Hajič, Jaroslava Hlaváčová, Daniel Zeman* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Czech treebanks.
LANGUAGE DOCUMENTATION See the language documentation page.
Danish 2 100K ________ IE, Germanic
DANISH TREEBANKS
DDT 100K ________
The Danish UD treebank is a conversion of the Danish Dependency Treebank.
* Contributors: Anders Johannsen, Héctor Martínez Alonso, Barbara Plank
* Repository master
dev
* README
* Treebank hub page
* Download
DTB - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Natalie Schluter* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation.
Dutch 2 306K ____ IE, Germanic
DUTCH TREEBANKS
Alpino 208K __
This corpus consists of samples from various treebanks annotated at the University of Groningen using the Alpino annotation tools and guidelines.
* Contributors: Daniel Zeman, Zdeněk Žabokrtský, Gosse Bouma, Gertjan van Noord
* Repository master
dev
* README
* Treebank hub page
* Download
LassySmall 98K __
This corpus contains sentences from the Wikipedia section of the Lassy Small Treebank. Universal Dependency annotation was generated automatically from the original annotation in Lassy. * Contributors: Gosse Bouma, Gertjan van Noord* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Dutch treebanks.
LANGUAGE DOCUMENTATION See the language documentation page.
English 9 648K ____________________________ IE, Germanic
ENGLISH TREEBANKS
GUM 113K ______________
Universal Dependencies syntax annotations from the GUM corpus (https://corpling.uis.georgetown.edu/gum/)
* Contributors: Siyao Peng, Amir Zeldes
* Repository master
dev
* README
* Treebank hub page
* Download
ParTUT 49K ______
UD_English-ParTUT is a conversion of a multilingual parallel treebank developed at the University of Turin, and consisting of a variety of text genres, including talks, legal texts and Wikipedia articles, among others.
* Contributors: Cristina Bosco, Manuela Sanguinetti
* Repository master
dev
* README
* Treebank hub page
* Download
PUD 21K ____
This is the English portion of the Parallel Universal Dependencies (PUD) treebanks created for the CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies (http://universaldependencies.org/conll17/). * Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Jesse Kirchner, Lorenzo Lambertino, Martin Popel, Daniel Zeman, Christopher Manning, Sebastian Schuster, Siva Reddy* Repository master
dev
* README
* Treebank hub page
* Download
LinES 94K ______
UD English_LinES is the English half of the LinES Parallel Treebank with the original dependency annotation first automatically converted into Universal Dependencies and then partially reviewed. Its contents cover literature, an online manual and Europarl data. * Contributors: Lars Ahrenberg* Repository master
dev
* README
* Treebank hub page
* Download
Pronouns 1K __
UD English-Pronouns is a dataset created to make pronoun identification more accurate and with a more balanced distribution across genders. The dataset is initially targeting the Independent Genitive pronouns, "hers", (independent) "his", (singular) "theirs", "mine", and (singular) "yours".
* Contributors: Robert Munro
* Repository master
dev
* README
* Treebank hub page
* Download
GUMReddit 16K ____
Universal Dependencies syntax annotations from the Reddit portion of the GUM corpus (https://corpling.uis.georgetown.edu/gum/) * Contributors: Siyao Peng, Amir Zeldes* Repository master
dev
* README
* Treebank hub page
* Download
EWT 254K ________
A Gold Standard Universal Dependencies Corpus for English, built over the source material of the English Web Treebank LDC2012T13 (https://catalog.ldc.upenn.edu/LDC2012T13).
* Contributors: Natalia Silveira, Timothy Dozat, Christopher Manning, Sebastian Schuster, Ethan Chi, John Bauer, Miriam Connor, Marie-Catherine de Marneffe, Nathan Schneider, Sam Bowman, Hanzhi Zhu, Daniel Galbraith
* Repository master
dev
* README
* Treebank hub page
* Download
ESL 97K __
UD English-ESL / Treebank of Learner English (TLE) contains manual POS tag and dependency annotations for 5,124 English as a Second Language (ESL) sentences drawn from the Cambridge Learner Corpus First Certificate in English (FCE) dataset. * Contributors: Yevgeni Berzak, Jessica Kenney, Carolyn Spadine, Jing Xian Wang, Lucia Lam, Keiko Sophie Mori, Sebastian Garza, Boris Katz, Margarita Misirpashayeva* Repository master
dev
* README
* Treebank hub page
* Download
BhEn - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Atul Kr. Ojha* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of English treebanks.
LANGUAGE DOCUMENTATION See the language documentation page.
Erzya 1 17K __ Uralic, Mordvin
ERZYA TREEBANKS
JR 17K __
UD Erzya is the original annotation (CoNLL-U) for texts in the Erzya language; it originally consists of a sample from a number of fiction authors writing originals in Erzya.
* Contributors: Jack Rueter, Francis Tyers, Elena Klementieva, Olga Erina, Ivan Riabov
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page.
Estonian 2 494K ______________ Uralic, Finnic
ESTONIAN TREEBANKS
EDT 438K ________
UD Estonian is a converted version of the Estonian Dependency Treebank (EDT), originally annotated in the Constraint Grammar (CG) annotation scheme, and consisting of genres of fiction, newspaper texts and scientific texts. The treebank contains 30,972 trees, 437,769 tokens.
* Contributors: Kadri Muischnek, Kaili Müürisep, Tiina Puolakainen, Andriela Rääbis, Liisi Torga
* Repository master
dev
* README
* Treebank hub page
* Download
EWT 56K ______
UD EWT treebank consists of different genres of new media. The treebank contains 4,493 trees, 56,399 tokens. * Contributors: Kadri Muischnek, Kaili Müürisep, Tiina Puolakainen, Dage Särg* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Estonian treebanks.
LANGUAGE DOCUMENTATION See the language documentation page.
Faroese 2 50K ________ IE, Germanic
FAROESE TREEBANKS
FarPaHC 40K ______
UD_Faroese-FarPaHC is a conversion of the (https://github.com/einarfs/farpahc) to the Universal Dependencies scheme. The conversion was done using (https://github.com/thorunna/UDConverter).
* Contributors: Þórunn Arnardóttir, Hinrik Hafsteinsson, Einar Freyr Sigurðsson, Anton Karl Ingason, Eiríkur Rögnvaldsson, Joel C. Wallenberg
* Repository master
dev
* README
* Treebank hub page
* Download
OFT 10K __
This is a treebank of Faroese based on the Faroese Wikipedia. * Contributors: Daniel Zeman, Bjartur Mortensen, Francis Tyers* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Faroese treebanks.
LANGUAGE DOCUMENTATION See the language documentation page.
Finnish 4 397K ____________________ Uralic, Finnic
FINNISH TREEBANKS
TDT 202K ____________
UD_Finnish-TDT is based on the Turku Dependency Treebank (TDT), a broad-coverage dependency treebank of general Finnish covering numerous genres. The conversion to UD was followed by extensive manual checks and corrections, and the treebank closely adheres to the UD guidelines.
* Contributors: Filip Ginter, Jenna Kanerva, Veronika Laippala, Niko Miekka, Anna Missilä, Stina Ojala, Sampo Pyysalo
* Repository master
dev
* README
* Treebank hub page
* Download
PUD 15K ____
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the (http://universaldependencies.org/conll17/).
* Contributors: Jenna Kanerva, Filip Ginter, Stina Ojala, Anna Missilä
* Repository master
dev
* README
* Treebank hub page
* Download
OOD 19K ________
Finnish-OOD is an external out-of-domain test set for Finnish-TDT annotated natively into UD scheme. * Contributors: Jenna Kanerva* Repository master
dev
* README
* Treebank hub page
* Download
FTB 159K __
FinnTreeBank 1 consists of manually annotated grammatical examples from VISK. The UD version of FinnTreeBank 1 was converted from a native annotation model with a script and later manually revised. * Contributors: Jussi Piitulainen, Hanna Nurmi* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Finnish treebanks.
LANGUAGE DOCUMENTATION See the language documentation page.
French 8 1,157K ________________ IE, Romance
FRENCH TREEBANKS
GSD 400K ________
UD_French-GSD was converted in 2015 from the content head version of the universal dependency treebank v2.0 (https://github.com/ryanmcd/uni-dep-tb). It has been updated since 2015 independently of the previous source.
* Contributors: Marie-Catherine de Marneffe, Bruno Guillaume, Ryan McDonald, Alane Suhr, Joakim Nivre, Matias Grioni, Carly Dickerson, Guy Perrier
* Repository master
dev
* README
* Treebank hub page
* Download
ParTUT 28K ______
UD_French-ParTUT is a conversion of a multilingual parallel treebank developed at the University of Turin, and consisting of a variety of text genres, including talks, legal texts and Wikipedia articles, among others.
* Contributors: Cristina Bosco, Manuela Sanguinetti
* Repository master
dev
* README
* Treebank hub page
* Download
Sequoia 70K ________
UD_French-Sequoia is an automatic conversion of the Sequoia Treebank corpus (http://deep-sequoia.inria.fr).
* Contributors: Marie Candito, Djamé Seddah, Guy Perrier, Bruno Guillaume
* Repository master
dev
* README
* Treebank hub page
* Download
FQB 24K ____
The corpus UD_French-FQB is an automatic conversion of the (http://alpage.inria.fr/Treebanks/FQB/), a corpus entirely made of questions.
* Contributors: Djamé Seddah, Marie Candito, Bruno Guillaume
* Repository master
dev
* README
* Treebank hub page
* Download
Spoken 35K __
A Universal Dependencies corpus for spoken French. * Contributors: Kim Gerdes, Sylvain Kahane, Mariam Nakhlé, Chunxiao Yan, Aline Etienne, Marine Courtin* Repository master
dev
* README
* Treebank hub page
* Download
PUD 24K ____
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the (http://universaldependencies.org/conll17/). * Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Jana Strnadová, Gauthier Caron, Martin Popel, Daniel Zeman, Marie-Catherine de Marneffe, Bruno Guillaume* Repository master
dev
* README
* Treebank hub page
* Download
FTB 573K __
The Universal Dependency version of the French Treebank (Abeillé et al., 2003), hereafter UD_French-FTB, is a treebank of sentences from the newspaper Le Monde, initially manually annotated with morphological information and phrase-structure and then converted to the Universal Dependencies annotation scheme.
* Contributors: Marie Candito, Bruno Guillaume, Teresa Lynn, Héctor Martínez Alonso, Benoît Sagot, Djamé Seddah, Eric Villemonte de la Clergerie
* Repository master
dev
* README
* Treebank hub page
* Download
CrapBank - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Djamé Seddah* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of French treebanks.
LANGUAGE DOCUMENTATION See the language documentation page.
Galician 2 164K ________ IE, Romance
GALICIAN TREEBANKS
TreeGal 25K __
The Galician-TreeGal is a treebank for Galician developed at LyS Group (Universidade da Coruña). * Contributors: Marcos Garcia* Repository master
dev
* README
* Treebank hub page
* Download
CTG 138K ________
The Galician UD treebank is based on the automatic parsing of the Galician Technical Corpus (http://sli.uvigo.gal/CTG) created at the University of Vigo by the TALG NLP research group.
* Contributors: Xavier Gómez Guinovart
* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Galician treebanks.
LANGUAGE DOCUMENTATION See the language documentation page.
German 4 3,753K __________ IE, Germanic
GERMAN TREEBANKS
HDT 3,399K ______
UD German-HDT is a conversion of the Hamburg Dependency Treebank, created at the University of Hamburg through manual annotation in conjunction with a standard for morphologically and syntactically annotating sentences as well as a constraint-based parser.
* Contributors: Emanuel Borges Völker, Felix Hennig, Arne Köhn, Maximilan Wendt
* Repository master
dev
* README
* Treebank hub page
* Download
GSD 292K ______
The German UD is converted from the content head version of the (https://github.com/ryanmcd/uni-dep-tb). * Contributors: Slav Petrov, Wolfgang Seeker, Ryan McDonald, Joakim Nivre, Daniel Zeman, Adriane Boyd* Repository master
dev
* README
* Treebank hub page
* Download
PUD 21K ____
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the (http://universaldependencies.org/conll17/).
* Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Georg Rehm, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Sebastian Bank, Martin Popel, Daniel Zeman
* Repository master
dev
* README
* Treebank hub page
* Download
LIT 40K __
This treebank aims at gathering texts of the German literary history. Currently, it hosts Fragments of the early Romanticism, i.e. aphorism-like texts mainly dealing with philosophical issues concerning art, beauty and related topics. * Contributors: Alessio Salomoni* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of German treebanks.
LANGUAGE DOCUMENTATION See the language documentation page.
Gothic 1 55K __ IE, Germanic
GOTHIC TREEBANKS
PROIEL 55K __
The UD Gothic treebank is based on the Gothic data from the PROIEL treebank, and consists of Wulfila's Bible translation. * Contributors: Dag Haug* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation.
Greek 1 63K ______ IE, Greek
GREEK TREEBANKS
GDT 63K ______
The Greek UD treebank (UD_Greek-GDT) is derived from the Greek Dependency Treebank (http://gdt.ilsp.gr), a resource developed and maintained by researchers at the Institute for Language and Speech Processing/Athena R.C. (http://www.ilsp.gr). * Contributors: Prokopis Prokopidis* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page.
Hebrew 1 161K __ Afro-Asiatic, Semitic
HEBREW TREEBANKS
HTB 161K __
A Universal Dependencies Corpus for Hebrew. * Contributors: Yoav Goldberg, Reut Tsarfaty, Amir More, Shoval Sadde, Victoria Basmov* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page.
Hindi 2 375K ____ IE, Indic
HINDI TREEBANKS
HDTB 351K __
The Hindi UD treebank is based on the Hindi Dependency Treebank (HDTB), created at IIIT Hyderabad, India. * Contributors: Riyaz Ahmad Bhat, Daniel Zeman* Repository master
dev
* README
* Treebank hub page
* Download
PUD 23K ____
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the (http://universaldependencies.org/conll17/).
* Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Esha Banerjee, Pinkey Nainwani, Martin Popel, Daniel Zeman
* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Hindi treebanks.
LANGUAGE DOCUMENTATION See the language documentation page.
Hindi English 1 26K __ Code switching
HINDI ENGLISH TREEBANKS
HIENCS 26K __
The Hindi-English Code-switching treebank is based on code-switching tweets of Hindi and English multilingual speakers (mostly Indian) on Twitter. The treebank is manually annotated using the UD scheme. The training and evaluation sets were separately annotated by different annotators using UD v2 and v1 guidelines respectively. The evaluation sets are automatically converted from UD v1 to v2.
* Contributors: Riyaz Ahmad Bhat, Irshad Ahmad Bhat
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation.
Hittite 1 1K __ IE, Anatolian
HITTITE TREEBANKS
HitTB 1K __
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Erik Andersen, Ben Rozonoyer* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation.
Hungarian 1 42K __ Uralic, Ugric
HUNGARIAN TREEBANKS
Szeged 42K __
The Hungarian UD treebank is derived from the Szeged Dependency Treebank (Vincze et al. 2010). * Contributors: Richárd Farkas, Katalin Simkó, Zsolt Szántó, Viktor Varga, Veronika Vincze* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation.
Icelandic 2 1,003K ____________ IE, Germanic
ICELANDIC TREEBANKS
IcePaHC 985K ________
UD_Icelandic-IcePaHC is a conversion of the (https://linguist.is/icelandic_treebank/Icelandic_Parsed_Historical_Corpus_(IcePaHC)) to the Universal Dependencies scheme. The conversion was done using (https://github.com/thorunna/UDConverter).
* Contributors: Þórunn Arnardóttir, Hinrik Hafsteinsson, Einar Freyr Sigurðsson, Hildur Jónsdóttir, Kristín Bjarnadóttir, Anton Karl Ingason, Kristján Rúnarsson, Steinþór Steingrímsson, Joel C. Wallenberg, Eiríkur Rögnvaldsson
* Repository master
dev
* README
* Treebank hub page
* Download
PUD 18K ____
Icelandic-PUD is the Icelandic part of the Parallel Universal Dependencies (PUD) treebanks. * Contributors: Hildur Jónsdóttir* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Icelandic treebanks. LANGUAGE DOCUMENTATION See the language documentation page. Indonesian 3 169K ________ Austronesian, Malayo-Sumbawan INDONESIAN TREEBANKS PUD 19K ____
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the (http://universaldependencies.org/conll17/). * Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Ruli Manurung, Muh Shohibussirri, Martin Popel, Daniel Zeman, Ika Alfina, Arawinda Dinakaramani, Muhammad Yudistira Hanifmuti, Jessica Naraiswari Arwidarasti, Yogi Lesmana Sulestio* Repository master
dev
* README
* Treebank hub page
* Download
CSUI 28K ____
UD Indonesian-CSUI is a conversion from an Indonesian constituency treebank in the Penn Treebank format named (https://github.com/ialfina/kethu) that was also a conversion from a constituency treebank built by (https://github.com/famrashel/idn-treebank). We named this treebank **Indonesian-CSUI**, since all three versions of the treebanks were built at the Faculty of Computer Science, Universitas Indonesia.
* Contributors: Ika Alfina, Jessica Naraiswari Arwidarasti, Muhammad Yudistira Hanifmuti, Arawinda Dinakaramani, Ruli Manurung, Fam Rashel, Andry Luthfi
* Repository master
dev
* README
* Treebank hub page
* Download
GSD 121K ____
The Indonesian UD is converted from the content head version of the (https://github.com/ryanmcd/uni-dep-tb).
* Contributors: Ryan McDonald, Joakim Nivre, Daniel Zeman, Septina Dian Larasati
* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Indonesian treebanks. LANGUAGE DOCUMENTATION See the language documentation page. Irish 1 115K __________ IE, Celtic IRISH TREEBANKS
IDT 115K __________
A Universal Dependencies 4910-sentence treebank for modern Irish. * Contributors: Teresa Lynn, Jennifer Foster, Sarah McGuinness, Abigail Walsh, Jason Phelan* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page. Italian 6 811K __________ IE, Romance ITALIAN TREEBANKS
ISDT 298K ______
The Italian corpus annotated according to the UD annotation scheme was obtained by conversion from ISDT (Italian Stanford Dependency Treebank), released for the dependency parsing shared task of Evalita-2014 (Bosco et al. 2014). * Contributors: Cristina Bosco, Alessandro Lenci, Simonetta Montemagni, Maria Simi* Repository master
dev
* README
* Treebank hub page
* Download
VIT 280K ____
The UD_Italian-VIT corpus was obtained by conversion from VIT (Venice Italian Treebank), developed at the Laboratory of Computational Linguistics of the Università Ca' Foscari in Venice (Delmonte et al. 2007; Delmonte 2009; http://rondelmo.it/resource/VIT/Browser-VIT/index.htm). * Contributors: Fabio Tamburini, Maria Simi, Cristina Bosco* Repository master
dev
* README
* Treebank hub page
* Download
ParTUT 55K ______
UD_Italian-ParTUT is a conversion of a multilingual parallel treebank developed at the University of Turin, and consisting of a variety of text genres, including talks, legal texts and Wikipedia articles, among others.
* Contributors: Cristina Bosco, Manuela Sanguinetti
* Repository master
dev
* README
* Treebank hub page
* Download
TWITTIRO 29K __
TWITTIRÒ-UD is a collection of ironic Italian tweets annotated in Universal Dependencies. The treebank can be exploited for the training of NLP systems to enhance their performance on social media texts, and in particular, for irony detection purposes.
* Contributors: Alessandra T. Cignarella, Cristina Bosco, Manuela Sanguinetti
* Repository master
dev
* README
* Treebank hub page
* Download
PoSTWITA 124K __
PoSTWITA-UD is a collection of Italian tweets annotated in Universal Dependencies that can be exploited for the training of NLP systems to enhance their performance on social media texts. * Contributors: Cristina Bosco, Manuela Sanguinetti* Repository master
dev
* README
* Treebank hub page
* Download
PUD 23K ____
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the (http://universaldependencies.org/conll17/). * Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Antonio Stella, Davide Rovati, Martin Popel, Daniel Zeman, Maria Simi, Manuela Sanguinetti* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Italian treebanks. LANGUAGE DOCUMENTATION See the language documentation page. Japanese 5 1,676K ____________ Japanese JAPANESE TREEBANKS
GSD 192K ____
This Universal Dependencies (UD) Japanese treebank is based on the definition of UD Japanese convention described in the UD documentation. The original sentences are from Google UDT 2.0. * Contributors: Mai Omura, Yusuke Miyao, Hiroshi Kanayama, Hiroshi Matsuda, Aya Wakasa, Kayo Yamashita, Masayuki Asahara, Takaaki Tanaka, Yugo Murawaki, Yuji Matsumoto, Shinsuke Mori, Sumire Uematsu, Ryan McDonald, Joakim Nivre, Daniel Zeman* Repository master
dev
* README
* Treebank hub page
* Download
PUD 28K ____
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the (http://universaldependencies.org/conll17/). * Contributors: Mai Omura, Yusuke Miyao, Hiroshi Kanayama, Hiroshi Matsuda, Aya Wakasa, Kayo Yamashita, Masayuki Asahara, Takaaki Tanaka, Yugo Murawaki, Yuji Matsumoto, Shinsuke Mori, Sumire Uematsu, Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Atsuko Shimada, Anna Trukhina, Martin Popel, Daniel Zeman* Repository master
dev
* README
* Treebank hub page
* Download
BCCWJ 1,250K __________
This Universal Dependencies (UD) Japanese treebank is based on the definition of UD Japanese convention described in the UD documentation. The original sentences are from `Balanced Corpus of Contemporary Written Japanese' (BCCWJ).
* Contributors: Mai Omura, Masayuki Asahara, Yusuke Miyao, Takaaki Tanaka, Hiroshi Kanayama, Yuji Matsumoto, Shinsuke Mori, Sumire Uematsu, Yugo Murawaki
* Repository master
dev
* README
* Treebank hub page
* Download
Modern 14K __
This Universal Dependencies (UD) Japanese treebank is based on the definition of UD Japanese convention described in the UD documentation. The original sentences are from `Corpus of Historical Japanese' (CHJ).
* Contributors: Mai Omura, Masayuki Asahara, Yuta Takahashi
* Repository master
dev
* README
* Treebank hub page
* Download
KTC 189K __
Please add a summary section to the treebank readme file * Contributors: Masayuki Asahara, Hiroshi Kanayama, Yuji Matsumoto, Yusuke Miyao, Shunsuke Mori, Takaaki Tanaka, Sumire Uematsu* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Japanese treebanks. LANGUAGE DOCUMENTATION See the language documentation page. Karelian 1 3K ______ Uralic, Finnic KARELIAN TREEBANKS
KKPP 3K ______
UD Karelian-KKPP is a new, manually annotated corpus of Karelian made in the Universal Dependencies annotation scheme. The data is collected from (http://dictorpus.krc.karelia.ru/en/corpus/text) and consists mostly of modern news texts but also some stories and educational texts.
* Contributors: Tommi A Pirinen
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page. Kazakh 1 10K ______ Turkic, Northwestern KAZAKH TREEBANKS
KTB 10K ______
The UD Kazakh treebank is a combination of text from various sources including Wikipedia, some folk tales, sentences from the UDHR, news and phrasebook sentences. Sentence IDs include partial document identifiers.
* Contributors: Aibek Makazhanov, Jonathan North Washington, Francis Tyers
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation. Khunsari 1 <1K ____ IE, Iranian KHUNSARI TREEBANKS
AHA <1K ____
The AHA Khunsari Treebank is a small treebank for contemporary Khunsari. Its corpus is collected and annotated manually. We have prepared this treebank based on interviews with Khunsari speakers.
* Contributors: AmirHossein Mojiri Foroushani, Hamid Aghaei, Amir Ahmadi
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page. Komi Permyak 1 <1K __ Uralic, Permic KOMI PERMYAK TREEBANKS UH <1K __
This is a Komi-Permyak literary language treebank consisting of original and translated texts.
* Contributors: Larisa Ponomareva, Niko Partanen, Jack Rueter, Francis Tyers
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page. Komi Zyrian 2 6K ____ Uralic, Permic KOMI ZYRIAN TREEBANKS Lattice 5K __
UD Komi-Zyrian Lattice is a treebank of written standard Komi-Zyrian.
* Contributors: Niko Partanen, KyungTae Lim, Thierry Poibeau, Jack Rueter
* Repository master
dev
* README
* Treebank hub page
* Download
IKDP 1K __
This treebank consists of dialectal transcriptions of spoken Komi-Zyrian. The current texts are short recorded segments from different areas where the Iźva dialect of the Komi language is spoken.
* Contributors: Niko Partanen, Rogier Blokland, Michael Rießler, Jack Rueter
* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Komi Zyrian treebanks. LANGUAGE DOCUMENTATION See the language documentation page. Korean 5 446K ______________ Korean KOREAN TREEBANKS
Kaist 350K ______
The KAIST Korean Universal Dependency Treebank is generated by Chun et al., 2018 from the constituency trees in the (http://semanticweb.kaist.ac.kr/home/index.php/Corpus4). * Contributors: Jinho Choi, Na-Rae Han, Jena Hwang, Jayeol Chun* Repository master
dev
* README
* Treebank hub page
* Download
GSD 80K ____
The Google Korean Universal Dependency Treebank is first converted from the (https://github.com/ryanmcd/uni-dep-tb), and then enhanced by Chun et al., 2018.
* Contributors: Ryan McDonald, Joakim Nivre, Daniel Zeman, Jinho Choi, Na-Rae Han, Jena Hwang, Jayeol Chun
* Repository master
dev
* README
* Treebank hub page
* Download
PUD 16K ____
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the (http://universaldependencies.org/conll17/).
* Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Sookyoung Kwak, Yongseok Cho, Martin Popel, Daniel Zeman
* Repository master
dev
* README
* Treebank hub page
* Download
Penn - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Jinho Choi, Narae Han, Jena Hwang, Jayeol Chun* Repository master
dev
* README
* Treebank hub page
* Download
Sejong - __
Please add a summary section to the treebank readme file * Contributors: Jaemin Cho* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Korean treebanks. LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation. Kurmanji 1 10K ____ IE, Iranian KURMANJI TREEBANKS
MG 10K ____
The UD Kurmanji corpus is a corpus of Kurmanji Kurdish. It contains fiction and encyclopaedic texts in roughly equal measure. It has been annotated natively in accordance with the UD annotation scheme. * Contributors: Memduh Gökırmak, Francis Tyers* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation. Latin 4 922K ________ IE, Latin LATIN TREEBANKS
LLCT 242K ____
This Universal Dependencies version of the **LLCT** (Late Latin Charter Treebank) consists of an automated conversion of the **LLCT2** treebank from the Latin Dependency Treebank (LDT) format into the Universal Dependencies standard.
* Contributors: Timo Korkiakangas, Flavio Massimiliano Cecchini, Marco Passarotti
* Repository master
dev
* README
* Treebank hub page
* Download
ITTB 450K __
Latin data from the _Index Thomisticus_ Treebank. Data are taken from the _Index Thomisticus_ corpus by Roberto Busa SJ, which contains the complete work by Thomas Aquinas (1225–1274; Medieval Latin) and by 61 other authors related to Thomas. * Contributors: Marco Passarotti, Daniel Zeman, Berta González Saavedra, Flavio Massimiliano Cecchini* Repository master
dev
* README
* Treebank hub page
* Download
PROIEL 200K ____
The Latin PROIEL treebank is based on the Latin data from the PROIEL treebank, and contains most of the Vulgate New Testament translations plus selections from Caesar's Gallic War, Cicero's Letters to Atticus, Palladius' Opus Agriculturae and the first book of Cicero's De officiis.
* Contributors: Dag Haug
* Repository master
dev
* README
* Treebank hub page
* Download
Perseus 29K ______
This Universal Dependencies Latin Treebank consists of an automatic conversion of a selection of passages from the Ancient Greek and Latin Dependency Treebank 2.1 * Contributors: Giuseppe G. A. Celano, Daniel Zeman* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Latin treebanks.
LANGUAGE DOCUMENTATION See the language documentation page. Latvian 1 220K __________ IE, Baltic LATVIAN TREEBANKS
LVTB 220K __________ Latvian UD Treebank is based on Latvian Treebank ((http://sintakse.korpuss.lv)), being created at University of Latvia, Institute of Mathematics and Computer Science, (http://ailab.lv). * Contributors: Lauma Pretkalniņa, Laura Rituma, Baiba Saulīte, Gunta Nešpore-Bērzkalne, Normunds Grūzītis* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page. Laz 1 2K __ Kartvelian LAZ TREEBANKS
BOUN 2K __
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ...* Contributors:
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation. Lithuanian 2 75K ________ IE, Baltic LITHUANIAN TREEBANKS
ALKSNIS 70K ________
The Lithuanian dependency treebank ALKSNIS v3.0 (Vytautas Magnus University).
* Contributors: Andrius Utka, Erika Rimkutė, Agnė Bielinskienė, Jolanta Kovalevskaitė, Loïc Boizou, Gabrielė Aleksandravičiūtė, Kristina Brokaitė, Daniel Zeman, Natalia Perkova, Bernadeta Griciūtė
* Repository master
dev
* README
* Treebank hub page
* Download
HSE 5K ____
Lithuanian treebank annotated manually (dependencies) using the Morphological Annotator by CCL, Vytautas Magnus University (http://tekstynas.vdu.lt/) and manual disambiguation. A pilot version which includes news and an essay by Tomas Venclova is available here. * Contributors: Olga Lyashevskaya, Dmitry Sichinava* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Lithuanian treebanks. LANGUAGE DOCUMENTATION See the language documentation page. Livvi 1 1K ______ Uralic, Finnic LIVVI TREEBANKS
KKPP 1K ______
UD Livvi-KKPP is a new, manually annotated corpus of Livvi-Karelian made directly in the Universal Dependencies annotation scheme. The data is collected from (http://dictorpus.krc.karelia.ru/en/corpus/text) and consists mostly of modern news texts but also some stories and educational texts.
* Contributors: Tommi A Pirinen
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page. Magahi 1 7K ____ IE, Indic MAGAHI TREEBANKS
MGTB 7K ____
The (https://en.wikipedia.org/wiki/Magahi_language) UD Treebank (MGTB) is a part of the (http://universaldependencies.org/) project.
* Contributors: Mohit Raj, Deepak Alok, Ritesh Kumar, Atul Kr. Ojha, Daniel Zeman
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation. Maltese 1 44K __________ Afro-Asiatic, Semitic MALTESE TREEBANKS
MUDT 44K __________
MUDT (Maltese Universal Dependencies Treebank) is a manually annotated treebank of Maltese, a Semitic language of Malta descended from North African Arabic with a significant amount of Italo-Romance influence. MUDT was designed as a balanced corpus with four major genres (see Splitting below) represented roughly equally. * Contributors: Slavomír Čéplö, Daniel Zeman* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page. Manx 1 6K ________________ IE, Celtic MANX TREEBANKS
Cadhan 6K ________________
This is the Cadhan Aonair UD treebank for Manx Gaelic, created by Kevin Scannell.
* Contributors: Kevin Scannell
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page. Marathi 1 3K ____ IE, Indic MARATHI TREEBANKS
UFAL 3K ____
UD Marathi is a manually annotated treebank consisting primarily of stories from Wikisource, and parts of an article on Wikipedia. * Contributors: Vinit Ravishankar* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation. Mbya Guarani 2 13K ____ Tupian, Tupi-Guarani MBYA GUARANI TREEBANKS Thomas 1K __
UD Mbya_Guarani-Thomas is a corpus of Mbyá Guaraní (Tupian) texts collected by Guillaume Thomas. The current version of the corpus consists of three speeches by Paulina Kerechu Núñez Romero, a Mbyá Guaraní speaker from Ytu, Caazapá Department, Paraguay. * Contributors: Guillaume Thomas* Repository master
dev
* README
* Treebank hub page
* Download
Dooley 11K __
UD Mbya_Guarani-Dooley is a corpus of narratives written in Mbyá Guaraní (Tupian) in Brazil, and collected by Robert Dooley. Due to copyright restrictions, the corpus that is distributed as part of UD only contains the annotation (tags, features, relations) while the FORM and LEMMA columns are empty. * Contributors: Guillaume Thomas* Repository master
dev
* README
* Treebank hub page
* Download
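The Dooley description above says that only the annotation is distributed and that the FORM and LEMMA columns are empty. A minimal Python sketch of how a consumer might detect such an annotation-only file follows. It assumes the standard CoNLL-U layout of ten tab-separated columns with `_` marking an empty value; the helper names (`read_tokens`, `annotation_only`) and the two-token sample are hypothetical illustrations, not taken from the treebank or from any UD tooling.

```python
from typing import Dict, Iterator

# Column names as defined by the CoNLL-U format.
CONLLU_COLUMNS = [
    "ID", "FORM", "LEMMA", "UPOS", "XPOS",
    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC",
]


def read_tokens(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per token line, skipping comments and blank lines."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(CONLLU_COLUMNS):
            raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
        yield dict(zip(CONLLU_COLUMNS, fields))


def annotation_only(text: str) -> bool:
    """Return True if the text has tokens and all have empty FORM and LEMMA."""
    tokens = list(read_tokens(text))
    return bool(tokens) and all(
        tok["FORM"] == "_" and tok["LEMMA"] == "_" for tok in tokens
    )


# Hypothetical two-token sentence with the word forms withheld.
SAMPLE = "\n".join([
    "# sent_id = example-1",
    "1\t_\t_\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\t_\t_\tVERB\t_\t_\t0\troot\t_\t_",
    "",
])

if __name__ == "__main__":
    print(annotation_only(SAMPLE))
```

A check like this could be run before training a lexicalized parser, since a model that relies on word forms would have nothing to learn from an annotation-only file.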
See here for comparative statistics of Mbya Guarani treebanks. LANGUAGE DOCUMENTATION See the language documentation page. Middle Irish 2 <1K ______ IE, Celtic MIDDLE IRISH TREEBANKS CritMITB <1K __
Annotation of the classic Scela Mucce Meic Dathó ("The tale of Mac Dathó's pig").
* Contributors: Ben Rozonoyer, Erik Andersen
* Repository master
dev
* README
* Treebank hub page
* Download
DipMITB - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Adrian Ó Dubhghaill* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation. Moksha 1 1K ____ Uralic, Mordvin MOKSHA TREEBANKS
JR 1K ____
UD_Moksha-JR originates from the Erme Universal Dependencies annotated texts for Moksha, with CoNLL-U annotation of texts in the Moksha language. It originally consists of a sample from a number of fiction authors writing originals in Moksha.
* Contributors: Jack Rueter, Maria Levina, Nadezhda Kabaeva
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page. Munduruku 1 <1K ____ Tupian, Munduruku MUNDURUKU TREEBANKS
TuDeT <1K ____
UD_Munduruku-TuDeT is a collection of annotated sentences in (http://www.endangeredlanguages.com/lang/2981). Together with (https://github.com/UniversalDependencies/UD_Akuntsu-TuDeT) and (https://github.com/UniversalDependencies/UD_Tupinamba-TuDeT), UD_Munduruku-TuDeT is part of the (https://tular.org) project.
* Contributors: Fabrício Ferraz Gerardi, Eva Huber
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page. Naija 1 140K __ Creole NAIJA TREEBANKS
NSC 140K __
A Universal Dependencies corpus for spoken Naija (Nigerian Pidgin). * Contributors: Bernard Caron, Emmett Strickland, Marine Courtin, Kim Gerdes, Bruno Guillaume, Sylvain Kahane, Chika Kennedy Ajede, Emeka Onwuegbuzia, Samson Tella* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page. Nayini 1 <1K ____ IE, Iranian NAYINI TREEBANKS
AHA <1K ____
The AHA Nayini Treebank is a small treebank for contemporary Nayini. Its corpus is collected and annotated manually. We have prepared this treebank based on interviews with Nayini speakers.
* Contributors: AmirHossein Mojiri Foroushani, Hamid Aghaei, Amir Ahmadi
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page. North Sami 1 26K ____ Uralic, Sami NORTH SAMI TREEBANKS Giella 26K ____
This is a North Sámi treebank based on a manually disambiguated and function-labelled gold-standard corpus of North Sámi produced by the Giellatekno team at UiT Norgga árktalaš universitehta. * Contributors: Trond Trosterud, Lene Antonsen, Francis Tyers* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page. Norwegian 3 666K ________ IE, Germanic NORWEGIAN TREEBANKS
Bokmaal 310K ______
The Norwegian UD treebank is based on the Bokmål section of the Norwegian Dependency Treebank (NDT), which is a syntactic treebank of Norwegian. NDT has been automatically converted to the UD scheme by Lilja Øvrelid at the University of Oslo. * Contributors: Lilja Øvrelid, Fredrik Jørgensen, Petter Hohle* Repository master
dev
* README
* Treebank hub page
* Download
Nynorsk 301K ______
The Norwegian UD treebank is based on the Nynorsk section of the Norwegian Dependency Treebank (NDT), which is a syntactic treebank of Norwegian. NDT has been automatically converted to the UD scheme by Lilja Øvrelid at the University of Oslo. * Contributors: Lilja Øvrelid, Fredrik Jørgensen, Petter Hohle* Repository master
dev
* README
* Treebank hub page
* Download
NynorskLIA 55K __
This Norwegian treebank is based on the LIA treebank of transcribed spoken Norwegian dialects. The treebank has been automatically converted to the UD scheme by Lilja Øvrelid at the University of Oslo.
* Contributors: Lilja Øvrelid, Andre Kaasen
* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Norwegian treebanks. LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation. Old Church Slavonic 1 57K __ IE, Slavic OLD CHURCH SLAVONIC TREEBANKS PROIEL 57K __
The Old Church Slavonic (OCS) UD treebank is based on the Old Church Slavonic data from the PROIEL treebank and contains the text of the Codex Marianus New Testament translation. * Contributors: Dag Haug* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION The language hub documentation has not yet been created or ported from the UDv1 documentation. Old East Slavic 3 180K ____ IE, Slavic OLD EAST SLAVIC TREEBANKS RNC 30K ____
`UD_Old_Russian-RNC` is a sample of the Middle Russian corpus (1300-1700), a part of the Russian National Corpus. The data were originally annotated according to the RNC and extended UD-Russian morphological schemas and UD 2.4 dependency schema. * Contributors: Olga Lyashevskaya, Maria Skachedubova* Repository master
dev
* README
* Treebank hub page
* Download
TOROT 149K ____
UD\_Old\_Russian-TOROT is a conversion of a selection of the Old East Slavonic and Middle Russian data in the Tromsø Old Russian and OCS Treebank (TOROT), which was originally annotated in PROIEL dependency format.
* Contributors: Hanne Eckhoff
* Repository master
dev
* README
* Treebank hub page
* Download
Ruthenian - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Olga Lyashevskaya, Maria Skachedubova* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Old East Slavic treebanks. LANGUAGE DOCUMENTATION See the language documentation page. Old French 1 170K ______ IE, Romance OLD FRENCH TREEBANKS SRCMF 170K ______
UD_Old_French-SRCMF is a conversion of (part of) the SRCMF corpus (Syntactic Reference Corpus of Medieval French (http://srcmf.org/)). * Contributors: Sophie Prévost, Aurélie Collomb, Kim Gerdes, Isabelle Tellier, Marine Courtin, Alexei Lavrentiev, Céline Guillot-Barbance, Loïc Grobol* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page. Old Irish 2 20K ________ IE, Celtic OLD IRISH TREEBANKS
DipSGG 20K ________
A Universal Dependencies treebank for the Old Irish glosses of St. Gall.
* Contributors: Adrian Doyle
* Repository master
dev
* README
* Treebank hub page
* Download
DipWBG - ______
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Adrian Doyle* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page. Old Turkish 1 <1K __ Turkic, Northeastern OLD TURKISH TREEBANKS Tonqq <1K __
`UD_Old_Turkish-Tonqq` is an (https://iso639-3.sil.org/code/otk) treebank built upon Turkic script texts or sentences that are trivially convertible. * Contributors: Mehmet Oguz Derin* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION See the language documentation page. Persian 2 654K ____________________ IE, Iranian PERSIAN TREEBANKS
PerDT 501K ____________
The Persian Universal Dependency Treebank (PerUDT) is the result of automatic conversion of the Persian Dependency Treebank (PerDT) with extensive manual corrections. Please refer to the following work if you use this data:
* Mohammad Sadegh Rasooli, Pegah Safari, Amirsaeid Moloodi, and Alireza Nourian. "The Persian Dependency Treebank Made Universal". 2020 (to appear).
* Contributors: Mohammad Sadegh Rasooli, Pegah Safari, Amirsaeid Moloodi, Alireza Nourian
* Repository master
dev
* README
* Treebank hub page
* Download
Seraji 152K ______________ The Persian Universal Dependency Treebank (Persian UD) is based on Uppsala Persian Dependency Treebank (UPDT). The conversion of the UPDT to the Universal Dependencies was performed semi-automatically with extensive manual checks and corrections. * Contributors: Mojgan Seraji, Filip Ginter, Joakim Nivre* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Persian treebanks. LANGUAGE DOCUMENTATION See the language documentation page. Polish 3 499K __________ IE, Slavic POLISH TREEBANKS
PDB 350K ______
The Polish PDB-UD treebank is based on the Polish Dependency Bank 2.0 (PDB 2.0), created at the Institute of Computer Science, Polish Academy of Sciences in Warsaw. The PDB-UD treebank is an extended and corrected version of the Polish SZ-UD treebank (releases 1.2 to 2.3).
* Contributors: Alina Wróblewska, Daniel Zeman, Jan Mašek, Rudolf Rosa
* Repository master
dev
* README
* Treebank hub page
* Download
LFG 130K __________
The LFG Enhanced UD treebank of Polish is based on a corpus of LFG (Lexical Functional Grammar) syntactic structures generated by an LFG grammar of Polish, POLFIE, and manually disambiguated by human annotators.
* Contributors: Agnieszka Patejuk, Adam Przepiórkowski
* Repository master
dev
* README
* Treebank hub page
* Download
PUD 18K ____
This is the Polish portion of the Parallel Universal Dependencies (PUD) treebanks, created at the Institute of Computer Science, Polish Academy of Sciences in Warsaw.
* Contributors: Alina Wróblewska
* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Polish treebanks. LANGUAGE DOCUMENTATION See the language documentation page. Portuguese 3 571K ______ IE, Romance PORTUGUESE TREEBANKS PUD 23K ____
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the (http://universaldependencies.org/conll17/). * Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Gustavo Mendonça, Larissa Rinaldi, Martin Popel, Daniel Zeman, Valeria de Paiva, Alexandre Rademaker* Repository master
dev
* README
* Treebank hub page
* Download
Bosque 227K __
This Universal Dependencies (UD) Portuguese treebank is based on the Constraint Grammar converted version of the Bosque, which is part of the Floresta Sintá(c)tica treebank. It contains both European (CETEMPúblico) and Brazilian (CETENFolha) variants. * Contributors: Alexandre Rademaker, Cláudia Freitas, Elvis de Souza, Aline Silveira, Tatiana Cavalcanti, Wograine Evelyn, Luisa Rocha, Isabela Soares-Bastos, Eckhard Bick, Fabricio Chalub, Guilherme Paulino-Passos, Livy Real, Valeria de Paiva, Daniel Zeman, Martin Popel, David Mareček, Natalia Silveira, André Martins* Repository master
dev
* README
* Treebank hub page
* Download
GSD 319K ____
The Brazilian Portuguese UD is converted from the (https://github.com/ryanmcd/uni-dep-tb). * Contributors: Alexandre Rademaker, Ryan McDonald, Joakim Nivre, Daniel Zeman, Fabricio Chalub, Carlos Ramisch* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Portuguese treebanks. LANGUAGE DOCUMENTATION See the language documentation page. Romanian 3 936K __________________ IE, Romance ROMANIAN TREEBANKS
RRT 218K ______________ The Romanian UD treebank (called RoRefTrees) (Barbu Mititelu et al., 2016) is the reference treebank in UD format for standard Romanian. * Contributors: Verginica Barbu Mititelu, Elena Irimia, Cenel-Augusto Perez, Radu Ion, Radu Simionescu, Martin Popel* Repository master
dev
* README
* Treebank hub page
* Download
Nonstandard 572K ____ The Romanian Non-standard UD treebank (called UAIC-RoDia) is based on UAIC-RoDia Treebank. UAIC-RoDia = ISLRN 156-635-615-024-0 * Contributors: Cătălina Mărănduc, Cenel-Augusto Perez, Victoria Bobicev, Cătălin Mititelu, Florinel Hociung, Valentin Roșca, Roman Untilov, Petru Rebeja* Repository master
dev
* README
* Treebank hub page
* Download
SiMoNERo 146K __
SiMoNERo is a medical corpus of contemporary Romanian. * Contributors: Maria Mitrofan, Verginica Barbu Mititelu* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Romanian treebanks. LANGUAGE DOCUMENTATION See the language documentation page. Russian 4 1,289K ______________ IE, Slavic RUSSIAN TREEBANKS
GSD 98K __
Russian Universal Dependencies Treebank annotated and converted by Google.
* Contributors: Ryan McDonald, Vitaly Nikolaev, Olga Lyashevskaya
* Repository master
dev
* README
* Treebank hub page
* Download
Taiga 63K ____________
Universal Dependencies treebank is based on data samples extracted from Taiga Corpus and MorphoRuEval-2017 and GramEval-2020 shared tasks collections.
* Contributors: Olga Lyashevskaya, Olga Rudina, Anna Zhuravleva
* Repository master
dev
* README
* Treebank hub page
* Download
PUD 19K ____
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the (http://universaldependencies.org/conll17/). * Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Tatiana Lando, Olga Loginova, Martin Popel, Daniel Zeman, Kira Droganova* Repository master
dev
* README
* Treebank hub page
* Download
SynTagRus 1,107K ______
Russian data from the SynTagRus corpus.
* Contributors: Kira Droganova, Olga Lyashevskaya, Daniel Zeman
* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Russian treebanks.
LANGUAGE DOCUMENTATION
See the language documentation page.
Sanskrit 2 28K ____ IE, Indic
SANSKRIT TREEBANKS
UFAL 1K __
A small Sanskrit treebank of sentences from Pañcatantra, an ancient Indian collection of interrelated fables by Vishnu Sharma. * Contributors: Puneet Dwivedi, Daniel Zeman, Erica Biagetti* Repository master
dev
* README
* Treebank hub page
* Download
Vedic 27K __
The Treebank of Vedic Sanskrit contains 4,000 sentences with 27,000 words chosen from metrical and prose passages of the Ṛgveda (RV), the Śaunaka recension of the Atharvaveda (ŚS), the Maitrāyaṇīsaṃhitā (MS), and the Aitareya- (AB) and Śatapatha-Brāhmaṇas (ŚB). Lexical and morpho-syntactic information has been generated using tagging software and manually validated. POS tags have been induced automatically from the morpho-syntactic information of each word.
* Contributors: Salvatore Scarlata, Elia Ackermann, Oliver Hellwig, Erica Biagetti, Paul Widmer
* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Sanskrit treebanks.
LANGUAGE DOCUMENTATION
See the language documentation page.
Scottish Gaelic 1 60K ________ IE, Celtic
SCOTTISH GAELIC TREEBANKS
ARCOSG 60K ________
A treebank of Scottish Gaelic based on the Annotated Reference Corpus of Scottish Gaelic (ARCOSG, https://github.com/Gaelic-Algorithmic-Research-Group/ARCOSG).
* Contributors: Colin Batchelor
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
See the language documentation page.
Serbian 1 97K __ IE, Slavic
SERBIAN TREEBANKS
SET 97K __
The Serbian UD treebank is based on the (http://hdl.handle.net/11356/1200) corpus and additional news documents from the Serbian web. * Contributors: Tanja Samardžić, Nikola Ljubešić* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
See the language documentation page.
Sindhi 1 6K ____ IE, Indic
SINDHI TREEBANKS
MazharDootio 6K ____
The Sindhi Universal Dependency Treebank was automatically converted from the Sindhi Dependency Treebank (SDTB), which is part of an ongoing effort to create multi-layered treebanks for Sindhi.
* Contributors: Mazhar Dootio
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Skolt Sami 1 1K ______ Uralic, Sami
SKOLT SAMI TREEBANKS
Giellagas 1K ______
The UD Skolt Sami Giellagas treebank is based almost entirely on spoken Skolt Sami corpora. * Contributors: Jack Rueter, Markus Juutinen, Francis Tyers, Tommi A Pirinen, Mika Hämäläinen* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
See the language documentation page.
Slovak 1 106K ______ IE, Slavic
SLOVAK TREEBANKS
SNK 106K ______
The Slovak UD treebank is based on data originally annotated as part of the Slovak National Corpus, following the annotation style of the Prague Dependency Treebank.
* Contributors: Katarína Gajdošová, Mária Šimková, Daniel Zeman
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
See the language documentation page.
Slovenian 2 170K ________ IE, Slavic
SLOVENIAN TREEBANKS
SSJ 140K ______
The Slovenian UD Treebank is a rule-based conversion of the ssj500k treebank, the largest collection of manually syntactically annotated data in Slovenian, originally annotated in the JOS annotation scheme. * Contributors: Kaja Dobrovoljc, Tomaž Erjavec, Simon Krek* Repository master
dev
* README
* Treebank hub page
* Download
SST 29K __
The Spoken Slovenian UD Treebank (SST) is the first syntactically annotated corpus of spoken Slovenian, based on a sample of the reference GOS corpus, a collection of transcribed audio recordings of monologic, dialogic and multi-party spontaneous speech in different everyday situations. * Contributors: Kaja Dobrovoljc, Joakim Nivre* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Slovenian treebanks.
LANGUAGE DOCUMENTATION
See the language documentation page.
Soi 1 <1K ____ IE, Iranian
SOI TREEBANKS
AHA <1K ____
The AHA Soi Treebank is a small treebank for contemporary Soi. Its corpus was collected and annotated manually, based on interviews with Soi speakers.
* Contributors: AmirHossein Mojiri Foroushani, Hamid Aghaei, Amir Ahmadi
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
See the language documentation page.
South Levantine Arabic 1 <1K ____ Afro-Asiatic, Semitic
SOUTH LEVANTINE ARABIC TREEBANKS
MADAR <1K ____
The South_Levantine_Arabic-MADAR treebank consists of 100 manually annotated sentences taken from the MADAR (Multi-Arabic Dialect Applications and Resources) project (https://camel.abudhabi.nyu.edu/madar/). TO-DO: Add 20 annotated sentences from CCC as a train set.
* Contributors: Shorouq Zahra
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
See the language documentation page.
Spanish 3 1,004K ________ IE, Romance
SPANISH TREEBANKS
AnCora 549K __
Spanish data from the AnCora corpus (http://clic.ub.edu/corpus/).
* Contributors: Héctor Martínez Alonso, Daniel Zeman
* Repository master
dev
* README
* Treebank hub page
* Download
GSD 431K ________
The Spanish UD is converted from the content head version of the (https://github.com/ryanmcd/uni-dep-tb). * Contributors: Miguel Ballesteros, Héctor Martínez Alonso, Ryan McDonald, Elena Pascual, Natalia Silveira, Daniel Zeman, Joakim Nivre* Repository master
dev
* README
* Treebank hub page
* Download
PUD 23K ____
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the CoNLL 2017 shared task (http://universaldependencies.org/conll17/).
* Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Hector Fernandez Alcalde, Laura Moreno Romero, Martin Popel, Daniel Zeman, Héctor Martínez Alonso
* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Spanish treebanks.
LANGUAGE DOCUMENTATION
See the language documentation page.
Swedish 3 206K __________ IE, Germanic
SWEDISH TREEBANKS
Talbanken 96K ____
The Swedish-Talbanken treebank is based on Talbanken, a treebank developed at Lund University in the 1970s. * Contributors: Joakim Nivre, Aaron Smith* Repository master
dev
* README
* Treebank hub page
* Download
LinES 90K ______
UD Swedish_LinES is the Swedish half of the LinES Parallel Treebank with UD annotations. All segments are translations from English and the sources cover literary genres, online manuals and Europarl data. * Contributors: Lars Ahrenberg* Repository master
dev
* README
* Treebank hub page
* Download
PUD 19K ____
Swedish-PUD is the Swedish part of the Parallel Universal Dependencies (PUD) treebanks. * Contributors: Joakim Nivre, Bernadeta Griciūtė* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Swedish treebanks.
LANGUAGE DOCUMENTATION
See the language documentation page.
Swedish Sign Language 1 1K __ Sign Language
SWEDISH SIGN LANGUAGE TREEBANKS
SSLC 1K __
The Universal Dependencies treebank for Swedish Sign Language (ISO 639-3: swl) is derived from the Swedish Sign Language Corpus (SSLC) from the department of linguistics, Stockholm University. * Contributors: Moa Gärdenfors, Carl Börstell, Robert Östling, Lars Wallin, Mats Wirén* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Swiss German 1 1K __________ IE, Germanic
SWISS GERMAN TREEBANKS
UZH 1K __________
UD_Swiss_German-UZH is a tiny manually annotated treebank of 100 sentences in different Swiss German dialects and a variety of text genres.
* Contributors: Noëmi Aepli* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
See the language documentation page.
Tagalog 2 1K ______ Austronesian, Central Philippine
TAGALOG TREEBANKS
TRG <1K __
UD_Tagalog-TRG is a UD treebank manually annotated using sentences from a grammar book. * Contributors: Stephanie Samson, Daniel Zeman, Mary Ann C. Tan* Repository master
dev
* README
* Treebank hub page
* Download
Ugnayan 1K ____
Ugnayan is a manually annotated Tagalog treebank currently composed of educational fiction and nonfiction text. The treebank is under development at the University of the Philippines. * Contributors: Angelina Aquino* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Tagalog treebanks.
LANGUAGE DOCUMENTATION
See the language documentation page.
Tamil 2 12K __ Dravidian, Southern
TAMIL TREEBANKS
TTB 9K __
The UD Tamil treebank is based on the Tamil Dependency Treebank created at the Charles University in Prague by Loganathan Ramasamy. * Contributors: Loganathan Ramasamy, Daniel Zeman* Repository master
dev
* README
* Treebank hub page
* Download
MWTT 2K __
MWTT - Modern Written Tamil Treebank has sentences taken primarily from a text called "A Grammar of Modern Tamil" by Thomas Lehmann (1993). This initial release has 536 sentences of various lengths, all of which are included as the test set.
* Contributors: Sarveswaran K, Parameswari Krishnamurthy, Keerthana Balasubramani
* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Tamil treebanks.
LANGUAGE DOCUMENTATION
See the language documentation page.
Telugu 1 6K __ Dravidian, South Central
TELUGU TREEBANKS
MTG 6K __
The Telugu UD treebank was created from manual annotations of sentences from a grammar book.
* Contributors: Taraka Rama, Sowmya Vajjala
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Thai 1 22K ____ Tai-Kadai
THAI TREEBANKS
PUD 22K ____
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the CoNLL 2017 shared task (http://universaldependencies.org/conll17/).
* Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Rattima Nitisaroj, Yanin Sawanakunanon, Martin Popel, Daniel Zeman
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Tupinamba 1 <1K ____ Tupian, Tupi-Guarani
TUPINAMBA TREEBANKS
TuDeT <1K ____
UD_Tupinamba-TuDeT is a collection of annotated texts in Tupi(nambá). Together with UD_Akuntsu-TuDeT (https://github.com/UniversalDependencies/UD_Akuntsu-TuDeT) and UD_Munduruku-TuDeT, UD_Tupinamba-TuDeT is part of the TuLaR project (https://tular.org). The treebank is ongoing work and is constantly being updated.
* Contributors: Fabrício Ferraz Gerardi
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
See the language documentation page.
Turkish 4 214K ________ Turkic, Southwestern
TURKISH TREEBANKS
GB 17K __
This is a treebank annotating example sentences from a comprehensive grammar book of Turkish. * Contributors: Çağrı Çöltekin* Repository master
dev
* README
* Treebank hub page
* Download
BOUN 122K ____
The largest Turkish dependency treebank annotated in UD style. Created by the members of TABILAB (http://tabilab.cmpe.boun.edu.tr/) from Boğaziçi University.
* Contributors: Utku Türk, Furkan Atmaca, Şaziye Betül Özateş, Gözde Berk, Seyyit Talha Bedir, Abdullatif Köksal, Balkız Öztürk Başaran, Tunga Güngör, Arzucan Özgür* Repository master
dev
* README
* Treebank hub page
* Download
PUD 16K ____
This is a part of the Parallel Universal Dependencies (PUD) treebanks created for the CoNLL 2017 shared task (http://universaldependencies.org/conll17/).
* Contributors: Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Slav Petrov, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Savas Cetin, Martin Popel, Daniel Zeman, Francis Tyers, Çağrı Çöltekin, Utku Türk, Furkan Atmaca, Şaziye Betül Özateş, Abdullatif Köksal, Balkız Öztürk Başaran, Tunga Güngör, Arzucan Özgür
* Repository master
dev
* README
* Treebank hub page
* Download
IMST 57K ____
The UD Turkish Treebank, also called the IMST-UD Treebank, is a semi-automatic conversion of the IMST Treebank (Sulubacak et al., 2016).
* Contributors: Çağrı Çöltekin, Gülşen Cebiroğlu Eryiğit, Memduh Gökırmak, Hüner Kaşıkara, Umut Sulubacak, Francis Tyers* Repository master
dev
* README
* Treebank hub page
* Download
See here for comparative statistics of Turkish treebanks.
LANGUAGE DOCUMENTATION
See the language documentation page.
Turkish German 1 31K __ Code switching
TURKISH GERMAN TREEBANKS
SAGT 31K __
UD Turkish-German SAGT is a Turkish-German code-switching treebank developed as part of the SAGT project (https://www.ims.uni-stuttgart.de/en/research/projects/sagt/).
* Contributors: Özlem Çetinoğlu, Çağrı Çöltekin* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
See the language documentation page.
Ukrainian 1 122K ____________________ IE, Slavic
UKRAINIAN TREEBANKS
IU 122K ____________________
Gold standard Universal Dependencies corpus for Ukrainian, developed for UD originally, by (https://mova.institute), NGO.
* Contributors: Natalia Kotsyba, Bohdan Moskalevskyi, Mykhailo Romanenko
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Upper Sorbian 1 11K ____ IE, Slavic
UPPER SORBIAN TREEBANKS
UFAL 11K ____
A small treebank of Upper Sorbian based mostly on Wikipedia. * Contributors: Daniel Zeman, Anna Nedoluzhko* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
See the language documentation page.
Urdu 1 138K __ IE, Indic
URDU TREEBANKS
UDTB 138K __
The Urdu Universal Dependency Treebank was automatically converted from Urdu Dependency Treebank (UDTB) which is part of an ongoing effort of creating multi-layered treebanks for Hindi and Urdu. * Contributors: Riyaz Ahmad Bhat, Daniel Zeman* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
See the language documentation page.
Uyghur 1 40K __ Turkic, Southeastern
UYGHUR TREEBANKS
UDT 40K __
The Uyghur UD treebank is based on the Uyghur Dependency Treebank (UDT), created at the Xinjiang University in Ürümqi, China. * Contributors: Marhaba Eli, Daniel Zeman, Francis Tyers* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Vietnamese 1 43K __ Austro-Asiatic, Viet-Muong
VIETNAMESE TREEBANKS
VTB 43K __
The Vietnamese UD treebank is a conversion of the constituent treebank created in the VLSP project (https://vlsp.hpda.vn/). * Contributors: Lương Nguyễn Thị, Linh Hà Mỹ, Phương Lê Hồng, Huyền Nguyễn Thị Minh* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Warlpiri 1 <1K __ Pama-Nyungan
WARLPIRI TREEBANKS
UFAL <1K __
A small treebank of grammatical examples in Warlpiri, taken from linguistic literature. * Contributors: Daniel Zeman* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
See the language documentation page.
Welsh 1 32K __________ IE, Celtic
WELSH TREEBANKS
CCG 32K __________
UD Welsh-CCG (Corpws Cystrawennol y Gymraeg) is a treebank of Welsh, annotated according to the Universal Dependencies guidelines. * Contributors: Johannes Heinecke, Francis Tyers* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
See the language documentation page.
Wolof 1 44K ____ Niger-Congo, Northern Atlantic
WOLOF TREEBANKS
WTB 44K ____
UD_Wolof-WTB is a manually developed treebank for Wolof, annotated natively in UD. Sentences were collected from encyclopedic, fictional, biographical, and religious texts and from news.
* Contributors: Bamba Dione
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
See the language documentation page.
Yoruba 1 8K ____ Niger-Congo, Defoid
YORUBA TREEBANKS
YTB 8K ____
Parts of the Yoruba Bible and of the Yoruba edition of Wikipedia, hand-annotated natively in Universal Dependencies.
* Contributors: Adédayọ̀ Olúòkun, Daniel Zeman, Seyi Williams, Ọlájídé Ishola
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
See the language documentation page.
UPCOMING UD LANGUAGES
Archaic Irish 1 - ____ IE, Celtic
ARCHAIC IRISH TREEBANKS
OGAM - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Adrian Ó Dubhghaill* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Assamese 1 - ____ IE, Indic
ASSAMESE TREEBANKS
AsTB - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Shikhar Sarma* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Bengali 2 - ________ IE, Indic
BENGALI TREEBANKS
BRU - ____
Please add a summary section to the treebank readme file * Contributors: Siratun Jannat, Mizanur Rahoman, Shafi Sourov, Jannatul Ferdaousi, Syeda Shahzadi* Repository master
dev
* README
* Treebank hub page
* Download
DDS - ____
Please add a summary section to the treebank readme file * Contributors: Md. Anwarus Salam Khan, Md. Mahfuzus Salam Khan* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Cusco Quechua 1 - ____ Quechuan
CUSCO QUECHUA TREEBANKS
Squoia - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ...
* Contributors: Annette Rios, Francis Tyers, Trey Jagiella, Josephine Douglas
* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Dargwa 1 - __ Nakh-Daghestanian, Lak-Dargwa
DARGWA TREEBANKS
Mehweb - __
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Sasha Kozhukhar, Olga Lyashevskaya* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Georgian 1 - ____ Kartvelian
GEORGIAN TREEBANKS
GNC - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Ana Kolkhidashvili* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Hiligaynon 1 - __ Austronesian, Central Philippine
HILIGAYNON TREEBANKS
HTB - __
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Mary Ann C. Tan* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Kannada 1 - __ Dravidian, Southern
KANNADA TREEBANKS
MKG - __
Examples from Modern Kannada Grammar by S.N.Sridhar. * Contributors: Taraka Rama, Sowmya Vajjala* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Khoekhoe 1 - ____ Khoe-Kwadi
KHOEKHOE TREEBANKS
KDT - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Michael Hahn, Levi Namaseb* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Kiga 1 - ____ Niger-Congo, Bantoid
KIGA TREEBANKS
EKigaTB - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: David Bamutura* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Kyrgyz 1 - ____ Turkic, Northwestern
KYRGYZ TREEBANKS
KTB - ____
... 1-2 sentences (see http://universaldependencies.org/release_checklist.html#the-readme-file for README guidelines) ... * Contributors: Kamen Bonov* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Ladino 1 - ____ IE, Romance
LADINO TREEBANKS
BOUN - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Utku Türk* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Macedonian 1 - ____ IE, Slavic
MACEDONIAN TREEBANKS
MTB - ____
The Macedonian-MTB treebank is a collection of annotated sentences based on the raw and monolingual corpus called (http://drmj.manu.edu.mk/%D0%B5%D0%BB%D0%B5%D0%BA%D1%82%D1%80%D0%BE%D0%BD%D1%81%D0%BA%D0%B8-%D0%BA%D0%BE%D1%80%D0%BF%D1%83%D1%81-%D0%BD%D0%B0-%D0%BC%D0%B0%D0%BA%D0%B5%D0%B4%D0%BE%D0%BD%D1%81%D0%BA%D0%B8-%D0%BA%D0%BD%D0%B8/), a.k.a 135 Volumes of Macedonian Literature, published by the Macedonian Academy of Sciences and Arts under the CC Attribution-NonCommercial 4.0 International License. The treebank consists mainly of literary and a few non-fiction texts. * Contributors: Vladimir Cvetkoski* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Maghrebi Arabic French 1 - ____ Code switching
MAGHREBI ARABIC FRENCH TREEBANKS
Arabizi - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Djamé Seddah* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Mongolian 1 - ____ Mongolic
MONGOLIAN TREEBANKS
MTLR - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Siqin Bai* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Ndengeleko 1 - ____ Niger-Congo, Bantoid
NDENGELEKO TREEBANKS
NTB - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Mariel Aquino* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Nkore 1 - ____ Niger-Congo, Bantoid
NKORE TREEBANKS
ENkoreTB - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: David Bamutura* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Occitan 1 - ____ IE, Romance
OCCITAN TREEBANKS
TTB - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Aleksandra Haddad* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Pnar 1 - ____ Austro-Asiatic, Khasian
PNAR TREEBANKS
PTB - ____
UD Pnar-PTB is a conversion from the Ring (2017) dataset ((http://dx.doi.org/10.21979/N9/KVFGBZ)) that underpins a grammatical description of the Pnar language (Ring 2015, (http://hdl.handle.net/10356/62519)). The corpus consists of folktales and interviews transcribed, translated, and interlinearized. * Contributors: Hiram Ring* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Pontic 1 - ____ IE, Greek
PONTIC TREEBANKS
BOUN - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Utku Türk* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Romansh 2 - ____ IE, Romance
ROMANSH TREEBANKS
Rumgr - ____
Please add a summary section to the treebank readme file * Contributors: Sascha Brawer, Martin Cantieni* Repository master
dev
* README
* Treebank hub page
* Download
Sursilv - ____
Please add a summary section to the treebank readme file * Contributors: Sascha Brawer, Martin Cantieni* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Shipibo Konibo 1 - ____ Panoan
SHIPIBO KONIBO TREEBANKS
PUCP - ____
... 1-2 sentences (see (http://universaldependencies.org/release_checklist.html#the-readme-file) for README guidelines) ... * Contributors: Ronald Ahmed Cárdenas Acosta* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Somali 1 - Afro-Asiatic, Cushitic
SOMALI TREEBANKS
STB -
Please add a summary section to the treebank readme file * Contributors: Morgan Nilsson* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Sorani 1 - ____ IE, Iranian
SORANI TREEBANKS
MG - ____
Please add a summary section to the treebank readme file * Contributors: Memduh Gökırmak* Repository master
dev
* README
* Treebank hub page
* Download
LANGUAGE DOCUMENTATION
The language hub documentation has not yet been created or ported from the UDv1 documentation.
Disclaimer: Our use of flags to symbolise languages is only intended as a visual enhancement of the website and should not be interpreted as a political statement in any way.
DOWNLOAD
The data is released through LINDAT/CLARIN.
* The next release (v2.8) is scheduled for May 15, 2021 (data freeze on May 1).
* Version 2.7 treebanks are available at http://hdl.handle.net/11234/1-3424. 183 treebanks, 104 languages, released November 15, 2020.
* Version 2.6 treebanks are archived at http://hdl.handle.net/11234/1-3226. 163 treebanks, 92 languages, released May 15, 2020.
* Version 2.5 treebanks are archived at http://hdl.handle.net/11234/1-3105. 157 treebanks, 90 languages, released November 15, 2019.
* Version 2.4 treebanks are archived at http://hdl.handle.net/11234/1-2988. 146 treebanks, 83 languages, released May 15, 2019.
* Version 2.3 treebanks are archived at http://hdl.handle.net/11234/1-2895. 129 treebanks, 76 languages, released November 15, 2018.
* Version 2.2 treebanks are archived at http://hdl.handle.net/11234/1-2837. 122 treebanks, 71 languages, released July 1, 2018.
* Version 2.1 treebanks are archived at http://hdl.handle.net/11234/1-2515. 102 treebanks, 60 languages, released November 15, 2017.
* Version 2.0 treebanks are archived at http://hdl.handle.net/11234/1-1983. 70 treebanks, 50 languages, released March 1, 2017.
* Test data 2.0 are archived at http://hdl.handle.net/11234/1-2184. 81 treebanks, 49 languages, released May 18, 2017.
* Version 1.4 treebanks are archived at http://hdl.handle.net/11234/1-1827. 64 treebanks, 47 languages, released November 15, 2016.
* Version 1.3 treebanks are archived at http://hdl.handle.net/11234/1-1699. 54 treebanks, 40 languages, released May 15, 2016.
* Version 1.2 treebanks are archived at http://hdl.handle.net/11234/1-1548. 37 treebanks, 33 languages, released November 15, 2015.
* Version 1.1 treebanks are archived at http://hdl.handle.net/11234/LRT-1478. 19 treebanks, 18 languages, released May 15, 2015.
* Version 1.0 treebanks are archived at http://hdl.handle.net/11234/1-1464. 10 treebanks, 10 languages, released January 15, 2015.
* In general, we intend to have regular treebank releases every six months. The v2.0 and v2.2 releases were brought forward because of their usage in the CoNLL 2017 and 2018 Multilingual Parsing Shared Tasks.
2014–2020 Universal Dependencies contributors.
Site powered by Annodoc and brat.