germanetpy package

Submodules

germanetpy.compoundInfo module

class germanetpy.compoundInfo.CompoundCategory(*args: Any, **kwargs: Any)[source]

Bases: Enum

This Enum represents the syntactic word category a modifier of a compound can belong to.

Adjektiv = 'Adjektiv'
Nomen = 'Nomen'
Verb = 'Verb'
Adverb = 'Adverb'
Präposition = 'Präposition'
Partikel = 'Partikel'
Pronomen = 'Pronomen'
class germanetpy.compoundInfo.CompoundProperty(*args: Any, **kwargs: Any)[source]

Bases: Enum

This Enum represents the properties a compound constituent can have.

Abkürzung = 'Abkürzung'
Affixoid = 'Affixoid'
Fremdwort = 'Fremdwort'
Konfix = 'Konfix'
Wortgruppe = 'Wortgruppe'
Eigenname = 'Eigenname'
opaquesMorphem = 'opaquesMorphem'
virtuelleBildung = 'virtuelleBildung'
gebundenesMorphem = 'gebundenesMorphem'
freiesMorphem = 'freiesMorphem'
nominalisiertesVerb = 'nominalisiertesVerb'
class germanetpy.compoundInfo.CompoundInfo(modifier1, head, modifier2=None, modifier1property=None, modifier1category=None, mod1LexUnitId1=None, mod1LexUnitId2=None, mod1LexUnitId3=None, modifier2property=None, modifier2category=None, mod2LexUnitId1=None, mod2LexUnitId2=None, mod2LexUnitId3=None, headproperty=None, headLexUnitId=None)[source]

Bases: object

PROPERTY = 'property'
CATEGORY = 'category'
XML_LEX_UNIT_ID = 'lexUnitId'
XML_LEX_UNIT_ID2 = 'lexUnitId2'
XML_LEX_UNIT_ID3 = 'lexUnitId3'
property modifier1
property modifier1_property
property modifier1_category
property mod1_LexUnitId1
property mod1_LexUnitId2
property mod1_LexUnitId3
property modifier2
property modifier2_property
property modifier2_category
property mod2_LexUnitId1
property mod2_LexUnitId2
property mod2_LexUnitId3
property head
property head_property
property head_LexUnitId

germanetpy.filterconfig module

class germanetpy.filterconfig.Filterconfig(search_string: str, ignore_case: bool = False, regex: bool = False, levenshtein_distance: int = 0)[source]

Bases: object

This class is a configuration object that filters GermaNet's lexical units and Synsets, keeping only those that match the given constraints.

filter_lexunits(germanet) set[source]

Applies the filter to the GermaNet data.

Parameters:

germanet (Germanet) – the GermaNet object, loaded from the data

Returns:

a set of lexical units that are left after retrieval is filtered with the given constraints

filter_synsets(germanet) set[source]

Applies the filter to the GermaNet data.

Parameters:

germanet (Germanet) – the GermaNet object, loaded from the data

Returns:

a set of synsets that are left after retrieval is filtered with the given constraints
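The constructor arguments of Filterconfig (search_string, ignore_case, regex, levenshtein_distance) can be pictured with a small standalone sketch. It does not call germanetpy: the helpers `levenshtein` and `matches` are hypothetical, and the way the library actually combines these options is an assumption.

```python
import re

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def matches(orthform: str, search_string: str, ignore_case: bool = False,
            regex: bool = False, levenshtein_distance: int = 0) -> bool:
    """Hypothetical matcher that mirrors the Filterconfig constructor arguments."""
    if regex:
        flags = re.IGNORECASE if ignore_case else 0
        return re.fullmatch(search_string, orthform, flags) is not None
    if ignore_case:
        orthform, search_string = orthform.lower(), search_string.lower()
    return levenshtein(orthform, search_string) <= levenshtein_distance

words = ["Haus", "Maus", "Hausboot", "haus"]
exact = [w for w in words if matches(w, "Haus")]
fuzzy = [w for w in words if matches(w, "Haus", levenshtein_distance=1)]
```

With a distance of 0 only the identical string survives; a distance of 1 also admits forms that differ by a single character.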

property search_string
property ignore_case
property regex
property levenshtein_distance
property word_classes
property word_categories
property orth_variants

germanetpy.frames module

class germanetpy.frames.Frames(frames2lexunits: dict)[source]

Bases: object

EXPLETIVE = 'NE'
SUBJECT = 'NN'
ACCOBJ = 'AN'
DATOBJ = 'DN'
GENOBJ = 'GN'
PREPOBJ = 'PP'
LOC = 'BL'
DIR = 'BD'
TEMP = 'BT'
MAN = 'BM'
INST = 'BS'
CAUSE = 'BC'
ROLE = 'BR'
COM = 'BO'
reflexives = ['DR', 'AR']
extract_expletives() set[source]

This method extracts all verbs that can take expletives as an argument. Example: “[Es] regnet.”

Returns:

A set of lexical units that stores all verbs as Lexunits that have the specified frame.

extract_accusative_complement() set[source]

This method returns all verbs that can take an accusative complement. Example: “Sie sieht [ihn]”

Returns:

A set of lexical units that stores all verbs as Lexunits that have the specified frame.

extract_dative_complement() set[source]

This method returns all verbs that can take a dative complement. Example: “Sie schenkt [ihm] einen Hund.”

Returns:

A set of lexical units that stores all verbs as Lexunits that have the specified frame.

extract_gentive_complement() set[source]

This method returns all verbs that can take a genitive complement. Example: “Ihre Eltern berauben sie [ihrer Freiheit].”

Returns:

A set of lexical units that stores all verbs as Lexunits that have the specified frame.

extract_prepositional_complement() set[source]

This method returns all verbs that can take a prepositional complement. Example: “Die Kugel klackte [an die Fensterscheibe].”

Returns:

A set of lexical units that stores all verbs as Lexunits that have the specified frame.

extract_reflexives() set[source]

This method returns all verbs that can take a reflexive complement. Example: “Sie wird [sich] rächen.”

Returns:

A set of lexical units that stores all verbs as Lexunits that have the specified frame.

extract_adverbials() set[source]

This method returns all verbs that can take an adverbial complement. Example: “Sie wohnt [in einem Haus].”

Returns:

A set of lexical units that stores all verbs as Lexunits that have the specified frame.

extract_transitives() set[source]

This method returns all transitive verbs. A transitive verb is any verb that can have objects.

Returns:

A set of lexical units that stores all transitive verbs as Lexunits.

extract_intransitives() set[source]

This method returns all intransitive verbs. An intransitive verb is any verb that does not have objects.

Returns:

A set of lexical units that stores all intransitive verbs as Lexunits.

extract_specific_complements(complement: str) set[source]

This method returns all verbs that can take a given complement. This is specified in the frames of a verb.

Parameters:

complement – a syntactic complement (e.g. NN for subject); the complements are specified as class variables of this class

Returns:

A set of lexical units that stores all verbs as Lexunits that can take the specified complement.
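A rough sketch of complement lookup over a frames-to-verbs mapping follows. It assumes that a frame is a string of complement codes joined by dots (as in the 'NN.AN' example used elsewhere in this package) and uses plain strings in place of Lexunit objects; `verbs_with_complement` is a hypothetical helper, not the library's implementation.

```python
ACCOBJ = "AN"  # same code as Frames.ACCOBJ
DATOBJ = "DN"  # same code as Frames.DATOBJ

# Toy stand-in for frames2lexunits: frame string -> set of verbs.
frames2lexunits = {
    "NN": {"schlafen"},
    "NN.AN": {"sehen"},
    "NN.DN.AN": {"schenken"},
    "NE": {"regnen"},
}

def verbs_with_complement(frames2verbs: dict, complement: str) -> set:
    """Collect every verb whose frame lists the given complement code."""
    result = set()
    for frame, verbs in frames2verbs.items():
        if complement in frame.split("."):  # assumption: codes are dot-separated
            result |= verbs
    return result

accusative_verbs = verbs_with_complement(frames2lexunits, ACCOBJ)
dative_verbs = verbs_with_complement(frames2lexunits, DATOBJ)
```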

property frames2verbs

germanetpy.germanet module

class germanetpy.germanet.Germanet(datadir: str, add_ilirecords: bool = True, add_wiktionary: bool = True)[source]

Bases: object

get_synsets_by_orthform(form: str, ignorecase: bool = False) list[source]

This method returns a list of synsets that match the given input search string

Parameters:
  • form – a word that can be looked up in the GermaNet

  • ignorecase – whether the case of the word should be ignored (default = False)

Returns:

a list of synsets

get_synsets_by_wordcategory(category) list[source]

Returns a list of synsets that belong to the specified word category

Parameters:

category (WordCategory) – The word category of interest

Returns:

A list of Synsets that belong to the specified word category

get_synsets_by_wordclass(wordclass) list[source]

Returns a list of synsets that belong to the specified word class

Parameters:

wordclass (WordClass) – The word class of interest

Returns:

A list of Synsets that belong to the specified word class

get_synset_by_id(id: str)[source]

Returns a Synset by a specified identifier (if that exists, otherwise raises an Error)

Return type:

Synset

Parameters:

id – a Synset identifier

Returns:

The matching Synset object

get_lexunit_by_id(id: str)[source]

Returns a lexical unit by a specified identifier (if that exists, otherwise raises an Error)

Return type:

Lexunit

Parameters:

id – a Lexunit identifier

Returns:

The matching Lexunit object

get_lexunits_by_orthform(form: str, ignorecase: bool = False) list[source]

This method returns a list of lexical units that match the given input search string

Parameters:
  • form – a word that can be looked up in the GermaNet

  • ignorecase – whether the case of the word should be ignored (default = False)

Returns:

a list of lexical units that match the given input query

get_lexunits_by_wordclass(wordclass) list[source]

Returns a list of lexical units that belong to the specified word class

Parameters:

wordclass (WordClass) – The word class of interest

Returns:

A list of lexical units that belong to the specified word class

get_lexunits_by_wordcategory(category) list[source]

Returns a list of lexical units that belong to the specified word category

Parameters:

category (WordCategory) – The word category of interest

Returns:

A list of lexical units that belong to the specified word category

get_synsets_by_frame(frame: str) list[source]

Returns a list of Synsets that match a specified frame

Parameters:

frame – a frame that describes the argument structure of a verb (e.g. ‘NN.AN’ specifies that a verb can take a subject and accusative object as arguments.)

Returns:

a list of Synsets that match the given frame. If the frame is not valid an Assertion Error will be raised
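The lookup methods above can be combined with the Synset.lexunits and Lexunit.orthform properties to collect synonyms. The sketch below is duck-typed and runs against small stub classes (all three Stub* classes and `synonyms_by_orthform` are hypothetical), so that it works without the GermaNet data; only the method and property names are taken from this reference.

```python
class StubUnit:
    """Stand-in for a Lexunit; only the `orthform` property is mimicked."""
    def __init__(self, orthform):
        self.orthform = orthform

class StubSynset:
    """Stand-in for a Synset; only the `lexunits` property is mimicked."""
    def __init__(self, *orthforms):
        self.lexunits = [StubUnit(o) for o in orthforms]

class StubGermanet:
    """Stand-in for Germanet so the sketch runs without the real data."""
    def __init__(self, synsets):
        self._synsets = synsets

    def get_synsets_by_orthform(self, form, ignorecase=False):
        def same(a, b):
            return a.lower() == b.lower() if ignorecase else a == b
        return [s for s in self._synsets
                if any(same(u.orthform, form) for u in s.lexunits)]

def synonyms_by_orthform(germanet, form, ignorecase=False):
    """Collect the orthforms of all lexical units in synsets matching `form`."""
    found = set()
    for synset in germanet.get_synsets_by_orthform(form, ignorecase=ignorecase):
        for unit in synset.lexunits:
            found.add(unit.orthform)
    return found

germanet = StubGermanet([StubSynset("Bank", "Geldinstitut"),
                         StubSynset("Bank", "Sitzbank")])
result = synonyms_by_orthform(germanet, "bank", ignorecase=True)
```

With the real package, the stub would presumably be replaced by a Germanet object constructed from the data directory, as in the class signature above.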

property lexunits
property synsets
property orthform2lexid
property mainOrtform2lexid
property lowercasedform2lexid
property wordcat2lexid
property wordclass2lexid
property compounds
property frames2lexunits
property wiktionary_entries
property ili_records
property frames
property root
property datadir
property add_ilirecords
property add_wiktionary

germanetpy.icbased_similarity module

class germanetpy.icbased_similarity.ICBasedSimilarity(germanet, wordcategory, path: str, separator: str = '\t')[source]

Bases: object

The IC-based measures are computed based on relative frequencies of words in a large corpus. Synset frequencies are computed by adding up the frequencies of all words that belong to a Synset. These measures cannot be computed between synsets with different word categories.

create_simple_freq_dic(word_category, path: str, separator: str)[source]

Reads in the frequency list files and stores the frequency information for each Synset in a dictionary. The keys are the Synset IDs. This method also adds all available Synset frequencies for the given category.

Parameters:
  • word_category (WordCategory) – The word category

  • path – The path to a frequency list containing words and their frequencies in a corpus

  • separator – The char that separates a word and its frequency in the given frequency list

init_min_max_normalization_values(synset_pair) dict[source]

This method computes the minimal values (two Synsets are equal) and the maximum values (two Synsets are maximally apart in the graph) for normalization.

Parameters:

synset_pair (tuple(Synset, Synset)) – The Tuple of synsets that have the maximum distance in the graph

Returns:

a dictionary containing the (minimum value, maximum value) for each semantic similarity measure.

init_ic_map()[source]

Computes the information content for each synset in GermaNet (of a given word category).

Return type:

dict, Synset

Returns:

A dictionary with a Synset and the corresponding IC, a Synset with the highest IC

get_information_content(synset) float[source]

The information content graduates semantic concepts from general to specific. The more specific a concept, the smaller the probability and thus the higher its informativeness. The information content of a semantic concept is estimated by the relative frequency of the concept in a large corpus (cumulated synset frequency).

Parameters:

synset (Synset) – the synset the information content should be computed for

Returns:

the information content for the given synset
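As a minimal sketch of this idea, information content is commonly defined as the negative logarithm of a concept's relative frequency. The names synset2cumfreq and root_freq echo properties of this class, but the frequencies are invented and the use of the natural logarithm without smoothing is an assumption.

```python
import math

# Toy cumulated frequencies: each synset's count includes those of its hyponyms.
synset2cumfreq = {"Entität": 1000, "Tier": 200, "Hund": 40, "Dackel": 5}
root_freq = synset2cumfreq["Entität"]

def information_content(synset: str) -> float:
    """IC = -log(p), with p the synset's relative cumulated frequency."""
    probability = synset2cumfreq[synset] / root_freq
    return -math.log(probability)

ic_values = {s: information_content(s) for s in synset2cumfreq}
```

The root has probability 1 and therefore carries no information, while rarer, more specific synsets score higher.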

resnik(synset1, synset2, normalize: bool = False, normalized_max: float = 1.0) float[source]

Two concepts are more related the more information they share. The shared information of two concepts can be quantified by the information content of two concepts’ lowest common subsumer. When several LCS are available the highest IC is returned.

Parameters:
  • synset1 (Synset) – The source synset

  • synset2 (Synset) – The target synset

  • normalize – The relatedness value can be normalized to a number between the possible minimum of that measure and a given upper bound.

  • normalized_max – The upper bound of the range the measure is normalized to.

Returns:

The information content of the LCS of the two given synsets.
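A toy illustration of the Resnik idea: among all subsumers shared by two synsets, take the one with the highest information content. The IC values and hypernym sets are invented, and each set is assumed to include the synset itself.

```python
# Invented information-content values.
ic = {"Entität": 0.0, "Tier": 1.6, "Hund": 3.2, "Katze": 3.0}

# Hypernym sets, each including the synset itself (an assumption of this sketch).
hypernyms = {
    "Hund": {"Hund", "Tier", "Entität"},
    "Katze": {"Katze", "Tier", "Entität"},
    "Tier": {"Tier", "Entität"},
}

def resnik(synset1: str, synset2: str) -> float:
    """IC of the most informative subsumer shared by both synsets."""
    shared = hypernyms[synset1] & hypernyms[synset2]
    return max(ic[s] for s in shared)

hund_katze = resnik("Hund", "Katze")
```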

jiang_and_conrath(synset1, synset2, normalize: float = False, normalized_max: float = 1.0) float[source]

The Jiang and Conrath measure includes knowledge about the individual information contents of each synset. The smaller the difference of the information content of the two synsets, the more related they are.

Parameters:
  • synset1 (Synset) – The source synset

  • synset2 (Synset) – The target synset

  • normalize – The relatedness value can be normalized to a number between the possible minimum of that measure and a given upper bound.

  • normalized_max – The upper bound of the range the measure is normalized to.

Returns:

The Jiang and Conrath relatedness measure
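In its textbook form, the Jiang and Conrath measure is a distance built from three information-content values. The sketch below shows only that distance; how this class turns it into a relatedness score (for instance with the jcnmaxdist property) is not documented here, so that step is left out.

```python
def jcn_distance(ic1: float, ic2: float, ic_lcs: float) -> float:
    """Jiang-Conrath distance: 0 for identical synsets, larger when less related."""
    return ic1 + ic2 - 2 * ic_lcs

identical = jcn_distance(3.0, 3.0, 3.0)   # a synset compared with itself
siblings = jcn_distance(3.5, 3.0, 1.5)    # two synsets sharing a more general LCS
```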

lin(synset1, synset2, normalize: bool = False, normalized_max: float = 1.0) float[source]

The Lin measure takes the individual information contents of each synset and the information content of the LCS into account. The LCS with the highest information content is used for the computation.

Parameters:
  • synset1 (Synset) – The source synset

  • synset2 (Synset) – The target synset

  • normalize – The relatedness value can be normalized to a number between the possible minimum of that measure and a given upper bound.

  • normalized_max – The upper bound of the range the measure is normalized to.

Returns:

The Lin relatedness measure
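The usual formulation of the Lin measure divides twice the information content of the LCS by the summed information contents of the two synsets. The sketch below follows that formulation; the handling of a zero denominator is an assumed convention, not taken from the library.

```python
def lin(ic1: float, ic2: float, ic_lcs: float) -> float:
    """Lin similarity from three information-content values (range 0 to 1)."""
    denominator = ic1 + ic2
    if denominator == 0:
        return 0.0  # assumed convention when neither synset carries information
    return 2 * ic_lcs / denominator

same_synset = lin(3.0, 3.0, 3.0)
related = lin(4.0, 2.0, 1.5)
```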

normalize(raw_value: float, normalized_max: float, semrel_measure: SemRelMeasure) float[source]

Normalizes a raw value of semantic relatedness to a value between a lower bound and the given upper bound.

Parameters:
  • raw_value – The raw value

  • normalized_max – The upper bound

  • semrel_measure – The semantic relatedness measure the value corresponds to.

Returns:

The normalized semantic relatedness value
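One plausible reading of this normalization is a linear min-max rescaling, with the minimum and maximum taken from normalization_dic for the measure in question. The sketch below assumes the lower bound of the target range is 0; the function signature is hypothetical and differs from the method above.

```python
def min_max_normalize(raw_value: float, min_value: float, max_value: float,
                      normalized_max: float = 1.0) -> float:
    """Linearly rescale raw_value from [min_value, max_value] to [0, normalized_max]."""
    return (raw_value - min_value) / (max_value - min_value) * normalized_max

halfway = min_max_normalize(5, 0, 10)
scaled = min_max_normalize(10, 0, 10, normalized_max=4.0)
```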

property germanet
property root_freq
property synset2cumfreq
property jcnmaxdist
property normalization_dic
property synset2ic
property most_informative_synset
property synset2simple_freq

germanetpy.iliLoader module

germanetpy.iliLoader.create_ili_record(attributes, synonyms) IliRecord[source]

Creates the ili record given the XML attributes.

Parameters:
  • attributes (xml attributes) – The XML attributes that contain the required information about the ili record.

  • synonyms (list(String)) – A list of Strings, containing the synonyms of the ili record.

Returns:

The ili record object

germanetpy.iliLoader.load_ili(germanet, tree)[source]

This method creates the ili record objects given a datafile and adds them to the GermaNet object and the corresponding lexical unit.

Parameters:
  • germanet (Germanet) – The GermaNet object

  • tree (Element Tree) – The XML tree containing the data about the ili records

germanetpy.iliRecord module

class germanetpy.iliRecord.IliRecord(lexunit_id: str, ewnRelation: str, pwnWord: str, pwn20Id: str, pwn30Id: str, source: str, pwn20synonyms: list, pwn20paraphrase: str = None)[source]

Bases: object

property lexunit_id
property relation
property english_equivalent
property pwn20id
property pwn30id
property pwn20synonyms
property pwn20paraphrase
property source

germanetpy.lexunit module

class germanetpy.lexunit.LexRel(*args: Any, **kwargs: Any)[source]

Bases: Enum

This enum represents the lexical relation (short: LexRel) that a Lexunit can have in GermaNet. You can find a description of each relation at: https://uni-tuebingen.de/en/142846

has_synonym = 'has_synonym'
has_antonym = 'has_antonym'
has_pertainym = 'has_pertainym'
has_participle = 'has_participle'
has_active_usage = 'has_active_usage'
has_occasion = 'has_occasion'
has_attribute = 'has_attribute'
has_appearance = 'has_appearance'
has_construction_method = 'has_construction_method'
has_container = 'has_container'
is_container_for = 'is_container_for'
has_consistency_of = 'has_consistency_of'
has_component = 'has_component'
has_owner = 'has_owner'
is_owner_of = 'is_owner_of'
has_function = 'has_function'
has_manner_of_functioning = 'has_manner_of_functioning'
has_origin = 'has_origin'
has_production_method = 'has_production_method'
has_content = 'has_content'
has_no_property = 'has_no_property'
has_habitat = 'has_habitat'
has_location = 'has_location'
is_location_of = 'is_location_of'
has_measure = 'has_measure'
is_measure_of = 'is_measure_of'
has_material = 'has_material'
has_member = 'has_member'
is_member_of = 'is_member_of'
has_diet = 'has_diet'
is_diet_of = 'is_diet_of'
has_eponym = 'has_eponym'
has_user = 'has_user'
has_product = 'has_product'
is_product_of = 'is_product_of'
has_prototypical_holder = 'has_prototypical_holder'
is_prototypical_holder_for = 'is_prototypical_holder_for'
has_prototypical_place_of_usage = 'has_prototypical_place_of_usage'
has_relation = 'has_relation'
has_raw_product = 'has_raw_product'
has_other_property = 'has_other_property'
is_storage_for = 'is_storage_for'
has_specialization = 'has_specialization'
has_part = 'has_part'
is_part_of = 'is_part_of'
has_topic = 'has_topic'
is_caused_by = 'is_caused_by'
is_cause_for = 'is_cause_for'
is_comparable_to = 'is_comparable_to'
has_usage = 'has_usage'
has_result_of_usage = 'has_result_of_usage'
has_purpose_of_usage = 'has_purpose_of_usage'
has_goods = 'has_goods'
has_time = 'has_time'
is_access_to = 'is_access_to'
has_ingredient = 'has_ingredient'
is_ingredient_of = 'is_ingredient_of'
class germanetpy.lexunit.OrthFormVariant(*args: Any, **kwargs: Any)[source]

Bases: Enum

This enum represents the four possible orthographical variations

orthForm = 'orthForm'
orthVar = 'orthVar'
oldOrthForm = 'oldOrthForm'
oldOrthVar = 'oldOrthVar'
class germanetpy.lexunit.Lexunit(id: str, synset, sense: int, source: str, named_entity: bool, style_marking: bool, artificial: bool, compound_info=None, orthform: str = None, old_orthform: str = None, orthvar: str = None, old_orthvar: str = None, particle: str = None, base_verb: str = None, comment: str = None)[source]

Bases: object

This class holds the lexical unit object of GermaNet. A lexical unit is a concrete word that is part of a synset.

get_orthform_variant(orthform_variant) str[source]
Parameters:

orthform_variant (OrthFormVariant) – one of the four orthform_variants

Returns:

the string of the requested orthform variant or the main orthform, if the requested orthform doesn’t exist.

get_synonyms()[source]
get_all_orthforms() set[source]
Returns:

A set of all existing orthform variants of the current lexunit.

property id
property synset
property sense
property orthform
property orthvar
property old_orthform
property old_orthvar
property particle
property base_verb
property comment
property frames
property examples
property ili_records
property frames2examples
property wiktionary_paraphrases
property compound_info
property relations
property incoming_relations
property artificial

germanetpy.longest_shortest_path module

germanetpy.longest_shortest_path.get_overall_longest_shortest_distance(germanet, category) -> (<class 'dict'>, <class 'int'>)[source]

Iterates through the synsets of a given word category. For each synset, extracts all possible hypernyms and computes the shortest possible distance to each hypernym. From these distances, also stores the longest possible shortest distance.

Returns:

a dictionary with each synset and its longest shortest distance, the overall longest shortest distance

germanetpy.longest_shortest_path.get_greatest_depth(germanet, category) int[source]

Iterates through the synsets of a given word category. For each synset, checks the depth and returns the greatest depth that has been seen.

Returns:

the greatest depth for a given word category. The depth of a synset is defined by the shortest path length between the synset and the root node

germanetpy.longest_shortest_path.get_longest_possible_shortest_distance(germanet, wordcategory)[source]

Computes the longest shortest distance between two synsets of a word category, skipping pairs that cannot exceed the current maximum:

Set maxdistance = 0. For each synset, get the corresponding longest shortest distance. If this plus the overall longest shortest distance is smaller than maxdistance, continue with the next synset.

Otherwise, go through each synset and get the corresponding longest shortest distance. If this plus the longest shortest distance of the synset of interest is smaller than maxdistance, continue; else compute the actual path distance and update maxdistance if it is larger.

Return type:

(int, int, tuple(Synset, Synset))

Parameters:
  • wordcategory (WordCategory) – the wordcategory for which this maxlen should be computed

  • germanet (Germanet) – the germanet graph

Returns:

the longest possible shortest distance between two synsets of a specified wordcategory, the maximum depth of any synset (length to the root) and a Tuple with two synsets that have the longest shortest distance
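The pruning idea can be sketched on a toy tree in which every node has a single hypernym, so that a node's longest shortest distance to a hypernym is simply its depth. Node names, the `parent` mapping and all helper functions are hypothetical; the real graph allows several hypernyms per synset.

```python
# Toy tree: node -> its single hypernym (None for the root).
parent = {"root": None, "a": "root", "b": "root",
          "a1": "a", "a2": "a", "b1": "b", "b11": "b1"}

def ancestors(node):
    """Path from node up to the root, node first."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def distance(s, t):
    """Number of edges between s and t via their lowest common ancestor."""
    index_in_t = {n: i for i, n in enumerate(ancestors(t))}
    for i, n in enumerate(ancestors(s)):
        if n in index_in_t:
            return i + index_in_t[n]
    raise ValueError("nodes are not connected")

def longest_shortest_distance():
    # Longest shortest distance from each node to any of its hypernyms.
    reach = {n: len(ancestors(n)) - 1 for n in parent}
    overall = max(reach.values())
    best, best_pair = 0, None
    for s in parent:
        if reach[s] + overall < best:      # s cannot be part of a longer pair
            continue
        for t in parent:
            if reach[s] + reach[t] < best:  # upper bound too small, skip the pair
                continue
            d = distance(s, t)
            if d > best:
                best, best_pair = d, (s, t)
    return best, overall, best_pair

max_dist, max_depth, pair = longest_shortest_distance()
```

The bound works because a path between two nodes runs through a common hypernym, so it can never be longer than the sum of the two nodes' longest distances to a hypernym.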

germanetpy.longest_shortest_path.print_longest_shortest_distances(germanet, word_category)[source]

Computes and prints the longest shortest distances for the given word category.

germanetpy.longest_shortest_path.print_maximum_depths(germanet, word_category)[source]

Computes and prints the maximum depth for the given word_category.

germanetpy.path_based_relatedness_measures module

class germanetpy.path_based_relatedness_measures.PathBasedRelatedness(germanet, category, max_len: int = None, max_depth: int = None, synset_pair=None)[source]

Bases: object

These measures use the GermaNet Graph to compute the shortest Paths between two concepts. These concepts have to have the same word category. The path lengths are normalized in different ways (depending on the measure). The path lengths are computed taking only the hypernymy / hyponymy relations into account

simple_path(synset1, synset2, normalize: bool = False, normalized_max: float = 1.0) float[source]

This measure computes the path length and normalizes it by the longest possible shortest path between any two nodes of the corresponding word category.

Parameters:
  • synset1 (Synset) – The source synset

  • synset2 (Synset) – The target synset the source synset is compared to

  • normalize – The relatedness value can be normalized to a number between the possible minimum of that measure and a given upper bound.

  • normalized_max – The upper bound of the range the measure is normalized to.

Returns:

The normalized path length between two synsets
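One way such a normalization could look is sketched below: the path length is subtracted from the longest possible shortest path (max_len) and divided by it, so that identical synsets score 1 and the most distant pair scores 0. The exact formula used by this class is an assumption.

```python
def simple_path(distance: int, max_len: int) -> float:
    """1.0 for identical synsets, 0.0 for the most distant pair."""
    return (max_len - distance) / max_len

close = simple_path(5, 20)
```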

init_min_max_normalization_values(synset_pair)[source]

This method computes the minimal values (two synsets are equal) and the maximum values (two synsets are maximally apart in the graph) for normalization.

Parameters:

synset_pair (tuple(Synset, Synset)) – The Tuple of synsets that have the maximum distance in the graph

Returns:

a dictionary [SemRelMeasure : (int, int)] containing the (minimum value, maximum value) for each semantic similarity measure.

wu_and_palmer(synset1, synset2, normalize: bool = False, normalized_max: float = 1.0) float[source]

This method computes the semantic relatedness by taking the path length into account, normalizing by the depth of the LCS. If there are several possible LCS, the one with the largest depth is taken into account.

Parameters:
  • synset1 (Synset) – The source synset

  • synset2 (Synset) – The target synset the source synset is compared to

  • normalize – The relatedness value can be normalized to a number between the possible minimum of that measure and a given upper bound.

  • normalized_max – The upper bound of the range the measure is normalized to.

Returns:

The Wu and Palmer relatedness measure
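A common formulation of the Wu and Palmer measure relates twice the depth of the LCS to the path length between the two synsets plus twice that depth. The sketch below uses that formulation with depths counted in edges; whether this class counts edges or nodes, and how it treats the root, are assumptions.

```python
def wu_and_palmer(path_length: int, lcs_depth: int) -> float:
    """2 * depth(LCS) / (path length between the synsets + 2 * depth(LCS))."""
    denominator = path_length + 2 * lcs_depth
    if denominator == 0:
        return 1.0  # assumed convention: the root compared with itself
    return 2 * lcs_depth / denominator

siblings_deep = wu_and_palmer(path_length=2, lcs_depth=3)
```

A deeper LCS means the two synsets share a more specific ancestor, which pushes the score towards 1 for the same path length.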

leacock_chodorow(synset1, synset2, normalize: bool = False, normalized_max: float = 1.0) float[source]

This method implements the Leacock and Chodorow relatedness measure. For the path distance and depth, node count is used.

Parameters:
  • synset1 (Synset) – The source synset

  • synset2 (Synset) – The target synset the source synset is compared to

  • normalize – The relatedness value can be normalized to a number between the possible minimum of that measure and a given upper bound.

  • normalized_max – The upper bound of the range the measure is normalized to.

Returns:

The Leacock and Chodorow relatedness measure
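The textbook Leacock and Chodorow formula is the negative logarithm of the path length divided by twice the maximum depth. The sketch below converts edge counts to node counts, in line with the note above; the parameter names and the use of the natural logarithm are assumptions.

```python
import math

def leacock_chodorow(path_edges: int, max_depth_edges: int) -> float:
    """-log(path length / (2 * maximum depth)), both measured in nodes."""
    path_nodes = path_edges + 1        # a path with k edges visits k + 1 nodes
    depth_nodes = max_depth_edges + 1
    return -math.log(path_nodes / (2 * depth_nodes))

same_synset = leacock_chodorow(0, 9)
```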

normalize(raw_value: float, normalized_max: float, semrel_measure: SemRelMeasure) float[source]

Normalizes a raw value of semantic relatedness to a value between a lower bound and the given upper bound.

Parameters:
  • raw_value – The raw value

  • normalized_max – The upper bound

  • semrel_measure – The semantic relatedness measure the value corresponds to.

Returns:

The normalized semantic relatedness value

property germanet
property max_len
property max_depth
property category
property normalization_dic

germanetpy.relationLoader module

germanetpy.relationLoader.get_relation_attributes(attributes) -> (<class 'str'>, <class 'str'>, <class 'str'>, <class 'str'>)[source]
Parameters:

attributes (XML attribute) – The XML attributes the information can be extracted from

Returns:

The information as Strings or None if the information is not present: the name of the relation, the id of the start node, the id of the end node, the type of direction and if the relation is inverse

germanetpy.relationLoader.load_relations(germanet, tree)[source]

Loads the information about the related synsets and lexunits from the data and adds the edges between the objects.

Parameters:
  • germanet (Germanet) – The Germanet object that is populated with Synsets and Lexunits

  • tree (Element Tree) – The XML tree of the relation data.

germanetpy.semrel_measures module

class germanetpy.semrel_measures.SemRelMeasure(*args: Any, **kwargs: Any)[source]

Bases: Enum

This Enum represents the semantic relatedness measures

SimplePath = 'SimplePath'
LeacockAndChodorow = 'LeacockAndChodorow'
WuAndPalmer = 'WuAndPalmer'
Resnik = 'Resnik'
Lin = 'Lin'
JiangAndConrath = 'JiangAndConrath'

germanetpy.synset module

class germanetpy.synset.ConRel(*args: Any, **kwargs: Any)[source]

Bases: Enum

This Enum class contains the conceptual relations (short: ConRel) that synsets can have to other synsets. For a description of each relation look at https://uni-tuebingen.de/en/142846

has_hypernym = 1
has_hyponym = 2
has_component_meronym = 3
has_component_holonym = 4
has_member_meronym = 5
has_member_holonym = 6
has_substance_meronym = 7
has_substance_holonym = 8
has_portion_meronym = 9
has_portion_holonym = 10
entails = 11
is_entailed_by = 12
causes = 14
static transitive(conrel) bool[source]

Returns true if the conceptual relation is transitive, false otherwise

Parameters:

conrel (ConRel) – a conceptual relation

Returns:

true if the conceptual relation is transitive, false otherwise

class germanetpy.synset.WordCategory(*args: Any, **kwargs: Any)[source]

Bases: Enum

This Enum class contains the three part-of-speech tags (WordCategory) a Synset can have in GermaNet. adj = adjective, nomen = noun, verben = verb

adj = 1
nomen = 2
verben = 3
static get_possible_word_classes(word_category) set[source]

Each word category can only occur with a specific set of word classes.

Parameters:

word_category (WordCategory) – The word category

Returns:

The set of word classes that occur with the given word category

class germanetpy.synset.WordClass(*args: Any, **kwargs: Any)[source]

Bases: Enum

This Enum class contains the semantic wordclasses / semantic fields a Synset can have in GermaNet. For a detailed description see: http://www.sfs.uni-tuebingen.de/GermaNet/germanet_structure.shtml#Tops

Allgemein = 1
Bewegung = 2
Gefuehl = 3
Geist = 4
Gesellschaft = 5
Koerper = 6
Menge = 7
natPhaenomen = 8
Ort = 9
Pertonym = 10
Perzeption = 11
privativ = 12
Relation = 13
Substanz = 14
Verhalten = 15
Zeit = 16
Artefakt = 17
Attribut = 18
Besitz = 19
Form = 20
Geschehen = 21
Gruppe = 22
Kognition = 23
Kommunikation = 24
Mensch = 25
Motiv = 26
Nahrung = 27
natGegenstand = 28
Pflanze = 29
Tier = 30
Tops = 31
Koerperfunktion = 32
Konkurrenz = 33
Kontakt = 34
Lokation = 35
Schoepfung = 36
Veraenderung = 37
Verbrauch = 38
static get_possible_word_categories(word_class)[source]

Each word class can occur with one or several word categories.

Return type:

set(WordCategory)

Parameters:

word_class (WordClass) – the word class to get the possible word categories for

Returns:

the set of word categories the given word class can occur with

class germanetpy.synset.Synset(id: str, word_category: WordCategory, word_class: WordClass)[source]

Bases: object

This class holds a Synset object. A synset in GermaNet contains several lexical units and holds specific relations to other synsets, for example a synset can have hypernyms or hyponyms.

add_lexunit(unit)[source]

Adds a lexical unit that is part of this synset to the list of lexical units

Parameters:

unit (Lexunit) – The lexUnit object to be added

is_root() bool[source]
Returns:

True if this Synset is the root of the Graph (= has no hypernyms), otherwise false

is_leaf() bool[source]
Returns:

True if this Synset is a leaf of the Graph (= has no hyponyms), otherwise false

num_lexunits() int[source]
Returns:

The number of lexical units, contained in that synset

hypernym_paths() list[source]

This method iterates recursively through the hypernyms of this synset to get all paths that connect this synset with the root node. A path is complete if it ends with the root node. All possible paths are returned. Each path is a list of nodes.

Returns:

A list of lists, each list contains a node sequence connecting this synset with the root node
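The recursion can be sketched on a toy graph in which a synset may have several direct hypernyms. The mapping `direct_hypernyms` and the node names are invented, and the choice to start each path with the synset itself is an assumption about the ordering.

```python
# Toy graph: synset -> list of its direct hypernyms.
direct_hypernyms = {
    "root": [],
    "Lebewesen": ["root"],
    "Tier": ["Lebewesen"],
    "Haustier": ["Tier"],
    "Hund": ["Tier", "Haustier"],
}

def hypernym_paths(synset: str) -> list:
    """Return every path from `synset` up to the root, each as a list of nodes."""
    parents = direct_hypernyms[synset]
    if not parents:                      # reached the root: a path holding only the root
        return [[synset]]
    paths = []
    for parent in parents:
        for tail in hypernym_paths(parent):
            paths.append([synset] + tail)
    return paths

paths = hypernym_paths("Hund")
```

Because "Hund" has two direct hypernyms in this toy graph, two complete paths are returned, one of them a node longer than the other.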

all_hypernyms() set[source]

This method extracts all hypernyms for this synset (the transitive closure for this synset)

Returns:

a set, containing all possible hypernym nodes. It is empty if the current synset is the root node

hyponym_paths() list[source]

This method iterates recursively through the hyponyms of this synset to get all paths that connect this synset with a leaf node. A path is complete if it ends with a leaf node. All possible paths are returned. Each path is a list of nodes.

Returns:

A list of lists, each list contains a node sequence connecting this synset with a leaf node

all_hyponyms() set[source]

This method returns all possible hyponyms of this synset.

Returns:

[set(Synset)] A set of synset nodes, each constitutes a hyponym of the current synset.

shortest_path_to_root() list[source]

This method returns the shortest path to the root node.

Returns:

[list(Synset)] shortest path to the root node.

common_hypernyms(other) set[source]

Given another synset, this method computes shared hypernyms

Parameters:

other (Synset) – another synset object

Returns:

a set of synset nodes, that denotes the shared hypernyms between this synset and the given one.

min_depth() int[source]
Returns:

The length of the shortest hypernym path from this synset to the root.

shortest_path_distance(other) int[source]

Returns the distance of the shortest path linking the two synsets (if one exists). If a node is compared with itself 0 is returned. The distance is denoted by the number of edges that exist in the shortest path.

Parameters:

other (Synset) – The Synset to which the shortest path will be found.

Returns:

The number of edges in the shortest path connecting the two nodes, or None if no path exists.

shortest_path(other) list[source]

Returns the shortest possible sequence of synset nodes that are traversed from this synset to a given other synset. If there are several shortest sequences, all of them are returned.

Parameters:

other (Synset) – A synset the path should be computed to

Returns:

A list of lists, each list containing the sequence of nodes traversed from this synset to the given other synset.

shortest_path_to_hypernym(hypernym) list[source]

The shortest path between this synset and the given hypernym. Asserts that the given other synset is a real hypernym of the current synset.

Parameters:

hypernym (Synset) – a synset, denoting the hypernym the shortest path should be computed to

Returns:

a list of lists, each list storing the shortest sequence of synset nodes traversed from self to the given hypernym

lowest_common_subsumer(other) set[source]

Extracts the lowest common subsumer(s) / lowest common ancestor(s) of the current synset and a given one.

Parameters:

other (Synset) – Another synset object the LCS should be computed to.

Returns:

a set, containing one or several synset objects, being the LCS between the current synset and the given one.

get_distances_hypernym_dic() dict[source]

For each hypernym, store the shortest distance between the current synset and its hypernym.

Returns:

A dictionary containing all hypernyms of this synset as keys and the corresponding distances as values.
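The shape of such a dictionary can be illustrated with a breadth-first walk up a hypothetical toy graph in which one hypernym is reachable by two routes of different length. `HYPERNYMS` and `distances_to_hypernyms` are assumed names; this is a sketch of the idea, not the library's code.

```python
from collections import deque

# Hypothetical toy graph: "Tier" is reachable from "Hund" via two routes.
HYPERNYMS = {
    "Hund": ["Haustier", "Raubtier"],
    "Haustier": ["Tier"],
    "Raubtier": ["Wirbeltier"],
    "Wirbeltier": ["Tier"],
    "Tier": ["ROOT"],
    "ROOT": [],
}


def distances_to_hypernyms(node):
    """Map every transitive hypernym of node to its shortest distance in edges."""
    distances = {}
    queue = deque([(node, 0)])
    while queue:
        current, dist = queue.popleft()
        for parent in HYPERNYMS[current]:
            # The first visit in breadth-first order is the shortest route.
            if parent not in distances:
                distances[parent] = dist + 1
                queue.append((parent, dist + 1))
    return distances


dog_distances = distances_to_hypernyms("Hund")
```

Here "Tier" is recorded with distance 2 (via "Haustier") rather than 3 (via "Raubtier" and "Wirbeltier").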

property id
property word_category
property word_class
property paraphrase
property lexunits
property relations
property incoming_relations
property direct_hypernyms
property direct_hyponyms

germanetpy.synsetLoader module

germanetpy.synsetLoader.get_attribute_element(attributes, element: str, enum)[source]

Constructs an Enum object of a given attribute.

Parameters:
  • attributes (XML attributes) – XML attributes of a certain XML node

  • element – A String

  • enum (FastEnum) – The Enum object that should be initialized

Returns:

The corresponding Enum object or None

Return type:

FastEnum

germanetpy.synsetLoader.get_attribute_element_without_enum(attributes, element: str)[source]

Returns the attribute value if the attribute exists.

Parameters:
  • attributes (XML attributes) – XML attributes of a certain XML node

  • element – A String

Returns:

The corresponding object or None

germanetpy.synsetLoader.create_compound_info(child) CompoundInfo[source]

Creates a compound info object. This has a modifier (String) and a head (String). Each modifier and the head can have a property (CompoundProperty) and a category (CompoundCategory).

Parameters:

child – the XML element

Returns:

A CompoundInfo object

germanetpy.synsetLoader.load_lexunits(germanet, tree)[source]

Takes the XML tree and walks through it to create the Lexunit objects.

Parameters:
  • germanet (Germanet) – the germanet object

  • tree (Element Tree) – XML tree

germanetpy.synsetLoader.create_lexunit(germanet, attributes, lex_root, synset) Lexunit[source]

Given the XML data, creates a Lexunit object.

Parameters:
  • germanet (Germanet) – The germanet object.

  • attributes (XML attributes) – The XML attributes.

  • lex_root – The XML root

  • synset – the corresponding synset object

Returns:

a lexical unit object

germanetpy.synsetLoader.add_orth_forms(germanet, lexunit: Lexunit, child_value: str, tag: str)[source]

Checks which orthform the tag contains, and adds it to the lexunit object. Adds the lexunit id to the corresponding dictionary.

Parameters:
  • germanet (Germanet) – The germanet object containing the Orthform variant dictionaries.

  • lexunit – the Lexunit object the Orthform variant needs to be added to

  • child_value – the value of the XML element that contains this Orthform variant

  • tag – the value of the XML tag specifying the type of Orthform variant

germanetpy.utils module

germanetpy.utils.convert_to_boolean(attribute: str) bool[source]

Converts the given String into a boolean.

Parameters:

attribute – The attribute that needs to be converted into a boolean

Returns:

True, False or an Error message if the attribute doesn’t have the right value
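A string-to-boolean conversion of this kind might look like the sketch below. The accepted values ("yes" and "no") and the choice to raise `ValueError` for anything else are assumptions made for illustration. The documented function may accept different strings and report errors differently. `to_boolean_sketch` is a hypothetical name.

```python
def to_boolean_sketch(attribute: str) -> bool:
    """Convert an assumed "yes"/"no" attribute string into a boolean."""
    # Assumed mapping; the values accepted by the real function may differ.
    mapping = {"yes": True, "no": False}
    try:
        return mapping[attribute]
    except KeyError:
        raise ValueError(f"cannot convert {attribute!r} to a boolean") from None


is_named_entity = to_boolean_sketch("no")
```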

germanetpy.utils.parse_xml(datadir: str, f: str) lxml.etree[source]

Parses an XML file and returns the XML tree

Parameters:
  • datadir – The directory where the file is located

  • f – the filename

Returns:

The parsed XML tree
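As a rough sketch of the same idea, the snippet below joins a directory and a filename and parses the result. It uses the standard library's `xml.etree.ElementTree` as a stand-in, whereas the documented function returns an lxml tree. `parse_xml_sketch` and the example file content are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree


def parse_xml_sketch(datadir: str, f: str) -> ET.ElementTree:
    """Join directory and filename, then parse the file into an element tree."""
    return ET.parse(os.path.join(datadir, f))


# Usage with a throwaway file (hypothetical content, not real GermaNet data).
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "example.xml"), "w", encoding="utf-8") as handle:
        handle.write("<synsets><synset id='s1'/></synsets>")
    tree = parse_xml_sketch(tmp, "example.xml")
    root_tag = tree.getroot().tag
    first_id = tree.getroot()[0].get("id")
```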

germanetpy.wiktionaryLoader module

germanetpy.wiktionaryLoader.create_wiktionary(attributes) WiktionaryParaphrase[source]

Creates a wiktionary object given the XML attributes that contain the required information

Parameters:

attributes – XML attributes that contain information about the wiktionary paraphrase

Returns:

a wiktionary object

germanetpy.wiktionaryLoader.load_wiktionary(germanet, tree)[source]

Given an XML tree, this method initializes the wiktionary objects and adds them to the germanet object and the corresponding lexunits.

Parameters:
  • germanet (Germanet) – The germanet object

  • tree (etree) – The XML tree of the wiktionary file

germanetpy.wiktionaryparaphrase module

class germanetpy.wiktionaryparaphrase.WiktionaryParaphrase(lexunit_id: str, wiktionary_id: str, wiktionary_sense_id: int, wiktionary_sense: str, edited: bool)[source]

Bases: object

property lexunit_id
property wiktionary_id
property wiktionary_sense_id
property wiktionary_sense
property edited

Module contents