Hindi Anaphora Resolution Approach


There has been extensive research on anaphora resolution in many languages, and the resulting systems have been made available to online users. Many researchers have also reported investigations and experiments on anaphora resolution in Hindi, but so far only one system, ‘Anaphora Resolver for Hindi ver 0.5’ by the Ministry of Electronics and Information Technology, Govt. of India, has been made available to users. This section gives a brief overview of previous research on anaphora resolution in Hindi, in chronological order.

Bharati et al. (1993) investigated inter-sentential and intra-sentential anaphora, definite noun phrases, and surface and deep level ellipsis in Hindi by modifying the Paninian parser. The authors designed a prototype natural language interface for Hindi databases that accepts and analyzes a user query and creates its intermediate representation. The interface is designed to be easily modifiable and portable across languages and databases. Domain knowledge serves as a mandatory information source, alongside processing modules such as the reference and ellipsis handler and the parser. Two further modules, a mapper and a translator, map the parse structure to the intermediate representation and translate it into an equivalent SQL query, respectively.

Prasad et al. (2000) applied centering theory in a discourse-based approach to identify the antecedents of third person personal pronouns and zero pronouns in Hindi, analyzing and ranking factors such as grammatical function, word order, and information status for discourse salience. The list of forward-looking centers in centering theory consists of the discourse entities evoked by each utterance in a discourse (Grosz et al., 1995). Each entity in the list has attributes such as person, number, and gender, and the entities were ranked against one another for salience on the basis of subject, object, phrase, case, etc. The authors proposed a novel method for determining the relative salience of discourse entities to resolve Hindi anaphora, applying the BFP algorithm and the S-list algorithm. The two algorithms were compared on the basis of the ambiguities they generated. The authors found the BFP algorithm unsuitable for resolving pronouns in Hindi, whereas the S-list algorithm proved very promising for inter- and intra-sentential anaphora.
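
The idea of ranking discourse entities by salience can be illustrated with a minimal sketch. It is not the procedure of Prasad et al. (2000): the ranking table FUNCTION_RANK, the tuple layout, and the toy utterance are assumptions made only for illustration. Entities are ordered first by grammatical function and then by word order.

```python
# Hypothetical salience order by grammatical function (an assumption for
# illustration; lower number = more salient).
FUNCTION_RANK = {"subject": 0, "object": 1, "other": 2}


def rank_by_salience(entities):
    """Sort (name, grammatical_function, word_position) tuples, most salient first.

    Ties on grammatical function are broken by word order (earlier first).
    """
    return sorted(
        entities,
        key=lambda e: (FUNCTION_RANK.get(e[1], len(FUNCTION_RANK)), e[2]),
    )


# Toy utterance: entities listed in arbitrary order.
utterance = [("kitaab", "object", 2), ("Ram", "subject", 0), ("mez", "other", 4)]
ranked = rank_by_salience(utterance)
print([name for name, _, _ in ranked])  # ['Ram', 'kitaab', 'mez']
```

A resolver built on such a list would typically walk it from the top and accept the first entity compatible with the pronoun.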

Sobha et al. (2000) proposed VASISTH, a rule-based, knowledge-poor approach for resolving pronominal and non-pronominal anaphora, gaps, and ellipsis in Hindi and Malayalam. It uses morphological markings and limited syntactic parsing information such as POS tagging, clause identification, the person-number-gender (PNG) of noun phrases, and salience measurement. This syntactic knowledge-based approach achieved 82% accuracy in resolving Hindi anaphora. The authors claimed that VASISTH can be applied to all Indo-Aryan, Indo-Dravidian and Indic family languages with minor alteration, and reported that the system works with a high degree of success for Malayalam but does not resolve ambiguity. The authors neither showed the actual implementation of their system nor provided evaluation details.
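
Person-number-gender agreement of the kind mentioned above is commonly used as a filter over candidate noun phrases. The following is a minimal sketch of such a filter, not the VASISTH implementation, which the authors did not publish. The PNG type, the helper names, and the toy lexicon are all hypothetical. A feature set to None is treated as unmarked and matches anything.

```python
from typing import NamedTuple, Optional


class PNG(NamedTuple):
    person: Optional[int]  # 1, 2, 3, or None when unmarked
    number: Optional[str]  # "sg", "pl", or None when unmarked
    gender: Optional[str]  # "m", "f", or None when unmarked


def agrees(a: PNG, b: PNG) -> bool:
    """Two PNG bundles agree if every feature marked on both sides matches."""
    return all(x is None or y is None or x == y for x, y in zip(a, b))


def filter_candidates(pronoun: PNG, noun_phrases: dict) -> list:
    """Keep only the noun phrases whose PNG is compatible with the pronoun."""
    return [np for np, png in noun_phrases.items() if agrees(pronoun, png)]


# Toy candidates (hypothetical annotations).
noun_phrases = {
    "ladke": PNG(3, "pl", "m"),   # 'boys'
    "Sita": PNG(3, "sg", "f"),
    "kitaab": PNG(3, "sg", "f"),  # 'book'
}
pronoun = PNG(3, "sg", None)  # third person singular, gender unmarked
print(filter_candidates(pronoun, noun_phrases))  # ['Sita', 'kitaab']
```

Two candidates survive in this example, so agreement alone only narrows the candidate set; a further preference such as salience is needed to choose between them.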

Dutta et al. (2004) developed a heuristic approach to resolve Hindi pronominal anaphora using semantic information, syntactic information, and semantic constraints obtained from Head-Driven Phrase Structure Grammar (HPSG) to achieve high accuracy. Semantic information was integrated to remove ambiguities. The authors implemented a few heuristic rules for intra-sentential references and centering theory for inter-sentential references, and reported 63% accuracy when evaluating the approach on 10 different short stories.

Agarwal et al. (2007) resolved Hindi anaphora with a machine learning approach employing semantic knowledge. The authors compared the syntactic and semantic attributes of different lexical items and used the animacy of entities as a matching constraint, or baseline grammatical factor. They tested the approach on a dataset of children's stories containing 120 pronouns and reported a highest accuracy of 96% for anaphora in simple sentences and 80% for complex sentences.

Dutta et al. (2008) proposed a rule-based approach that modifies Hobbs' algorithm, which works on syntactic information, to resolve pronominal anaphora, including reflexive and possessive pronouns, in Hindi. The authors parsed the sentences manually and tested the algorithm on a limited set of sentences. They noted the significance of subject and object in Hindi sentences for resolving anaphora, observed that the verb phrase or auxiliary verb may or may not indicate gender, and concluded that including semantic information would yield higher accuracy. The authors also discuss anaphora in Hindi and the attributes of Hindi pronouns.

Dutta et al. (2009) highlighted the importance of pronominal divergence, which occurs at the syntactic, semantic, and discourse levels, and demonstrated the difficulties these divergences cause in translating one natural language into another using three machine translation systems: AnglaHindi, Matra2, and the Google translation system. The authors concluded that anaphoric and non-anaphoric pronouns can be distinguished by taking pronominal divergence into account, and that the inflected form of a pronoun can be identified from its root form through grammatical-case-based divergence.

Uppalapu et al. (2009) resolved Hindi pronominal anaphora with a discourse-based approach. The authors took the S-list algorithm of Prasad et al. (2000) and improved its performance by combining the two lists of entities, those occurring in the present utterance and those from previous utterances, into one. Manually annotated dependency relations were used to rank the discourse entities. Performance was further improved by integrating syntactic and semantic information. A comparative study conducted by the authors showed that the S-list algorithm produces good results in resolving first and second person pronouns, achieving an accuracy of 91.58%. The researchers did not report their results in terms of F-score.

Dutta et al. (2011) studied direct and indirect anaphora in the Hindi corpus EMILLE to develop an annotation scheme for indirect anaphora, and conducted experiments on resolving Hindi indirect anaphora through a machine learning approach that explores the semantic structure of sentences. The authors derived semantic rules and incorporated them in the resolution system to automatically categorize demonstrative pronouns that do not have a noun phrase as antecedent as direct or indirect anaphora. The approach is based on the presence of definite collocation patterns appearing after demonstrative pronouns for which no nominal occurs previously. The system was evaluated on a small set of data and is expected to perform better once more rules are tuned and integrated.

Chatterji et al. (2011) identified antecedents of anaphora in Bengali, Hindi, and Tamil text with a data-driven statistical approach aided by semantic information and PNG constraints. The authors used the term markables for noun phrases or pronouns that have a referent, detected the boundaries of markables in the sequential data, and labeled them using a Conditional Random Field. Markables standing in an antecedent-anaphora relation were identified and paired using a K-fold decision tree (random tree) algorithm. The authors used a Hindi training dataset of 30,000 words, ran experiments on a testing dataset of 15,000 words provided by the ICON 2011 NLP tool contest, and reported an average F-value of 37.48%.

Dakwale et al. (2012) proposed an annotation scheme for anaphora in the Hindi Dependency Treebank (HDT), covering both abstract and concrete reference, by extending the treebank with anaphoric relations and introducing the attribute-value pair ‘ref’, with the aim of developing a Hindi anaphora resolution system. The authors discuss the challenges of annotating dependency relations, distributed referent spans, and multiple referents, and the issues in representing them in the HDT format. A pronoun in Hindi may have more than one referent, or a distributed referent span whose parts are not contiguous because they are separated by intervening text over long distances (Dakwale et al., 2012). A further challenge was to identify pronouns that can refer to both concrete and abstract entities. The authors found it easier to identify and annotate the referents of concrete anaphora than those of abstract anaphora, but did not address demonstratives, null pronouns, gaps, or ellipsis. Apart from anaphoric annotation, the scheme also included annotation of cataphoric pronouns, limited to those with a referent in the same sentence.

Lakhmani et al. (2013) examined the influence of grammatical case and addressed the syntactic and semantic challenges in handling pronominal anaphora in Hindi by employing number and gender agreement and animistic knowledge. The authors reported approximately 71% accuracy.

Dakwale et al. (2013) proposed a rule-based hybrid approach to determine the antecedent of entity-pronoun references in Hindi, using grammatical and semantic features as soft and hard constraints. The authors used dependency structures to resolve simple anaphoric pronouns and a decision tree classifier to resolve ambiguous instances of entity-pronoun reference, drawing on resolution factors such as number agreement, distance, named entity categories, and semantic knowledge. The experiments were conducted on a training and testing dataset of 325 documents containing 3233 entity pronouns. They showed that the rule-based approach performs well for all types of anaphors except third person pronouns, especially proximal pronouns, and achieved an accuracy of 70%.

Singh et al. (2014a) resolved pronominal anaphora in Hindi using two factors, animistic knowledge and recency, based on the Gazetteer method. The authors stated that in future work they will try to incorporate gender and number agreement. The experiments were conducted on short Hindi stories, news articles, and biography content from Wikipedia to determine the contribution of the applied factors to pronoun resolution across different styles of written text.

Lakhmani et al. (2014) proposed a knowledge-based Gazetteer method with two computational models to resolve Hindi pronominal anaphora: one incorporating the recency factor with the centering approach, and one incorporating animistic semantic knowledge with the Lappin and Leass approach (Lappin et al., 1994). The method is also called the list look-up method because it creates lists of different nouns and pronouns as gazetteer classes and uses animistic and external knowledge to classify them as animistic, non-animistic, or middle animistic. The authors claimed that the method is very fast and that the integrated animistic knowledge increases accuracy to the highest level. Singh et al. (2014b) performed experiments and further compared and analyzed these computational models on the basis of constraints, preferences, and approaches on different datasets.
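
A list look-up combined with recency can be sketched in a few lines. This is an illustrative sketch only, not the authors' system: the GAZETTEER contents, the class assignments, and the function names are hypothetical. Each noun is looked up in a gazetteer class, and the most recent preceding noun of the required class is chosen as the antecedent.

```python
# Hypothetical gazetteer classes with toy entries (assumptions for illustration).
GAZETTEER = {
    "animistic": {"Ram", "Sita", "ladka"},
    "non_animistic": {"kitaab", "mez"},
    "middle_animistic": {"sarkar"},
}


def animacy_of(noun):
    """Return the gazetteer class containing the noun, or None if unlisted."""
    for cls, members in GAZETTEER.items():
        if noun in members:
            return cls
    return None


def resolve_by_recency(preceding_nouns, required_animacy):
    """Return the most recent preceding noun of the required class, or None."""
    for noun in reversed(preceding_nouns):
        if animacy_of(noun) == required_animacy:
            return noun
    return None


# Nouns in the order they appear before the pronoun.
preceding = ["Ram", "kitaab", "mez"]
print(resolve_by_recency(preceding, "animistic"))      # Ram
print(resolve_by_recency(preceding, "non_animistic"))  # mez
```

Because every step is a set membership test, a look-up of this kind is cheap, which is consistent with the speed the authors claim for the method.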

Devi et al. (2014) proposed a language-independent generic anaphora engine based on a supervised machine learning approach for resource-poor but morphologically rich languages such as Bengali, Hindi, and Tamil. The authors used a heuristic rule to select noun phrases as potential antecedent candidates and a machine learning algorithm to determine the actual antecedent. The training and testing data, provided by the ICON 2011 NLP tool contest, were preprocessed using a morphological analyzer, a POS tagger, and a Named Entity Recognizer. The inflectional and derivational morphological information, comprising noun and verb suffixes and the person, number, and gender (PNG) of words, was analyzed to reduce ambiguity and to extract syntactic, positional, and constraint features of antecedent-anaphora pairs, which were then integrated into the machine learning technique. The authors evaluated the system using precision, recall, and F-measure, showed that positional features improve precision for all languages, and found that analysis of the verb is essential for resolving anaphora in Hindi.
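
For reference, precision, recall, and F-measure can be computed as in the sketch below. Scoring exact (anaphor, antecedent) pairs is an assumption made for illustration; the scoring scheme actually used in the contest may differ, and the pairs shown are toy data.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted (anaphor, antecedent) pairs against gold pairs."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: 3 predictions, 2 of them correct, against 4 gold pairs.
gold = {("vah", "Ram"), ("usne", "Sita"), ("yah", "kitaab"), ("unhone", "ladke")}
predicted = {("vah", "Ram"), ("usne", "Ram"), ("yah", "kitaab")}
p, r, f = precision_recall_f1(predicted, gold)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.67 0.5 0.57
```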

Mahato et al. (2015a) experimented with 86 sentences and proposed a rule-based system for resolving first person, second person, third person, and reflexive pronouns.

Mahato et al. (2015b) proposed a hybrid machine learning approach to resolve pronominal anaphora and reflexive pronouns, incorporating limited syntactic and semantic information. The authors built a corpus by collecting sentences from different domains. It contains 192 pronouns, of which 145 were resolved correctly.
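
The two counts reported above can be turned into an accuracy figure comparable with the percentages quoted for other systems. The calculation below uses only those two numbers; the variable names are illustrative.

```python
resolved, total = 145, 192  # counts reported for Mahato et al. (2015b)
accuracy = resolved / total
print(f"{accuracy:.1%}")  # 75.5%
```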

Mujadia et al. (2016) proposed hybrid approaches for identifying the pronominal reference type (abstract or concrete) and for resolving event anaphora in Hindi for inter-sentential or nonstructural sentences, using the Hindi dependency treebank generated by a full parser. For resolution, the nearest verb is considered a potential candidate for the event to which the anaphor refers, using features such as the pronoun's dependency relation, the distance and direction from the pronoun to the nearest verb, and the presence of a reporting verb in the same sentence.
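
Two of the features listed above, the distance and direction from the pronoun to the nearest verb, can be computed as in the following sketch. It works on a flat list of POS-tagged tokens rather than a dependency treebank, so it is a simplification and not the authors' system. The function name, the tag labels, and the example sentence are assumptions.

```python
def nearest_verb_features(tokens, pronoun_index):
    """Find the verb closest to the pronoun in a list of (word, pos) tokens.

    Returns a dict with the verb, its distance in tokens, and its direction
    relative to the pronoun, or None if the tokens contain no verb.
    """
    verbs = [
        (abs(i - pronoun_index), i)
        for i, (_, pos) in enumerate(tokens)
        if pos == "VERB"
    ]
    if not verbs:
        return None
    distance, i = min(verbs)  # on equal distance, the earlier (left) verb wins
    return {
        "verb": tokens[i][0],
        "distance": distance,
        "direction": "left" if i < pronoun_index else "right",
    }


# Toy example: "Ram gira. Yah bahut bura hua." ('Ram fell. This was very bad.')
tokens = [
    ("Ram", "NOUN"), ("gira", "VERB"), (".", "PUNCT"),
    ("yah", "PRON"), ("bahut", "ADV"), ("bura", "ADJ"), ("hua", "VERB"),
]
print(nearest_verb_features(tokens, pronoun_index=3))
# {'verb': 'gira', 'distance': 2, 'direction': 'left'}
```

In a full system these values would be only some of the inputs to the resolver, alongside features such as the dependency relation of the pronoun.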

Singla et al. (2017) proposed a rule-based approach for intra-sentential anaphors using syntactic and semantic features. The approach focuses on Entity Resolution (ER) and pronominal forms. The authors discuss various approaches to resolving entity-pronoun references in Hindi, in particular the rule-based approach used to identify anaphors and their antecedents.

Saqia et al. (2018) highlighted the impact of anaphora resolution on opinion target identification in text documents and evaluated it empirically on benchmark datasets. They used nine datasets taken from the web for the evaluation. The proposed work recognized opinion targets from evaluative expressions and slightly improved its results by employing anaphora resolution. The suggested method retrieves domain progressive assessment expressions that can be used to identify target attributes across domains via a supervised machine learning algorithm.

As of the submission of this thesis, no further paper related to Hindi anaphora resolution has been published.
