Sentiments Analysis through computer and electronics science


Pradeep Tomar and Gurjit Kaur

Sentiment analysis is a sub-category of text mining: the analysis is done to find out how relevant the provided text is, since finding the meaning of a text is the central purpose of mining it. Mining a text for its sentiment orientation requires an approach for reaching a conclusion, and sentiment analysis is mainly done using two approaches, namely supervised techniques and unsupervised techniques. A third, semi-supervised approach is used by Sindhwani et al. [10] and by many other works in the review below.

Work Related to Lexicon Based Approaches

A supervised approach, as the name says, works under the guidelines and rules described for the text. It uses labeled data to analyze the input and generate a result. The labeled data can be a list of sentiment words separated according to their sentiment orientation; in simple words, the words are labeled according to their relevance to sentiment. A word can carry one of several labels, such as negative, positive, weak negative, weak positive, strong negative or strong positive, and if none of them applies the word is termed objective.

According to Godbole et al. [11], a supervised approach can be used to find sentiment in news, blogs, articles etc. Their work describes a large-scale sentiment analyzer supported by a large lexicon. The lexicon acts as the labeled data: every word of the text is compared with the lexicon and the result is generated accordingly.

Taboada, Maite, et al. [12] use a dictionary of words, each with a corresponding negative and positive score; each word is a lexicon entry, which makes this a lexicon-based approach. What is deployed here in addition is POS tagging, which helps to single out the words of greater importance. Adjectives are the sentiment words in the text, so the sentiment orientation of the text can be determined from the adjectives present in it. As in all lexicon approaches, the words have a negative and positive score; by looking up the scores of the adjectives extracted from the text, the sentiment tilt can be determined. Because adjectives carry the sentiment, the presence of more negative adjectives makes the text negative, and conversely the presence of more positive adjectives makes the text positive in terms of sentiment orientation.

When the text contains a mix of negative and positive adjectives, the sentiment orientation is determined by the strong adjectives: the score of one strong positive adjective can overshadow the scores of several weak negative adjectives, since the resultant score will be positive. Taking adjectives alone into consideration can lead to an incomplete sentiment score, so the works reviewed below take more than one part of speech into account.
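The adjective-based scoring described above can be sketched as follows. This is only a minimal illustration: the lexicon `ADJECTIVE_LEXICON`, its words and its scores are made up for the example and are not taken from any of the cited works, and the adjectives are assumed to have been picked out already by a POS tagger.

```python
# Hypothetical adjective lexicon (illustrative scores, not from any cited work).
ADJECTIVE_LEXICON = {
    "excellent": 3.0,   # strong positive
    "good": 1.0,        # weak positive
    "stale": -1.0,      # weak negative
    "dry": -1.0,        # weak negative
    "awful": -3.0,      # strong negative
}


def adjective_score(adjectives):
    """Sum the lexicon scores; adjectives missing from the lexicon count as 0."""
    return sum(ADJECTIVE_LEXICON.get(word.lower(), 0.0) for word in adjectives)


def orientation(score):
    """Map a numeric score to a sentiment orientation label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# One strong positive adjective against two weak negative ones.
mixed = ["excellent", "stale", "dry"]
total = adjective_score(mixed)
print(total, orientation(total))
```

With these assumed scores, the single strong positive adjective outweighs the two weak negative ones, so the text is labeled positive.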

Benamara, Farah, et al. [13] state that adverbs and adjectives together are crucial, rather than adjectives alone. Their work shows that adverbs are an important part of finding the sentiment direction, since adjectives alone can be incomplete. According to [14], adverbs in the text mainly support the adjective sentiment: the degree to which an adjective affects the sentiment is determined by the adverbs used with it. In other words, adverbs intensify the adjectives and the sentiment obtained from them. The following cases make the idea of taking adverbs and adjectives together clear.

• Case 1: String: “The taste of the bread is very bad” Here “bad” is an adjective and its score in the lexicon is negative, but the adjective alone is not sufficient. The string also has the adverb “very”, which acts as an intensifier for the adjective and tells the degree to which the string is negative.

• Case 2: String: “The taste of the bread is not good.” Here “good” is an adjective that shows positive sentiment, but the actual sense of the string is negative, which can only be captured by taking the adverb and the adjective together. The adverb is “not”, which has a negative score and ultimately shifts the text in the negative direction. Had the adjective been considered alone, the resultant sentiment score would have been incorrect.

These cases explain why using adverbs and adjectives together is of great importance.
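Case 1 and Case 2 above can be reproduced with a small sketch. The word lists and weights are assumptions made for illustration. Treating “not” as a sign flip is one simple way to model the shift described in Case 2, not necessarily the scoring used in [13] or [14].

```python
# Hypothetical word lists; the scores and weights are illustrative only.
ADJECTIVES = {"good": 1.0, "bad": -1.0}
INTENSIFIERS = {"very": 2.0, "slightly": 0.5}   # multiply the adjective score
NEGATORS = {"not"}                              # flip the adjective score


def score_sentence(sentence):
    """Score each adjective, adjusted by the adverb directly before it."""
    tokens = sentence.lower().strip(".!?").split()
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in ADJECTIVES:
            continue
        score = ADJECTIVES[token]
        previous = tokens[i - 1] if i > 0 else ""
        if previous in INTENSIFIERS:
            score *= INTENSIFIERS[previous]
        elif previous in NEGATORS:
            score = -score
        total += score
    return total


print(score_sentence("The taste of the bread is very bad"))   # Case 1
print(score_sentence("The taste of the bread is not good."))  # Case 2
```

Under these assumptions Case 1 scores -2.0 (an intensified negative) and Case 2 scores -1.0, whereas scoring the adjective “good” alone would have given a positive result.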

Other frameworks introduce further directions for finding accurate sentiment. V. S. Subrahmanian and D. Reforgiato [14] and Nasukawa, Tetsuya, and Jeonghee Yi [15] discuss a new factor for sentiment analysis, namely deploying a feature extraction unit to extract the features from the text. A feature here means the action or characteristic about which the text expresses some sentiment. According to Abbasi et al. [16], the orientation of a particular feature in terms of polarity is crucial; after all, the overall polarity of a text depends on what has been said and about which feature. The same approach has been tried in Tan, Songbo, and Jin Zhang [17] and Feldman, Ronen [18]. Feature extraction is important because the orientation of the feature determines the final polarity generated once the feature is attached to its corresponding adjective. In a domain-specific scenario, every feature has a different impact on the overall polarity of the subject under review, and every feature behaves differently for every object.

Consider a mobile phone with the features “Battery life” and “Computation time”. If both features are assigned the adjective “long”, which is positive in the lexicon dictionary, the sentiment computed with the feature in mind is not always what the adjective alone suggests. For “Battery life” the adjective behaves positively, because a larger value of this feature is desirable, so the stronger the adjective the more positive the final sentiment. For “Computation time” the same adjective has a negative impact on the final sentiment of the text, because it degrades the quality of the object's feature. These are hidden semantics in the text that could not be identified without feature extraction. Features of an object will be the verbs associated with it.

In Theresa Wilson [22], the features are extracted from Twitter tweets through hashtags, which act as the feature of the text being tweeted. Although the importance of features in sentiment analysis is clear, extracting them is still a challenge, one that has been addressed by Yi et al. [19] and Arab Salem et al. [20].
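The “Battery life” and “Computation time” example discussed above can be sketched in a few lines. The two lookup tables are hypothetical and exist only for this illustration: one records whether more of a feature is desirable, the other whether an adjective means more or less of it.

```python
# Hypothetical table: +1 if a larger value of the feature is desirable, -1 if not.
FEATURE_DIRECTION = {
    "battery life": 1,
    "computation time": -1,
}

# Hypothetical magnitude adjectives: +1 means "more", -1 means "less".
MAGNITUDE_ADJECTIVES = {
    "long": 1,
    "short": -1,
}


def feature_polarity(feature, adjective):
    """Combine the feature's desirable direction with the adjective's magnitude."""
    direction = FEATURE_DIRECTION[feature.lower()]
    magnitude = MAGNITUDE_ADJECTIVES[adjective.lower()]
    return "positive" if direction * magnitude > 0 else "negative"


print(feature_polarity("Battery life", "long"))
print(feature_polarity("Computation time", "long"))
```

The same adjective “long” comes out positive for one feature and negative for the other, which is the hidden semantics that a plain lexicon lookup would miss.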

Feature extraction can be performed using a natural language processing approach. A natural language processor requires tree-structured training data to identify words and their corresponding POS. Santorini, Beatrice [21] shows an application of POS tagging using the Penn Treebank, which serves as training data for the NLP processor.

The review above outlines the work done in this direction; extensions of techniques and approaches improve the accuracy of the analyzer. The work of Patrick Paroubek et al. [23] presents a framework for analyzing microblogging data. Today the percentage of active internet users is much higher than before, and this mass of active users frequently posts reviews of products. This review data has to be analyzed for opinion mining. Text is taken as the input because sentiment analysis comes under text mining, and the approaches defined are suitable for text only.

Work Related to Different Levels

According to Liu, Bing [24], sentiment analysis of a corpus can be approached in several ways; the approach here is to divide the process into different levels, determined by the size of the text under test. The different levels are:

• Document Level: As the name says, document-level analysis applies when the text under analysis spans multiple lines. It is done to determine the overall sentiment of the whole document and gives that sentiment in a shared format: the result measures both positive and negative polarity, stating that the document is x percent positive and y percent negative. The constraint of document-level sentiment analysis is that the whole document should share the same context or target.

Zhang et al. [25] propose a framework for analyzing sentiment in Chinese text. At the time there was reportedly no other framework for analyzing text in a language other than English. The text is split into sentences, every sentence is converted to English, and the sentences are then reassembled into a document for sentiment analysis. This approach is reviewed by Feldman et al. [26], which surveys various sentiment analysis techniques.
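The “x percent positive and y percent negative” output format of document-level analysis can be illustrated with a short sketch. The word sets `POSITIVE` and `NEGATIVE` and the sample review are invented for the example. A real analyzer would use a full lexicon.

```python
import re

# Hypothetical word sets, for illustration only.
POSITIVE = {"good", "great", "tasty", "fresh"}
NEGATIVE = {"bad", "stale", "slow", "rude"}


def document_polarity(text):
    """Return (percent positive, percent negative) over all sentiment words."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    total = positive + negative
    if total == 0:
        return (0.0, 0.0)  # no sentiment words found in the document
    return (100.0 * positive / total, 100.0 * negative / total)


review = (
    "The bread was fresh and tasty. The service was slow. "
    "Overall a good bakery."
)
print(document_polarity(review))
```

The sample review contains three positive words and one negative word, so under these assumptions the document is reported as 75 percent positive and 25 percent negative.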

The framework described in Yessenalina et al. [27] analyzes movie reviews at document level. It also analyzes debates and shows good results in terms of accuracy. For document-level analysis, this work deploys a module that extracts the hidden meaning from sentences, which eases the work of the analyzer because the whole document no longer has to be parsed. The extraction is done by selecting the more relevant sentences from the document.

• Sentence or phrase level sentiment analysis: Phrase-level sentiment analysis is done on sentences or phrases extracted from the whole corpus or document. A phrase-level analyzer computes the sentiment of a particular sentence and returns it. The advantage of sentence-level analysis is that the documents need not be arranged by context: a sentence can individually carry a sentiment about a context, and since every sentence is taken separately, the context can be taken into consideration without the mixture of contexts that occurs in whole-document analysis.

Wilson et al. [28] present a framework for performing sentiment analysis at sentence or phrase level. The framework has two phases, applied in order. The first phase scrutinizes every sentence or phrase for polarity; if the sentence possesses any polarity other than neutral, it is sent to the next phase. The second phase calculates the intensity of the polarity, i.e. given that the sentence has some sentiment, how strong or weak that sentiment is. This is determined using a sentiment lexicon in which negative and positive words are labelled accordingly, which makes it a supervised analysis.
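The two-phase flow described above can be sketched as follows. This is a simplified illustration rather than the framework of [28]: the lexicon, the use of a plain lexicon lookup for phase 1, and the threshold of 3 for a strong sentiment are all assumptions made for the example.

```python
import re

# Hypothetical lexicon: the sign gives the polarity, the magnitude the strength.
SENTIMENT_LEXICON = {
    "good": 1, "great": 2, "excellent": 3,
    "bad": -1, "poor": -2, "terrible": -3,
}


def tokenize(sentence):
    return re.findall(r"[a-z']+", sentence.lower())


def has_polarity(sentence):
    """Phase 1: does the sentence contain any word from the lexicon?"""
    return any(word in SENTIMENT_LEXICON for word in tokenize(sentence))


def polarity_intensity(sentence):
    """Phase 2: label the polarity and grade how strong it is."""
    score = sum(SENTIMENT_LEXICON.get(word, 0) for word in tokenize(sentence))
    if score == 0:
        return ("mixed", "weak")  # positive and negative words cancel out
    label = "positive" if score > 0 else "negative"
    strength = "strong" if abs(score) >= 3 else "weak"  # assumed threshold
    return (label, strength)


def analyze(sentences):
    """Run phase 1 on every sentence and phase 2 only on the polar ones."""
    results = []
    for sentence in sentences:
        if not has_polarity(sentence):
            results.append(("neutral", None))
        else:
            results.append(polarity_intensity(sentence))
    return results


print(analyze([
    "The room has two windows.",
    "The food was excellent.",
    "The service was bad.",
]))
```

Only the second and third sentences reach phase 2; the first contains no lexicon word and is reported as neutral without any intensity.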

Arun et al. [29] look at sentence-level analysis from a different perspective. According to their study, every sentence can be modularized further: it can have several parts that may depend on each other, and taken together they may not express the sentiment correctly. The authors introduce the idea of finding the conjunctions in a sentence, breaking the sentence into parts at those points and analyzing every part individually. Breaking sentences up further risks losing their semantics, so to preserve the semantics the sentence is converted into a tree-structured representation. In the tree representation the rules of the sentence's semantics remain unchanged, and the analysis is done level by level through the hierarchy, which makes breaking up the sentence and computing the sentiment straightforward. They report 80 percent accuracy with this framework. The framework is described as linguistically sound, as it has been tested against the WordNet bag-of-words representation.
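The idea of breaking a sentence at its conjunctions and scoring each part separately can be illustrated with a flat split. Note that this sketch deliberately swaps the tree-structured representation of [29] for a simple regular-expression split, and that the conjunction list and lexicon are hypothetical.

```python
import re

# Hypothetical conjunction list and lexicon, for illustration only.
CONJUNCTIONS = ("but", "and", "although", "however")
LEXICON = {"good": 1, "tasty": 1, "friendly": 1, "bad": -1, "slow": -1}


def split_on_conjunctions(sentence):
    """Break a sentence into clauses at whole-word conjunctions."""
    pattern = r"\b(?:" + "|".join(CONJUNCTIONS) + r")\b"
    parts = re.split(pattern, sentence.lower())
    return [part.strip(" ,.") for part in parts if part.strip(" ,.")]


def clause_scores(sentence):
    """Score every clause on its own instead of the sentence as a whole."""
    return [
        (clause, sum(LEXICON.get(word, 0) for word in clause.split()))
        for clause in split_on_conjunctions(sentence)
    ]


print(clause_scores("The food was tasty but the service was slow."))
```

Scored as a whole, the example sentence would sum to zero and look neutral; scored clause by clause, it shows one positive part and one negative part.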

Wilson et al. [30] have worked out a framework for analyzing phrase-level text for sentiment orientation. As the authors explain, a sentence can carry multiple contexts pointing in many directions, some of which may be of no use to the analysis. Some words that are positive may imply a negative meaning for a feature, and a feature is associated with a context. In sentence-level sentiment analysis the contexts to look out for are relevant and irrelevant context; relevant context is further divided into two types, prior polarity and contextual polarity. Prior polarity must be determined first, since the actual contextual polarity depends on it. The results of the study show that neutral sentiment present in the text impairs the quality of the features. This is again a step towards more accurate lexicon-based sentence-level sentiment analysis.

Kaji et al. [31] build the lexicon themselves instead of using pre-built lexicons and dictionaries. Making your own lexicon can be a drain on time and effort, and the lexicon must nonetheless cover all the work; in other words, it should encompass all the words in the text. To build a lexicon of such quality, the authors developed a framework for extracting sentiment semantics and rules. Rearranging these rules and clues yields a complete document with sentiment, which is why the approach is helpful. Like the other works, this one suggests sentence-level sentiment analysis for the process. Another distinctive feature of this work is that it analyzes text in the Japanese language. To prepare a corpus for extracting clues and sentiment features, the framework uses a large repository of web pages holding the relevant information about the topic. Such a technique saves time and effort when building your own lexicon. The extracted clues also make it easier for the recognizer to recall data, which helps to achieve high accuracy, the main purpose of extending work in this field. The work also helps to determine the objectivity of the text. Objectivity can be tricky to determine: a word can act as objective and also as subjective under the influence of the verb performed by the subject.

• Aspect or feature level analysis: Aspect-level analysis is concerned not with the size of the text or where the text belongs, but with the summary of the text in terms of entities, objects, aspects etc., which are the components targeted when writing a review of a subject. Imagine a collection of vectors that forms a document. A vector has a direction and a value; here the direction is the polarity and the value is the effectiveness (strength) of the sentiment. A collection of vectors can be seen as a collection of words, and a collection of words as a document or sentence. As explained above, the size of the collection does not matter much; what matters is which vectors actually mean something and what effect they have on the final sentiment once the process is completed. An aspect or entity is the target at which the sentiment is directed. An aspect can have a negative or a positive effect, and anything that has no relevance to the aspect should be ignored in aspect-level sentiment analysis. Below are some works in this category of analysis.
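The vector picture described above, with a direction for polarity and a value for strength, can be sketched as a per-aspect aggregation. The opinion tuples and the set of target aspects are invented for this illustration; how opinions are extracted from raw text is outside the sketch.

```python
from collections import defaultdict

# Hypothetical opinions: (aspect, direction, strength).
# direction is +1 (positive) or -1 (negative); strength is the magnitude.
opinions = [
    ("battery", +1, 0.75),
    ("screen", -1, 0.25),
    ("battery", +1, 0.5),
    ("delivery", -1, 0.5),   # not an aspect of the product itself
]


def aspect_sentiment(opinions, target_aspects):
    """Sum direction * strength per aspect, ignoring irrelevant aspects."""
    totals = defaultdict(float)
    for aspect, direction, strength in opinions:
        if aspect in target_aspects:
            totals[aspect] += direction * strength
    return dict(totals)


print(aspect_sentiment(opinions, {"battery", "screen"}))
```

The opinion about “delivery” is dropped because it is not among the target aspects, mirroring the rule that anything irrelevant to the aspect is ignored.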

Jochen et al. [32] show what feature extraction looks like in practice. Their work demonstrates the use and application of feature extraction and its desirable effects, and exhibits the advantages of extracting features or entities from text. Since sentiment analysis is a field under text mining, it is worth surveying work that extracts aspects using text mining approaches, as this work does. The work was done by researchers at IBM, and the main aim of the miner was to extract the important aspects of a document. The vision is to make people see what they usually miss while going through a document. The framework succeeds in reducing a mountain of text data to a small set of aspects. The data can be emails, insurance and policy claims, documents and contracts, complaint forms from customers, content from rivals etc. The work is credited with handling all of the above cases. The vision was to move towards a document-less environment where reading every document in full is not required; instead, the document can be processed for feature extraction to see the important matters discussed, so that information trapped in forms, emails etc. can be read.

The work done by Feldman et al. [33] is broadly similar; they took a fully loaded database instead of emails and forms. They call this “Knowledge Discovery in Databases”. It is a comparable approach in that it takes a large amount of data for analysis: the data used for testing their framework comprised 52,000 documents. The terms are also arranged in a hierarchical manner, meaning that the most important terms and entities are at the top and the less important ones below.

Cohen et al. [34] address a problem in applications related to bioinformatics: a framework for analyzing textual reports, auxiliary lab reports, records etc. for information summarization, that is, an application that analyzes these kinds of data and generates output as interpreted from reading the text. In this way the communication gap between the disciplines of bioinformatics can be reduced. Reports of specimens and experiments normally have to be read completely to understand their summary; the framework uses aspect-level analysis, which avoids traversing the whole document and gives the output in the form of features and aspects, which are comparatively much easier to handle. They used the MEDLINE database, which contains 24 million records, quite a large database to handle, and the framework analyzes it for feature estimation.

Michael et al. [35] performed aspect-level analysis, converting all the text into features and entities. To deal with vectors, an algorithm is required that can evaluate vector values. For this the authors used a Support Vector Machine (SVM), which learns from labeled training vectors a boundary that separates the classes. Once the SVM classifier is trained, the classified data is its output. The analysis can be of high quality: the classifier can classify text even in the presence of hidden language or linguistic constraints that analyzers usually do not handle. To test the framework, the authors took the feedback received on companies' online forums, where the text can contain many twists and turns, so using an SVM for classification eases the work to a large extent. The work is also placed under the dimension-reduction domain, although an SVM is in itself a classifier rather than a dimension-reduction method.

Brendan et al. [36] performed sentiment analysis on Twitter data. The data used were comments made during the United States presidential election of 2008-09, and the authors tried to predict the tilt of the public polls for a leader from tweets. The approach depended entirely on the correlation of tweets for the leaders; according to the abstract of the work, the authors tried to substitute the existing poll predictor with this automated approach. The system was built using text mining principles and methods. Feature extraction was used to determine the demands people expressed in the tweets. Tags such as “jobs” and “economy” were used for the consumer-related topics. Handling political debate or election data can be challenging, because the level of complexity differs from that of feedback: in feedback the target features can be predicted, but predicting them here is difficult and time consuming. Opinion mining in this kind of domain can return features that cannot be predicted beforehand, which poses a considerable challenge.
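The kind of correlation such an approach relies on can be illustrated with a Pearson correlation between a daily tweet sentiment ratio and poll figures. All numbers below are made-up toy values, not data from [36], and the ratio of positive to negative tweet counts is an assumed measure chosen for the sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Toy data (hypothetical): daily counts of positive and negative tweets
# mentioning a candidate, and the poll percentage on the same days.
positive_tweets = [100, 150, 200, 250]
negative_tweets = [100, 100, 100, 100]
poll_percent = [44.0, 46.0, 47.0, 50.0]

sentiment_ratio = [p / n for p, n in zip(positive_tweets, negative_tweets)]
print(round(pearson(sentiment_ratio, poll_percent), 3))
```

A coefficient close to 1 on such toy data would suggest that the tweet-based measure moves together with the polls; on real data the relationship would have to be checked far more carefully.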

Parrytomar (talk) 07:48, 14 May 2017