Please answer this homework on Data Science and Big Data Analysis in APA format, with references and citations.
“Tidy Text Format” – Question 1
a) You have been assigned “your author”* in ITS836-46_Week12 Authors for Text Analysis.xlsx.
b) Identify books by the author on www.gutenberg.org (http://www.gutenberg.org/browse/authors/a).
c) Compare the word frequencies of Jane Austen, the Brontë sisters, and “your author”, as in Figure 1.3.
*You can choose another author (https://www.gutenberg.org/browse/scores/top); make sure that author is not already on the list for anyone else.
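A minimal Python sketch of the comparison in c), assuming the books have already been downloaded from Project Gutenberg as plain text. It lower-cases each author's text, drops stop words, computes each word's proportion of that author's remaining words, and pairs the proportions across two authors, which are the points a Figure 1.3-style scatter plot would show. The book does this in R with tidytext; the stop-word set and function names here are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis needs a full one.
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in", "was", "it", "i", "he", "she", "her"}

def tokenize(text):
    """Lower-case the text and keep runs of ASCII letters and apostrophes
    (accented letters such as the ë in Brontë split a word)."""
    return re.findall(r"[a-z']+", text.lower())

def word_proportions(text):
    """Return {word: share of the corpus's non-stop-word tokens}."""
    counts = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}

def compare(reference, other):
    """Pair each word's proportion in `reference` with its proportion in `other`.
    Words absent from one corpus get 0.0 there; drop those before plotting on log axes."""
    words = sorted(set(reference) | set(other))
    return {w: (reference.get(w, 0.0), other.get(w, 0.0)) for w in words}

if __name__ == "__main__":
    austen = word_proportions("Miss Elizabeth was happy. Miss Bennet was happy too.")
    yours = word_proportions("The sea was dark and the sea was cold.")
    print(compare(austen, yours)["sea"])  # (0.0, 0.5)
```

Running word_proportions once per author (Austen, the Brontë sisters, your author) and calling compare with Austen as the reference gives one set of points per panel.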
“Sentiment analysis with tidy data” – Question 2
a) Analyze the sentiment across multiple works (minimum 2) by “your author”, as in Fig 2.2.
b) Compare three sentiment lexicons, as in Fig 2.3:
–AFINN, from Finn Årup Nielsen,
–bing, from Bing Liu and collaborators, and
–nrc, from Saif Mohammad and Peter Turney.
c) Plot the words that contribute to positive and negative sentiment in your author’s works, as in Fig 2.4.
d) Create a word cloud of the most common words in your author’s works, as in Fig 2.5.
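One way to sketch a) to c) in Python, assuming each work is available as a list of text lines. Each lexicon is reduced to word → score: bing (and nrc, filtered to its positive and negative labels) becomes +1/−1, while AFINN already assigns integers from −5 to +5. Lines are grouped into chunks of 80, as in the book's Fig 2.2, and each chunk's scores are summed; running this once per lexicon gives the three series that Fig 2.3 compares. The TOY_* dictionaries are hypothetical placeholders, not excerpts of the real lexicons, which you would load from their published sources (in R, tidytext's get_sentiments).

```python
import re
from collections import Counter, defaultdict

# Hypothetical placeholders, NOT the real bing or AFINN lexicons.
TOY_BING = {"happy": "positive", "love": "positive", "good": "positive",
            "sad": "negative", "fear": "negative", "dark": "negative"}
TOY_AFINN = {"happy": 3, "love": 3, "good": 3, "sad": -2, "fear": -2, "dark": -1}

def label_scores(lexicon):
    """Map a positive/negative lexicon to word -> +1 / -1.
    Assumes one label per word; other labels (e.g. nrc emotions) are ignored."""
    values = {"positive": 1, "negative": -1}
    return {w: values[lab] for w, lab in lexicon.items() if lab in values}

def words(line):
    """Lower-cased word tokens of one line."""
    return re.findall(r"[a-z']+", line.lower())

def net_sentiment_by_chunk(lines, scores, chunk_size=80):
    """Return {chunk index: summed score} for chunks of `chunk_size` lines.
    Chunks containing no lexicon words are absent from the result."""
    totals = defaultdict(int)
    for line_number, line in enumerate(lines):
        for word in words(line):
            if word in scores:
                totals[line_number // chunk_size] += scores[word]
    return dict(totals)

def word_contributions(lines, lexicon):
    """Count lexicon words grouped by label, e.g. for Fig 2.4-style bar charts."""
    counts = defaultdict(Counter)
    for line in lines:
        for word in words(line):
            if word in lexicon:
                counts[lexicon[word]][word] += 1
    return counts
```

Plotting each chunk's score against its index gives a Fig 2.2-style trajectory; word_contributions(...)["positive"].most_common(10) gives the bars for one side of Fig 2.4.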
“Analyzing word and document frequency: tf-idf” – Question 3
a) Analyze the term frequency (TF) distribution in your author’s works, as in Fig 3.1.
b) Plot Zipf’s law for your author’s works, as in Fig 3.2.
c) Plot the highest tf-idf words in each of your author’s works, as in Fig 3.4.
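A standard-library Python sketch of the three quantities, assuming each work is a plain-text string keyed by its title. It follows the tf-idf chapter's definitions: a word's term frequency is its count divided by the total words in that book, idf(term) = ln(number of books / number of books containing the term), and tf-idf is their product. Zipf's law says a word's frequency is roughly inversely proportional to its rank, so the rows of zipf_table are what a log-log plot like Fig 3.2 would use. The function names are illustrative.

```python
import math
import re
from collections import Counter

def term_counts(books):
    """books: {title: text}. Returns {title: Counter of lower-cased words}."""
    return {t: Counter(re.findall(r"[a-z']+", text.lower())) for t, text in books.items()}

def zipf_table(counts):
    """(word, rank, term frequency) rows for one book, most frequent first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(w, rank, n / total) for rank, (w, n) in enumerate(ordered, start=1)]

def tf_idf(all_counts):
    """tf = n / words in that book; idf = ln(number of books / books containing word)."""
    n_docs = len(all_counts)
    doc_freq = Counter(w for counts in all_counts.values() for w in counts)
    result = {}
    for title, counts in all_counts.items():
        total = sum(counts.values())
        result[title] = {w: (n / total) * math.log(n_docs / doc_freq[w])
                         for w, n in counts.items()}
    return result

def top_terms(scores, k=10):
    """The k highest-scoring words, ties broken alphabetically."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

Words that occur in every book get idf = ln(1) = 0, which is why tf-idf filters out common words without a stop-word list.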
“n-grams and correlations” – Question 4
a) Plot the bigrams with the highest tf-idf from each of your author’s works, as in Fig 4.1.
b) Plot the words preceded by ‘not’ that made the greatest contribution to sentiment scores, in either a positive or negative direction, in your author’s works, as in Fig 4.2.
c) Plot common bigrams in your author’s works, as in Fig 4.4.
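A Python sketch of the counting behind b) and c), under these assumptions: the text is treated as one lower-cased token stream (so bigrams may span sentence boundaries), STOP_WORDS and TOY_AFINN are hypothetical placeholders, and a negated word's contribution is its count times its AFINN score, ranked by absolute value as in Fig 4.2. For a), the usual tf-idf calculation applies unchanged if each "w1 w2" pair is treated as a single term.

```python
import re
from collections import Counter

# Hypothetical placeholders: a tiny stop-word list and made-up AFINN-style scores.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "i", "it", "do"}
TOY_AFINN = {"like": 2, "help": 2, "want": 1, "doubt": -1, "fail": -2}

def bigrams(text):
    """Consecutive word pairs from one lower-cased token stream."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return list(zip(tokens, tokens[1:]))

def common_bigrams(text, min_count=2):
    """Bigrams with no stop word in either position, kept if seen at least
    `min_count` times (candidate edges for a Fig 4.4-style network)."""
    counts = Counter((w1, w2) for w1, w2 in bigrams(text)
                     if w1 not in STOP_WORDS and w2 not in STOP_WORDS)
    return {pair: n for pair, n in counts.items() if n >= min_count}

def not_contributions(text, scores):
    """For words that follow 'not' and appear in `scores`, return
    (word, count * score) pairs sorted by absolute contribution."""
    counts = Counter(w2 for w1, w2 in bigrams(text) if w1 == "not" and w2 in scores)
    contrib = {w: n * scores[w] for w, n in counts.items()}
    return sorted(contrib.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
```

A large positive contribution (such as "not like") means the text's raw sentiment score was pushed in the wrong direction by negation.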
“To and from non-tidy formats” – Question 5
• As explained in Section 5.2, cast the tidy text data for one of your author’s works into a matrix.
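Casting turns tidy data, one (document, term, count) row per pair, into a document-term matrix with one row per document and one column per term. In R this is what tidytext's cast_dtm does; the Python below is a hypothetical stand-in that builds a dense list-of-lists matrix, adequate for a handful of documents (for a single work, the documents could be its chapters, which is an assumption here) but wasteful for large vocabularies, where a sparse format is the usual choice.

```python
def tidy_to_matrix(tidy_rows):
    """Hypothetical helper. `tidy_rows` holds (document, term, count) triples.
    Returns (documents, terms, matrix) with matrix[i][j] = count of terms[j]
    in documents[i]; pairs missing from the tidy data become 0."""
    rows = list(tidy_rows)
    documents = sorted({d for d, _, _ in rows})
    terms = sorted({t for _, t, _ in rows})
    doc_index = {d: i for i, d in enumerate(documents)}
    term_index = {t: j for j, t in enumerate(terms)}
    matrix = [[0] * len(terms) for _ in documents]
    for d, t, n in rows:
        matrix[doc_index[d]][term_index[t]] += n  # duplicate rows are summed
    return documents, terms, matrix

def to_sparse(matrix):
    """Keep only the non-zero cells as {(row, column): count}."""
    return {(i, j): v for i, row in enumerate(matrix) for j, v in enumerate(row) if v}
```

The zeros in the dense matrix are exactly the cells a sparse representation omits, which is why most real document-term matrices are stored sparsely.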
“Topic modeling” – Question 6
a) For your author’s works, create a topic model using the LDA method and plot the terms that are most common within each topic, as in Fig 6.4.
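The book fits LDA in R with the topicmodels package. As a swapped-in alternative, the Python sketch below estimates the same LDA model with collapsed Gibbs sampling: each token's topic is repeatedly resampled with probability proportional to (document's count for that topic + alpha) × (topic's count for that word + beta) / (topic's total + V·beta), where V is the vocabulary size. The returned per-topic word probabilities correspond to the per-topic-per-word "beta" values plotted in Fig 6.4. The alpha, beta, iteration count, and seed are illustrative defaults, not tuned values.

```python
import random
from collections import Counter

def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """docs: list of token lists. Returns k dicts of word -> probability."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    doc_topic = [[0] * k for _ in docs]          # tokens in doc d assigned to topic t
    topic_word = [Counter() for _ in range(k)]   # times word w is assigned to topic t
    topic_total = [0] * k                        # tokens assigned to topic t
    assignments = []
    for d, doc in enumerate(docs):               # random initial topics
        z_doc = []
        for w in doc:
            z = rng.randrange(k)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized full conditional for each topic.
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                           / (topic_total[t] + v * beta) for t in range(k)]
                z = rng.choices(range(k), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed estimate of each topic's word distribution (sums to 1 per topic).
    return [{w: (topic_word[t][w] + beta) / (topic_total[t] + v * beta) for w in vocab}
            for t in range(k)]

def top_words(phi, n=10):
    """The n most probable words in each topic, ties broken alphabetically."""
    return [[w for w, _ in sorted(topic.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
            for topic in phi]
```

On a few well-separated toy documents the topics typically split along the obvious themes; real books need stop-word removal first and usually more iterations.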