no non-missing arguments to min on vector with text

1 year ago

#350259

Adrian De Jesus

Hey y'all, I'll keep it short and simple.

I'm following along with this earnings call sentiment analysis:

https://jagg19.github.io/2019/04/sentiment-analysis-conf-call/

I've had issues with packages but managed to push through to the cleaning step.

I'm currently stuck. The funny part is that it was working fine yesterday, but I was having rJava/qdap issues that I have since fixed. Now my code isn't working like it was yesterday.

    Warning message:
    In min(which(!str_detect(transcript_text, "[[:upper:]][\\w]+ -"))) :
      no non-missing arguments to min; returning Inf

I repeat: I did not have this issue yesterday, before I fixed qdap/rJava. Help, please.

I checked that all the packages load. I changed the vector value to see if it works, and it does. I also checked that the text was saved into transcript_text.
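
For reference, the warning can be reproduced without any scraping. This is a minimal sketch, assuming a made-up vector `toy_text` (not the real transcript) in which every element matches the pattern, so `which()` on the negated result is empty and `min()` has nothing to work with:

```r
library(stringr)

# Hypothetical stand-in for transcript_text: every element matches the pattern
toy_text <- c("Jane Doe - CEO", "John Roe - CFO")
speaker_pattern <- "[[:upper:]][\\w]+ -"

hits <- str_detect(toy_text, speaker_pattern)
non_matching <- which(!hits)   # empty: no element fails to match

# min() of an empty vector warns and returns Inf;
# the warning is suppressed here only to keep the output clean
idx <- suppressWarnings(min(non_matching))
print(length(non_matching))
print(is.infinite(idx))
```

So the warning only says that no element of the vector failed the pattern at that point; it does not say why.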

library(rJava)
library(pdftools)
library(dplyr)
library(stringr)
library(tm)
library(SentimentAnalysis)
library(tidyverse)
library(tidyr)
library(xml2)
library(rvest)
library(devtools)
library(stringi)
library(rex)
library(magrittr)
library(rebus)
library(kableExtra)
library(qdap)
library(textclean)
library(SnowballC)
library(ggplot2)


#Company Info
company_name <- "Advansix"
ticker <- "ASIX"

#Transcript URLs

Q4<-"https://seekingalpha.com/article/4488443-advansixs-asix-ceo-erin-kane-on-q4-2021-results-earnings-call-transcript"

##Reading the body of the html, and converting it to a readable text format 

    html1<- read_html(Q4)%>%
      html_nodes("body")%>%
      html_text()
    
    
    
    transcript_text <- html1
    transcript_text
    #Separating the text by new line characters in html code
    transcript_text <- strsplit(transcript_text, "\n") %>% unlist()
    #Remove empty lines
    transcript_text <- transcript_text[!stri_isempty(transcript_text)]
    
    #html_nodes() needs the parsed page, not the character vector in transcript_text
    earnings_date <- html_text(html_nodes(read_html(Q4), "date")) %>% paste0(collapse = "")
    
    #x <- c(transcript_text, 'Adam')
    #x
    
    
    
    #Create pattern to grab relevant names such as Analyst and Executives.
    pattern1 <- capture(upper() %R% one_or_more(WRD) %R% SPC %R%
                          upper() %R% one_or_more(WRD)) %R% " - " %R% capture(one_or_more(WRD) %R%
                                                                                optional(char_class("- ,")) %R% zero_or_more(WRD %R% SPC %R% WRD %R% "-" %R% WRD))
    
    #Give the names all common separators
    transcript_text <- gsub("–","-",transcript_text)
    
    #REGEX pattern to search for the starting index containing executive names. Finds something
    idx_e <- min(which(str_detect(transcript_text, "[[:upper:]][\\w]+ -")))
    
    #Dropping everything before the start of Executive names, and resetting the index back to 1
    transcript_text <- transcript_text[idx_e:length(transcript_text)]
    idx_e <- 1
    idx_e
    #Repeating to find the starting index for the analyst names
    idx_a <- min(which(!str_detect(transcript_text, "[[:upper:]][\\w]+ -")))
    
    idx_a
    
    #Executive names will start from the starting index, idx_e, to 1 row before the analysts st
    exec <- transcript_text[idx_e:(idx_a+1)]
    
    exec <- str_match(exec[3], pattern1)
    exec <- exec[1:nrow(exec),2]
    exec
    
    
    #Repeating to find the starting index for the analyst names
    idx_a <- min(which(str_detect(transcript_text, "[[:upper:]][\\w]+ -")))
    idx_a
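
One way to make these index lookups return something easier to test than `Inf` is a small wrapper. This is only a sketch: `first_index()` is a hypothetical helper, not part of stringr or the tutorial, and both sample inputs are made up.

```r
library(stringr)

# Hypothetical helper (not from stringr or the tutorial):
# position of the first TRUE in `keep`, or NA if there is none.
first_index <- function(keep) {
  idx <- which(keep)
  if (length(idx) == 0) NA_integer_ else idx[1]
}

speaker_pattern <- "[[:upper:]][\\w]+ -"

# Made-up lines standing in for a transcript that split cleanly on "\n"
sample_text <- c("Operator", "Jane Doe - CEO", "John Roe - CFO", "Good morning")

idx_first_match <- first_index(str_detect(sample_text, speaker_pattern))
trimmed <- sample_text[idx_first_match:length(sample_text)]
idx_first_miss <- first_index(!str_detect(trimmed, speaker_pattern))

# Made-up case where the whole page arrived as a single line
single_line <- "Jane Doe - CEO John Roe - CFO Good morning"
single_line_miss <- first_index(!str_detect(single_line, speaker_pattern))

print(c(idx_first_match, idx_first_miss, single_line_miss))
```

With the multi-line sample the lookups give 2 and 3, while the single-line case gives `NA`. As a guess rather than a diagnosis, it may be worth checking `length(transcript_text)` after the `strsplit()` step, since a page that comes back as one long line would behave like the second case.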

data-cleaning

stringr

rjava

inf
