1 year ago
#350259
Adrian De Jesus
no non-missing arguments to min on vector with text
Hey y'all, I'll keep it short and simple.
I'm following along with this earnings call sentiment analysis tutorial:
https://jagg19.github.io/2019/04/sentiment-analysis-conf-call/
I've had issues with packages but managed to push through to the cleaning step.
I'm currently stuck. The funny part is that it was working fine yesterday, but I was having rJava/qdap issues that I fixed. Now my code isn't working like it was yesterday.
Warning message: In min(which(!str_detect(transcript_text, "[[:upper:]][\w]+ -"))) : no non-missing arguments to min; returning Inf
I repeat: I did not have this issue yesterday, before I fixed qdap/rJava. Help, please.
What I've tried: I checked that all the packages load. I changed the vector value to see if the code works, and it does. I also checked that the text was saved into transcript_text.
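The warning can be reproduced without the transcript. This is a minimal sketch, assuming a made-up two-line vector in which every element matches the pattern. Base grepl with perl = TRUE stands in for str_detect here, and the names are placeholders, not taken from the call.

```r
# Hypothetical stand-in for transcript_text: every element looks like "Name - Title"
x <- c("Jane Doe - Chief Executive Officer",
       "John Roe - Chief Financial Officer")

# Same pattern as in the post; perl = TRUE so that \w is understood inside [ ]
hits <- grepl("[[:upper:]][\\w]+ -", x, perl = TRUE)

# No element fails the pattern, so which() returns integer(0)
no_match <- which(!hits)

# min() of an empty vector warns "no non-missing arguments to min" and returns Inf
idx <- suppressWarnings(min(no_match))
idx
```

If transcript_text ends up with no element that fails the pattern (for example, when the split produces a single long string), the same thing would happen in the real script.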
library(rJava)
library(pdftools)
library(dplyr)
library(stringr)
library(tm)
library(SentimentAnalysis)
library(tidyverse)
library(tidyr)
library(xml2)
library(rvest)
library(devtools)
library(stringi)
library(rex)
library(magrittr)
library(rebus)
library(kableExtra)
library(qdap)
library(textclean)
library(SnowballC)
library(ggplot2)
#Company Info
company_name <- "Advansix"
ticker <- "ASIX"
#Transcript URLs
Q4<-"https://seekingalpha.com/article/4488443-advansixs-asix-ceo-erin-kane-on-q4-2021-results-earnings-call-transcript"
##Reading the body of the html, and converting it to a readable text format
html1 <- read_html(Q4) %>%
  html_nodes("body") %>%
  html_text()
transcript_text <- html1
transcript_text
#Separating the text by new line characters in html code
transcript_text <- strsplit(transcript_text, "\n") %>% unlist()
#Remove empty lines
transcript_text <- transcript_text[!stri_isempty(transcript_text)]
#html_nodes() needs the parsed document, not the character vector, so re-read the page here
earnings_date <- read_html(Q4) %>% html_nodes("date") %>% html_text() %>% paste0(collapse = "")
#x <- c(transcript_text, 'Adam')
#x
#Create pattern to grab relevant names such as Analyst and Executives.
pattern1 <- capture(upper() %R% one_or_more(WRD) %R% SPC %R%
  upper() %R% one_or_more(WRD)) %R% " - " %R% capture(one_or_more(WRD) %R%
  optional(char_class("- ,")) %R% zero_or_more(WRD %R% SPC %R% WRD %R% "-" %R% WRD))
#Give the names all common separators
transcript_text <- gsub("–","-",transcript_text)
#REGEX pattern to search for the starting index containing executive names. Finds something
idx_e <- min(which(str_detect(transcript_text, "[[:upper:]][\\w]+ -")))
#Dropping everything before the start of Executive names, and resetting the index back to 1
transcript_text <- transcript_text[idx_e:length(transcript_text)]
idx_e <- 1
idx_e
#Repeating to find the starting index for the analyst names
idx_a <- min(which(!str_detect(transcript_text, "[[:upper:]][\\w]+ -")))
idx_a
#Executive names will start from the starting index, idx_e, to 1 row before the analysts st
exec <- transcript_text[idx_e:(idx_a+1)]
exec <- str_match(exec[3], pattern1)
exec <- exec[1:nrow(exec),2]
exec
#Repeating to find the starting index for the analyst names
idx_a <- min(which(str_detect(transcript_text, "[[:upper:]][\\w]+ -")))
idx_a
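One way to make the index lookups fail more visibly is to return NA instead of Inf when nothing is found. The sketch below is only an assumption about how that could look: first_true is a hypothetical helper that is not part of the tutorial, the sample lines are invented, and base grepl with perl = TRUE stands in for str_detect.

```r
# Hypothetical helper: index of the first TRUE, or NA when there is none
first_true <- function(hit) {
  idx <- which(hit)
  if (length(idx) == 0) NA_integer_ else idx[1]
}

# Invented stand-in for transcript_text (not the real transcript)
lines <- c("Jane Doe - Chief Executive Officer",
           "John Roe - Chief Financial Officer",
           "Conference Call Participants")

# TRUE where a line looks like "Name - Title"
is_name <- grepl("[[:upper:]][\\w]+ -", lines, perl = TRUE)

idx_first_name  <- first_true(is_name)        # first line that matches
idx_first_other <- first_true(!is_name)       # first line that does not match
idx_none        <- first_true(!is_name[1:2])  # NA: both of these lines match
```

An NA index can then be checked with is.na() before subsetting, rather than letting Inf flow into transcript_text[idx_e:(idx_a+1)].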
r
data-cleaning
stringr
rjava
inf