Only Coders - Where knowledge meets opportunity

Questions - sentence-transformers

Hugging Face pretrained model's tokenizer and model objects have different maximum input lengths

I'm using the symanto/sn-xlm-roberta-base-snli-mnli-anli-xnli pretrained model from Hugging Face. My task requires using it on fairly long texts, so it's essential to know the maximum input length. The follo...

Nick Zorander

nlp

huggingface-transformers

huggingface-tokenizers

sentence-transformers

Votes: 0

Answers: 2

Latest Answer

Since you load the model with the SentenceTransformer class, it will truncate your input at 128 tokens, as stated in the documentation (the relevant code is here): property ...

cronoik
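
The point in the answer excerpt above, that the SentenceTransformer wrapper applies its own token limit, can be illustrated by comparing the two limits side by side. This is a minimal sketch, not the answer's own code: it assumes the loaded object exposes `max_seq_length` and `tokenizer.model_max_length`, and it treats the smaller value as the effective limit. `effective_max_tokens` is a hypothetical helper name, and the stub with the value 512 is an illustrative stand-in so the sketch runs without downloading a model.

```python
from types import SimpleNamespace


def effective_max_tokens(model):
    """Return the smaller of the wrapper's and the tokenizer's token limits."""
    wrapper_limit = model.max_seq_length
    tokenizer_limit = model.tokenizer.model_max_length
    return min(wrapper_limit, tokenizer_limit)


# Stand-in for a loaded model (assumption): 128 comes from the answer excerpt,
# 512 is an illustrative tokenizer limit, not a value taken from the question.
stub_model = SimpleNamespace(
    max_seq_length=128,
    tokenizer=SimpleNamespace(model_max_length=512),
)

print(effective_max_tokens(stub_model))
```

With a real model, the same helper could be passed the loaded `SentenceTransformer` object in place of the stub.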

Is it possible to show the specific content which was similar when doing paragraph similarity?

I'm trying to create a paragraph similarity checker using Python. I'm using Sentence Transformers along with the "All the News 2" dataset which contains over 2 million articles. I have alrea...

denji

python

nlp

sentence-similarity

sentence-transformers

Votes: 0

Answers: 0

How to know if a word belongs to a Transformer model?

I use the Python library sentence_transformers with the models RoBERTa and FlauBERT. I use cosine scores to compute similarity, but for some words it doesn't work well. Those words seem to be the one ...

Nathan Redin

python

nlp

bert-language-model

transformer-model

sentence-transformers

Votes: 0

Answers: 1

Latest Answer

For RoBERTa and FlauBERT models, you can use the get_vocab() method to get a dictionary with the tokens and their ids. Example of 100 tokens in vocab: from transformers import RobertaTokenizer tokenizer ...

Victor Maricato
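
Since `get_vocab()` returns a dictionary mapping tokens to ids, a membership check is a plain dictionary lookup. The sketch below is an illustration rather than the answer's code: `word_in_vocab` is a hypothetical helper, the toy vocabulary and its ids are invented, and the `Ġ` prefix reflects how RoBERTa's byte-level BPE marks tokens that follow a space (other tokenizers use different markers, so the prefixes would need adjusting).

```python
def word_in_vocab(word, vocab, prefixes=("", "Ġ")):
    """Check whether a word is stored as a single token under any given prefix."""
    return any(prefix + word in vocab for prefix in prefixes)


# Toy stand-in for tokenizer.get_vocab(); tokens and ids are invented.
toy_vocab = {"<s>": 0, "</s>": 1, "Ġhello": 2, "world": 3}

for word in ("hello", "world", "transformer"):
    print(word, word_in_vocab(word, toy_vocab))
```

A word that fails this check is not rejected by the tokenizer; it is split into several subword tokens, which may be part of why similarity scores behave differently for such words.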

How to use metadata for document retrieval using Sentence Transformers?

I'm trying to use Sentence Transformers and Haystack for document retrieval, focusing on searching documents by metadata besides the document text. I'm using a dataset of academic publication titles,...

Ellio

python

nlp

information-retrieval

sentence-transformers

Votes: 0

Answers: 1

Latest Answer

It sounds like you need metadata filtering rather than placing the year within the query itself. The FaissDocumentStore doesn't support filtering; I'd recommend switching to the PineconeDocumentStore ...

James Briggs
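
The idea of metadata filtering, as opposed to putting the year into the query text, can be sketched independently of any document store: restrict the candidates by metadata first, then rank the survivors by embedding similarity. This is a library-free illustration and not Haystack's API. `filtered_search`, the document layout, and the two-dimensional embeddings are all hypothetical; in practice the embeddings would come from a sentence-transformers model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(query_vec, docs, filters, top_k=3):
    """Keep documents whose metadata matches every filter, then rank by cosine."""
    candidates = [
        doc for doc in docs
        if all(doc["meta"].get(key) == value for key, value in filters.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda doc: cosine(query_vec, doc["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


# Invented example data: titles, years, and embeddings are placeholders.
docs = [
    {"title": "Paper A", "meta": {"year": 2019}, "embedding": [1.0, 0.0]},
    {"title": "Paper B", "meta": {"year": 2020}, "embedding": [0.9, 0.1]},
    {"title": "Paper C", "meta": {"year": 2020}, "embedding": [0.0, 1.0]},
]

results = filtered_search([1.0, 0.0], docs, {"year": 2020})
print([doc["title"] for doc in results])
```

Here "Paper A" is the closest match to the query vector but is excluded because its year does not match the filter, which is the behavior a query-embedded year cannot guarantee.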
