

BERTopic RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0 (details below)

Published 2024-11-10 09:27 · 976 reads · 0 comments · 14 likes · 2 favorites


I am trying to use the BERTopic library with a custom text-generation model from the transformers library, but I am running into this RuntimeError. I tried specifying device 0 (the GPU) in the pipeline, yet the error persists. How can I resolve this?

Please help me understand what is causing this error and how to fix it.

2024-08-30 10:44:11,684 - BERTopic - Dimensionality - Completed ✓
2024-08-30 10:44:11,688 - BERTopic - Cluster - Start clustering the reduced embeddings
/usr/local/lib/python3.10/dist-packages/joblib/externals/loky/backend/fork_exec.py:38: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  pid = os.fork()
2024-08-30 10:44:17,485 - BERTopic - Cluster - Completed ✓
2024-08-30 10:44:17,498 - BERTopic - Representation - Extracting topics from clusters using representation models.
  0%|          | 0/66 [00:08<?, ?it/s]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-11-5511f129a54a> in <cell line: 16>()
     14 )
     15 
---> 16 topics, probs = topic_model.fit_transform(docs, embeddings)

13 frames
/usr/local/lib/python3.10/dist-packages/transformers/generation/logits_process.py in __call__(self, input_ids, scores)
    351     @add_start_docstrings(LOGITS_PROCESSOR_INPUTS_DOCSTRING)
    352     def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
--> 353         score = torch.gather(scores, 1, input_ids)
    354 
    355         # if score < 0 then repetition penalty has to be multiplied to reduce the token probabilities

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA_gather)
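The failing call in the traceback, `torch.gather(scores, 1, input_ids)`, requires both tensors to live on the same device: here `scores` ends up on the CPU while `input_ids` sits on `cuda:0`. A minimal CPU-only sketch of the call and of the usual defensive alignment (illustrative values, not from the original post):

```python
import torch

# scores: one row of per-token logits; input_ids: indices of tokens to look up
scores = torch.tensor([[0.1, 0.5, 0.4]])
input_ids = torch.tensor([[2, 0]])

# Defensive fix for this class of error: move the indices to wherever
# the scores tensor lives before calling gather.
input_ids = input_ids.to(scores.device)

picked = torch.gather(scores, 1, input_ids)  # picks scores[0, 2] and scores[0, 0]
print(picked)  # tensor([[0.4000, 0.1000]])
```

If the two tensors were on different devices, the `gather` call would raise exactly the RuntimeError shown above; aligning them with `.to(...)` is the generic remedy.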

My code is:

from transformers import AutoTokenizer, pipeline
# AutoModelForCausalLM for GGUF files comes from ctransformers, not transformers
from ctransformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/zephyr-7B-alpha-GGUF",
    model_file="zephyr-7b-alpha.Q4_K_M.gguf",
    model_type="mistral",
    gpu_layers=50,
    hf=True
    #context_length=512, 
    #max_new_tokens=512
)

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha")

prompt = """<|system|>You are a helpful, respectful and honest assistant for labeling topics.</s>
<|user|>
I have a topic that contains the following documents:
[DOCUMENTS]

The topic is described by the following keywords: '[KEYWORDS]'."""

generator = pipeline(
    model=model, tokenizer=tokenizer,
    task='text-generation',
    max_new_tokens=50,
    repetition_penalty=1.1,
    device=0
)

from bertopic.representation import TextGeneration

zephyr = TextGeneration(generator, prompt=prompt, doc_length=10, tokenizer="char")
representation_model = {"Zephyr": zephyr}
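No fix was posted for this question. One generic pattern that often resolves this class of mismatch (a sketch under the assumption that the model's weights sit on one device while the pipeline feeds tensors from another; the helper name is hypothetical, not part of any library) is to move every tensor in the generation inputs onto a single target device first:

```python
import torch

def move_to_device(obj, device):
    """Recursively move any tensors inside nested containers to `device`."""
    if isinstance(obj, torch.Tensor):
        return obj.to(device)
    if isinstance(obj, dict):
        return {k: move_to_device(v, device) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(move_to_device(v, device) for v in obj)
    return obj

# Example with CPU as the target device (use the model's device in practice):
inputs = {"input_ids": torch.tensor([[1, 2]]),
          "attention_mask": [torch.ones(1, 2)]}
inputs = move_to_device(inputs, torch.device("cpu"))
```

Whether this applies here depends on where ctransformers materializes its logits when wrapped with `hf=True`; dropping `device=0` from the pipeline so that the wrapper manages placement itself is another commonly suggested workaround, but neither is confirmed for this exact setup.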

Solution


No answers yet.



Site category: Technical articles > Q&A

Author: 黑洞官方问答小能手

Link: https://www.pythonheidong.com/blog/article/2045412/ead362a956b51089ed81/

Source: python黑洞网 (pythonheidong.com)

Please credit the source when reposting in any form; infringement, once discovered, will be pursued under the law.
