Building a Biomedical Knowledge Graph from Zero to Hero (2): Literature Extraction

To extract the content of a paper, we convert its PDF file into page images and then use pytesseract (OCR) to extract the text from those images.

    import requests
    import pdf2image
    import pytesseract

    # Download the PDF and convert each page into an image
    pdf = requests.get('https://arxiv.org/pdf/2110.03526.pdf')
    doc = pdf2image.convert_from_bytes(pdf.content)

    # Get the article text
    article = []
    for page_number, page_data in enumerate(doc):
        # OCR the page image; image_to_string returns a str
        txt = pytesseract.image_to_string(page_data)
        # Keep only the first six pages; the later pages contain only references
        if page_number < 6:
            article.append(txt)
    article_txt = " ".join(article)
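
The loop above runs OCR on every page and only afterwards decides whether to keep the result. A minimal sketch of the same page-limiting idea, assuming a hypothetical helper `extract_text` (not part of the original post) that takes the OCR function as a parameter and stops before the reference pages, so they are never processed:

```python
from typing import Callable, Iterable, List


def extract_text(
    pages: Iterable[object],
    ocr: Callable[[object], str],
    max_pages: int = 6,
) -> str:
    """Run `ocr` on the first `max_pages` pages and join the results.

    `extract_text` and `max_pages` are illustrative names, not from the post.
    """
    texts: List[str] = []
    for page_number, page in enumerate(pages):
        if page_number >= max_pages:
            break  # remaining pages are assumed to hold only references
        texts.append(ocr(page))
    return " ".join(texts)


# Stand-in pages and OCR function so the sketch runs without Tesseract.
fake_pages = ["page one", "page two", "references"]
print(extract_text(fake_pages, ocr=str.upper, max_pages=2))
```

With the real pipeline, the call would presumably look like `extract_text(doc, pytesseract.image_to_string)`. Passing the OCR function in also makes the logic easy to test without Tesseract installed.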

Mohammadreza Ahmadi conducted research on tissue engineering techniques focused on regenerating skin, hair follicles, and other structures derived from stem cells. This study aimed to address challenges in treating various skin conditions such as chronic wounds or diabetic ulcers. Additionally, the medical field sought methods for both aesthetic rejuvenation (cosmetic purposes) and reconstructive medicine. Furthermore, reconstructive medicine employed this approach by delivering pluripotent stem cells directly to target tissues.

Next, we process the text.

    import nltk
    nltk.download('punkt')

    def clean_text(text):
        """Remove section titles and figure descriptions from text"""
        rows = [
            row for row in text.split("\n")
            # Rows of three words or fewer are treated as section titles
            if len(row.split(" ")) > 3
            # Rows starting with "(a)" or "Figure" are treated as captions
            and not row.startswith(("(a)", "Figure"))
        ]
        return "\n".join(rows)

    # Keep only the text after the "INTRODUCTION" heading
    text = article_txt.split("INTRODUCTION")[1]
    ctext = clean_text(text)
    # Split the cleaned text into sentences
    sentences = nltk.tokenize.sent_tokenize(ctext)
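
To see what the row filter keeps and drops, here is a small self-contained sketch on a made-up sample. The helpers `body_after_marker`, `keep_row`, and `naive_sentences` are hypothetical names introduced for illustration. The regex splitter is only a rough stand-in for `nltk.tokenize.sent_tokenize`, so the sketch runs without NLTK. `str.partition` is used as one possible way to avoid an `IndexError` when the "INTRODUCTION" marker is missing from the OCR output.

```python
import re
from typing import List


def body_after_marker(text: str, marker: str = "INTRODUCTION") -> str:
    """Return the text after the first `marker`, or all of it if absent."""
    _, found, rest = text.partition(marker)
    return rest if found else text


def keep_row(row: str) -> bool:
    """Same rule as clean_text: more than three words and not a caption."""
    return len(row.split(" ")) > 3 and not row.startswith(("(a)", "Figure"))


def naive_sentences(text: str) -> List[str]:
    """Rough demo splitter on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


# Made-up sample text, not taken from the paper.
sample = (
    "Title page\n"
    "INTRODUCTION\n"
    "Skin is the largest organ of the body. It protects inner tissues.\n"
    "Figure 1: Layers of the skin shown in cross section.\n"
    "(a) Epidermis and dermis in detail here.\n"
    "RELATED WORK\n"
    "Stem cells can support skin regeneration."
)

body = body_after_marker(sample)
cleaned = "\n".join(row for row in body.split("\n") if keep_row(row))
print(naive_sentences(cleaned))
```

The caption rows and the short heading "RELATED WORK" are filtered out, while the three full sentences survive. Note that the word-count rule would also drop any genuine sentence of three words or fewer.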

Numerous individuals suffering from skin disorders, including chronic wounds, persistent ulcers, and diabetic ulcers, necessitated the reconstruction and regeneration of their skin surfaces. In other words, the medical field was increasingly seeking effective methods to rejuvenate and restore skin for both aesthetic reasons and therapeutic needs. This demand extended beyond the patient population to include healthy individuals as well.
