
Building a Knowledge Graph from 0 to 1: A Hands-on Exercise (KG, Knowledge Graph)


Reference: [Knowledge Graph: Data Science Methods for Extracting Information from Text (with Python implementation)](https://www.analyticsvidhya.com/blog/2019/10/how-to-build-knowledge-graph-text-using-spacy/ "Knowledge Graph: Data Science Methods for Extracting Information from Text (with Python implementation)")

Opening the link requires a VPN/proxy.

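The article is fairly easy to follow; by following the author's approach and examples, I was able to reproduce the results successfully.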

Dataset:

wiki_sentences_v2.csv

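When implementing the code, pay attention to the version of your Python environment and of the related libraries. spaCy in particular is configured differently in old and new versions: the referenced article dates from 2019 (spaCy 2), while the environment below uses spaCy 3, which, for example, dropped shortcut names such as `spacy.load('en')` and changed `Matcher.add` from `matcher.add("id", None, pattern)` to `matcher.add("id", [pattern])`. For reference, I used Python 3.8.19.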

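To set up a fresh environment, save the following list as requirements.txt and run `pip install -r requirements.txt` in a terminal. The list is a snapshot of my whole environment, so many of these packages are not needed for this exercise.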

```text
# Python version: 3.8.19
# torch==2.1.0+cu121 is a CUDA 12.1 build hosted on the PyTorch index, not on PyPI
--extra-index-url https://download.pytorch.org/whl/cu121
absl-py==2.0.0
accelerate==0.23.0
aiofiles==23.2.1
aiohttp==3.8.6
aiosignal==1.3.1
aliyun-python-sdk-core==2.14.0
aliyun-python-sdk-kms==2.16.2
altair==5.1.2
annotated-types==0.6.0
anyio==3.7.1
asgiref==3.7.2
astor==0.8.1
async-timeout==4.0.3
attrdict==2.0.1
attrs==23.1.0
Babel==2.13.1
backports.zoneinfo==0.2.1
bce-python-sdk==0.8.95
beautifulsoup4==4.12.2
blinker==1.6.3
blis==0.7.11
boto3==1.28.82
botocore==1.31.82
bottle==0.12.25
cachetools==5.3.1
catalogue==2.0.10
certifi==2023.7.22
cffi==1.16.0
charset-normalizer==3.3.0
click==8.1.7
cloudpathlib==0.16.0
colorama==0.4.6
common==0.1.2
confection==0.1.3
ConfigArgParse==1.7
contourpy==1.1.1
cpm-kernels==1.0.11
crcmod==1.7
cryptography==41.0.7
cssselect==1.2.0
cssutils==2.9.0
ctranslate2==3.20.0
cycler==0.12.1
cymem==2.0.8
Cython==3.0.5
data==0.4
datasets==2.19.0
decorator==4.4.2
dill==0.3.7
docopt==0.6.2
dual==0.0.10
dynamo3==0.4.10
easydict==1.11
# spaCy pipelines are not on PyPI; install the wheel from the spacy-models GitHub release
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl
et-xmlfile==1.1.0
evaluate==0.4.1
exceptiongroup==1.1.3
faiss-cpu==1.7.1.post2
fastapi==0.103.2
fasttext-wheel==0.9.2
ffmpy==0.3.1
filelock==3.12.4
fire==0.5.0
Flask==3.0.0
flask-babel==4.0.0
flatbuffers==23.5.26
flywheel==0.5.4
fonttools==4.43.1
frozenlist==1.4.0
fsspec==2023.6.0
funcsigs==1.0.2
future==0.18.3
gast==0.3.3
gitdb==4.0.10
GitPython==3.1.37
google-auth==2.23.4
google-auth-oauthlib==1.0.0
gradio==3.47.1
gradio_client==0.6.0
grpcio==1.59.2
h11==0.14.0
httpcore==0.18.0
httpx==0.25.0
huggingface-cli==0.1
huggingface-hub==0.22.2
icetk==0.0.4
idna==3.4
imageio==2.32.0
imbalanced-learn==0.12.2
imgaug==0.4.0
importlib-metadata==6.8.0
importlib-resources==6.1.0
iopath==0.1.10
itsdangerous==2.1.2
jieba==0.42.1
Jinja2==3.1.2
jmespath==0.10.0
joblib==1.3.2
jsonify==0.5
jsonschema==4.19.1
jsonschema-specifications==2023.7.1
kiwisolver==1.4.5
langcodes==3.3.0
latex2mathml==3.75.2
layoutparser==0.3.4
Levenshtein==0.23.0
libaio==0.9.1
llvmlite==0.41.1
lmdb==1.4.1
lxml==4.9.3
Markdown==3.5
markdown-it-py==3.0.0
MarkupSafe==2.1.3
matplotlib==3.7.3
mdtex2html==1.2.0
mdurl==0.1.2
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.15
murmurhash==1.0.10
networkx==3.1
nltk==3.8.1
numba==0.58.1
numpy==1.21.0
oauthlib==3.2.2
onnxruntime==1.10.0
opencv-contrib-python==4.2.0.32
opencv-python==4.6.0.66
OpenNMT-py==2.3.0
openpyxl==3.1.2
openxlab==0.0.29
opt-einsum==3.3.0
orjson==3.9.7
oss2==2.17.0
packaging==23.2
paddle==1.0.2
paddle-bfloat==0.1.7
paddleclas==2.5.1
paddleocr==2.7.0.3
paddlepaddle==2.4.1
pandas==2.0.3
pdf2docx==0.5.5
pdf2image==1.17.0
pdfminer.six==20231228
pdfplumber==0.11.0
peewee==3.17.0
peft==0.5.0
Pillow==10.0.0
pip==23.3.1
pipreqs==0.4.13
pkgutil_resolve_name==1.3.10
portalocker==2.8.2
premailer==3.10.0
preshed==3.0.9
prettytable==3.9.0
protobuf==3.20.0
prox==0.0.17
psutil==5.9.5
pyahocorasick==2.0.0
pyarrow==13.0.0
pyarrow-hotfix==0.6
pyasn1==0.5.0
pyasn1-modules==0.3.0
pybind11==2.11.1
pyclipper==1.3.0.post5
pycparser==2.21
pycryptodome==3.19.0
pydantic==2.4.2
pydantic_core==2.10.1
pydeck==0.8.1b0
pydub==0.25.1
Pygments==2.16.1
PyMuPDF==1.20.2
PyMuPDFb==1.23.6
pynndescent==0.5.12
pyonmttok==1.37.1
pyparsing==3.1.1
pypdfium2==4.29.0
PySocks==1.7.1
python-dateutil==2.8.2
python-docx==1.1.0
python-geoip-python3==1.3
python-Levenshtein==0.23.0
python-multipart==0.0.6
pytz==2023.3.post1
PyWavelets==1.4.1
# Windows-only package; the marker makes pip skip it on Linux/macOS
pywin32==306; sys_platform == "win32"
PyYAML==6.0.1
rapidfuzz==3.5.2
rarfile==4.1
referencing==0.30.2
regex==2023.10.3
requests==2.28.2
requests-oauthlib==1.3.1
responses==0.18.0
rich==13.4.2
rouge-chinese==1.0.3
rpds-py==0.10.4
rsa==4.9
s3transfer==0.7.0
sacrebleu==2.3.1
safetensors==0.4.3
scikit-image==0.17.2
scikit-learn==1.3.2
scipy==1.10.1
semantic-version==2.10.0
sentencepiece==0.1.95
setuptools==60.2.0
shapely==2.0.2
six==1.16.0
smart-open==6.4.0
smmap==5.0.1
sniffio==1.3.0
soupsieve==2.5
spacy==3.7.2
spacy-legacy==3.0.12
spacy-loggers==1.0.5
sqlparse==0.4.4
srsly==2.4.8
sse-starlette==1.6.5
starlette==0.27.0
streamlit==1.27.2
sympy==1.12
tabulate==0.9.0
tenacity==8.2.3
tensorboard==2.14.0
tensorboard-data-server==0.7.2
termcolor==2.3.0
thinc==8.2.1
threadpoolctl==3.2.0
tifffile==2023.7.10
tight==0.1.0
tokenizers==0.13.3
toml==0.10.2
toolz==0.12.0
torch==2.1.0+cu121
torchaudio==2.1.0
torchtext==0.5.0
torchvision==0.16.0
tornado==6.3.3
tqdm==4.65.2
transformers==4.26.1
typer==0.9.0
typing_extensions==4.8.0
tzdata==2023.3
tzlocal==5.1
ujson==5.8.0
umap==0.1.1
umap-learn==0.5.6
urllib3==1.26.18
uvicorn==0.23.2
validators==0.22.0
visualdl==2.5.3
waitress==2.1.2
wasabi==1.1.2
watchdog==3.0.0
wcwidth==0.2.9
weasel==0.3.3
websockets==11.0.3
Werkzeug==3.0.1
wheel==0.41.2
xxhash==3.4.1
yarg==0.1.9
yarl==1.9.2
zipp==3.17.0
```
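Because the pinned list is long and individual pins can fail on a given platform, it can help to verify the environment after installing. The sketch below is a hypothetical helper (the names `parse_pins` and `find_mismatches` are my own, not from any library) that reads plain `name==version` lines and compares them with the installed versions through the standard-library `importlib.metadata`. It assumes exact `==` pins only: comments, pip options such as `--extra-index-url`, URL installs and environment markers are skipped or stripped, and version ranges are not understood.

```python
import sys
from importlib import metadata
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple


def parse_pins(text: str) -> Dict[str, str]:
    """Return {lower-cased package name: pinned version} for plain `name==version` lines."""
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        spec = line.split(";", 1)[0].strip()  # drop environment markers
        if not spec or spec.startswith("-") or "==" not in spec:
            continue  # skip blank lines, pip options and URL installs
        name, version = spec.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(
    pins: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (pinned, installed or None)} for every pin that is not met."""
    problems: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in pins.items():
        try:
            installed: Optional[str] = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            problems[name] = (wanted, installed)
    return problems


if __name__ == "__main__":
    print("Python", sys.version.split()[0])  # the post used 3.8.19
    req = Path("requirements.txt")
    if req.exists():
        pins = parse_pins(req.read_text(encoding="utf-8"))
        for name, (wanted, got) in find_mismatches(pins).items():
            print(f"{name}: pinned {wanted}, installed {got}")
```

Run it from the folder that contains requirements.txt. An `installed None` entry means the package is missing; because the sketch ignores environment markers, `pywin32` will always be reported that way on Linux or macOS, which is expected.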

You also need the pretrained English pipeline en_core_web_sm:

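You can install it from the terminal with `python -m spacy download en_core_web_sm` (the model is not published on PyPI, so a plain `pip install en_core_web_sm` usually fails). The model version must match your spaCy version.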

Alternatively, download the model and install it locally:

The en_core_web_sm model is available from its GitHub repository (github.com),

or from the resource at the top of this post.

Then run in a terminal:

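```bash
# Use the archive that matches your spaCy version (3.7.x for spacy==3.7.2 above);
# a 2.x model will not load in spaCy 3.x
pip install en_core_web_sm-3.7.1.tar.gz
```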
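Since the model version has to match spaCy, a quick check right after installing can save a confusing load error later. spaCy versions its pipelines so that the first two numbers are the spaCy major and minor version they were built for (en_core_web_sm 3.7.x for spaCy 3.7.x). The sketch below encodes only that rule of thumb through hypothetical helpers (`major_minor`, `pipeline_matches_spacy`) and also prints the pipeline's own declared range, `nlp.meta["spacy_version"]`, which is the authoritative one. spaCy additionally ships a built-in check, `python -m spacy validate`.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """'3.7.2' -> (3, 7): keep only the major and minor parts."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def pipeline_matches_spacy(spacy_version: str, model_version: str) -> bool:
    """Rule of thumb: a pipeline a.b.c is built for spaCy a.b.x."""
    return major_minor(spacy_version) == major_minor(model_version)


if __name__ == "__main__":
    try:
        import spacy
        nlp = spacy.load("en_core_web_sm")
    except (ImportError, OSError) as err:  # spaCy or the model is missing
        print("cannot check:", err)
    else:
        model_version = nlp.meta["version"]
        print("spaCy", spacy.__version__, "| en_core_web_sm", model_version,
              "| declared range", nlp.meta.get("spacy_version"))
        print("compatible:", pipeline_matches_spacy(spacy.__version__, model_version))
```

With the versions pinned in this post (spacy 3.7.2, en_core_web_sm 3.7.1) the check passes, while a 2.3.0 archive would be flagged as incompatible.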

To check that everything works, run this test code:

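```python
import spacy

# Load the small English pipeline installed above
nlp = spacy.load('en_core_web_sm')

# Parse a sample sentence
doc = nlp("The 22-year-old recently won ATP Challenger tournament.")

# Print each token with its dependency label (e.g. nsubj, ROOT, dobj)
for tok in doc:
    print(tok.text, "...", tok.dep_)
```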
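The dependency labels printed above are what the article's triple extraction builds on: subject and object are picked out by their dependency labels, and the relation is anchored on the ROOT verb. The sketch below is a simplified version of that idea, not the article's exact code. It works on `(text, dep_)` pairs so it runs without spaCy, uses a hypothetical `extract_triple` helper, and only folds consecutive `compound` tokens into the noun that follows them.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, str]  # (tok.text, tok.dep_), i.e. what the test loop prints


def extract_triple(tokens: Iterable[Token]) -> Tuple[str, str, str]:
    """Return (subject, relation, object) using simple dependency-label rules."""
    subject = relation = obj = ""
    compounds: List[str] = []  # pending compound modifiers, e.g. "ATP", "Challenger"
    for text, dep in tokens:
        if dep == "compound":
            compounds.append(text)
            continue  # keep collecting until the head noun arrives
        if "subj" in dep:    # nsubj, nsubjpass, ...
            subject = " ".join(compounds + [text])
        elif "obj" in dep:   # dobj, pobj, ...
            obj = " ".join(compounds + [text])
        elif dep == "ROOT":  # main verb of the sentence
            relation = text
        compounds = []       # any non-compound token ends the run
    return subject, relation, obj
```

With spaCy available, the input would be `((tok.text, tok.dep_) for tok in nlp(sentence))`. Assuming the parser labels `old` as `nsubj` in the test sentence, the subject comes out as just `old` rather than `22-year-old`, which shows why a fuller version also needs to fold in modifier tokens.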

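When copying code from the article linked at the top, read it carefully and pay close attention to the details, including the code shown in both the black boxes and the white boxes.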

If you have your own experience or tips to share, feel free to send me a private message~
