Machine Learning




9. Installing TensorFlow GPU on CentOS 7

Author: admin, posted 2022-07-08


1. Installing the Linux Driver

Reference: CentOS 7.x Linux NVIDIA graphics driver installation




2. Installing the NVIDIA Driver


The official TensorFlow documentation states that an NVIDIA GPU driver, the CUDA Toolkit, and the cuDNN SDK are required to run TensorFlow on a GPU.

The NVIDIA GPU driver is installed from an RPM package; the CUDA Toolkit and cuDNN SDK are supplied through the NVIDIA Container Toolkit instead of being installed directly on the host.

When everything is configured, the setup has the architecture shown in the figure below.

[Figure 1: overall architecture]




The steps below follow the installation instructions published on the official NVIDIA website.


Install gcc and other dependency libraries


# yum update

# yum install epel-release

# yum groups mark install 'Development Tools'


# yum install dkms yum-utils pciutils


Verify that a CUDA-capable GPU is present


# lspci | grep -i nvidia

[root@localhost ~]# lspci | grep -i nvidia

02:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)

02:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)

84:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)

84:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)

[root@localhost ~]#
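The transcript shows two GTX 1080 Ti cards plus their HDMI audio functions. As a sanity check, the GPU entries can be counted from the same output; a minimal sketch that uses the transcript text as sample input (on a real host, feed it the output of lspci | grep -i nvidia instead):

```shell
#!/bin/sh
# Sketch: count NVIDIA VGA controllers in lspci-style output.
# The sample below mirrors the transcript above; audio functions are excluded.
lspci_out='02:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
02:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)
84:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
84:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)'

# Count only the VGA controller lines, not the audio devices.
gpu_count=$(printf '%s\n' "$lspci_out" | grep -c 'VGA compatible controller.*NVIDIA')
echo "NVIDIA GPUs found: $gpu_count"
```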


Install the kernel headers and development packages


# yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
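The headers must match the running kernel or DKMS cannot build the NVIDIA kernel module. A quick check, sketched under the assumption of the standard CentOS header path /usr/src/kernels/<version>:

```shell
#!/bin/sh
# Sketch: verify that kernel-devel matching the running kernel is present.
running=$(uname -r)
if [ -d "/usr/src/kernels/$running" ]; then
    status="installed"
else
    status="missing"   # run: yum install kernel-devel-$running
fi
echo "kernel-devel ($running): $status"
```

This is also why the guide reboots after yum update: the update may install a newer kernel, and the headers must match the kernel that is actually running.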


Reboot to boot into the newly installed kernel


# reboot



Install the NVIDIA driver


# yum-config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo

# yum clean all

# yum install nvidia-driver-latest-dkms


[root@localhost ~]# yum-config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo

Loaded plugins: fastestmirror, langpacks

adding repo from: http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo

grabbing file http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo to /etc/yum.repos.d/cuda-rhel7.repo

repo saved to /etc/yum.repos.d/cuda-rhel7.repo

[root@localhost ~]# yum clean all
Loaded plugins: fastestmirror, langpacks
Cleaning repos: base cuda-rhel7-x86_64 epel extras updates
Cleaning up list of fastest mirrors
[root@localhost ~]# yum install nvidia-driver-latest-dkms -y 
Loaded plugins: fastestmirror, langpacks
Determining fastest mirrors
epel/x86_64/metalink                                     | 6.6 kB     00:00
 * base: mirror.kakao.com
 * epel: mirror.sabay.com.kh
 * extras: mirror.kakao.com
 * updates: mirror.kakao.com



Reboot to load the driver and verify the installation


# reboot


# nvidia-smi

[root@localhost ~]# nvidia-smi

Failed to initialize NVML: Driver/library version mismatch

[root@localhost ~]# dmesg |grep NVRM
[    9.619564] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  515.48.07  Fri May 27 03:26:43 UTC 2022
[   21.953927] NVRM: API mismatch: the client has the version 515.57, but
NVRM: this kernel module has the version 515.48.07.  Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.


After much searching, I concluded this was not a CUDA problem but an NVIDIA driver version mismatch: the user-space libraries were at 515.57 while the loaded kernel module was still 515.48.07.
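The mismatch can be confirmed mechanically by comparing the two versions NVRM reports. A minimal sketch; the sample text mimics the dmesg lines above, and on a real host it would come from dmesg | grep NVRM:

```shell
#!/bin/sh
# Sketch: detect a client/kernel-module version mismatch from
# NVRM log lines. Sample text stands in for: dmesg | grep NVRM
nvrm_log='NVRM: API mismatch: the client has the version 515.57, but
NVRM: this kernel module has the version 515.48.07.'

# The client version appears first, the kernel-module version second.
client_ver=$(printf '%s\n' "$nvrm_log" | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | sed -n 1p)
module_ver=$(printf '%s\n' "$nvrm_log" | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | sed -n 2p)

if [ "$client_ver" != "$module_ver" ]; then
    echo "mismatch: client=$client_ver module=$module_ver"
else
    echo "versions match: $client_ver"
fi
```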


Reinstall the NVIDIA driver


[root@localhost ~]# sh NVIDIA-Linux-x86_64-515.57.run

Verifying archive integrity... OK

Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 515.57 ...



Reboot to load the driver and verify the installation


# reboot 


# nvidia-smi 


[Figure 2: nvidia-smi output]



[root@localhost ~]# lshw -numeric -C display

  *-display

       description: VGA compatible controller

       product: GP102 [GeForce GTX 1080 Ti] [10DE:1B06]

       vendor: NVIDIA Corporation [10DE]

       physical id: 0

       bus info: pci@0000:02:00.0

       version: a1

       width: 64 bits

       clock: 33MHz

       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom

       configuration: driver=nvidia latency=0

       resources: iomemory:3bf0-3bef iomemory:3bf0-3bef irq:105 memory:91000000-91ffffff memory:3bfe0000000-3bfefffffff memory:3bff0000000-3bff1ffffff ioport:2000(size=128) memory:92080000-920fffff

  *-display

       description: VGA compatible controller

       product: G200eR2 [102B:534]

       vendor: Matrox Electronics Systems Ltd. [102B]

       physical id: 0

       bus info: pci@0000:09:00.0

       version: 01

       width: 32 bits

       clock: 33MHz

       capabilities: pm vga_controller bus_master cap_list rom

       configuration: driver=mgag200 latency=64 maxlatency=32 mingnt=16

       resources: irq:17 memory:90000000-90ffffff memory:93000000-93003fff memory:92800000-92ffffff

  *-display

       description: VGA compatible controller

       product: GP102 [GeForce GTX 1080 Ti] [10DE:1B06]

       vendor: NVIDIA Corporation [10DE]

       physical id: 0

       bus info: pci@0000:84:00.0

       version: a1

       width: 64 bits

       clock: 33MHz

       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom

       configuration: driver=nvidia latency=0

       resources: iomemory:3ff0-3fef iomemory:3ff0-3fef irq:106 memory:c8000000-c8ffffff memory:3ffe0000000-3ffefffffff memory:3fff0000000-3fff1ffffff ioport:8000(size=128) memory:c9080000-c90fffff

[root@localhost ~]#




3. Installing the CUDA Toolkit & cuDNN


Installing Docker CE


To use nvidia-docker, install Docker version 19.03 or later.
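Whether an installed Docker meets that minimum can be checked with a version-aware comparison; a sketch (the installed version string here is an example — on a real host it would come from docker version --format '{{.Server.Version}}'):

```shell
#!/bin/sh
# Sketch: check that Docker is at least 19.03 (needed for the --gpus flag).
required="19.03"
installed="20.10.17"   # example value; substitute the real server version

# sort -V orders version strings numerically; if the required version
# sorts first (or equal), the installed version is new enough.
oldest=$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n 1)

if [ "$oldest" = "$required" ]; then
    echo "Docker $installed supports --gpus"
else
    echo "Docker $installed is too old; need >= $required"
fi
```

A plain string comparison would get this wrong (e.g. "9.1" > "19.03" lexically), which is why sort -V is used.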


Install Docker


# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

# yum install docker-ce docker-ce-cli containerd.io


Verify the installation with the Docker hello-world image


# systemctl start docker

# docker version


[root@localhost ~]# systemctl start docker

[root@localhost ~]# docker version

Client: Docker Engine - Community

 Version:           20.10.17

 API version:       1.41

 Go version:        go1.17.11

 Git commit:        100c701

 Built:             Mon Jun  6 23:05:12 2022

 OS/Arch:           linux/amd64

 Context:           default

 Experimental:      true


Server: Docker Engine - Community

 Engine:

  Version:          20.10.17

  API version:      1.41 (minimum version 1.12)

  Go version:       go1.17.11

  Git commit:       a89b842

  Built:            Mon Jun  6 23:03:33 2022

  OS/Arch:          linux/amd64

  Experimental:     false

 containerd:

  Version:          1.6.6

  GitCommit:        10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1

 runc:

  Version:          1.1.2

  GitCommit:        v1.1.2-0-ga916309

 docker-init:

  Version:          0.19.0

  GitCommit:        de40ad0




# docker run hello-world


[root@localhost ~]# docker run hello-world

Unable to find image 'hello-world:latest' locally

latest: Pulling from library/hello-world

2db29710123e: Pull complete

Digest: sha256:13e367d31ae85359f42d637adf6da428f76d75dc9afeb3c21faea0d976f5c651

Status: Downloaded newer image for hello-world:latest


Hello from Docker!

This message shows that your installation appears to be working correctly.


To generate this message, Docker took the following steps:

 1. The Docker client contacted the Docker daemon.

 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.

    (amd64)

 3. The Docker daemon created a new container from that image which runs the

    executable that produces the output you are currently reading.

 4. The Docker daemon streamed that output to the Docker client, which sent it

    to your terminal.


To try something more ambitious, you can run an Ubuntu container with:

 $ docker run -it ubuntu bash


Share images, automate workflows, and more with a free Docker ID:

 https://hub.docker.com/


For more examples and ideas, visit:

 https://docs.docker.com/get-started/


[root@localhost ~]#



Install the NVIDIA Container Toolkit


# distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo

(Note the && — without it, $distribution is still empty when the shell expands the URL.)

# yum install nvidia-container-toolkit


[root@localhost ~]# distribution=$(. /etc/os-release;echo $ID$VERSION_ID) && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo

[libnvidia-container]

name=libnvidia-container

baseurl=https://nvidia.github.io/libnvidia-container/stable/centos7/$basearch

repo_gpgcheck=1

gpgcheck=0

enabled=1

gpgkey=https://nvidia.github.io/libnvidia-container/gpgkey

sslverify=1

sslcacert=/etc/pki/tls/certs/ca-bundle.crt


[libnvidia-container-experimental]

name=libnvidia-container-experimental

baseurl=https://nvidia.github.io/libnvidia-container/experimental/centos7/$basearch

repo_gpgcheck=1

gpgcheck=0

enabled=0

gpgkey=https://nvidia.github.io/libnvidia-container/gpgkey

sslverify=1

sslcacert=/etc/pki/tls/certs/ca-bundle.crt


[nvidia-container-runtime]

name=nvidia-container-runtime

baseurl=https://nvidia.github.io/nvidia-container-runtime/stable/centos7/$basearch

repo_gpgcheck=1

gpgcheck=0

enabled=1

gpgkey=https://nvidia.github.io/nvidia-container-runtime/gpgkey

sslverify=1

sslcacert=/etc/pki/tls/certs/ca-bundle.crt


[nvidia-container-runtime-experimental]

name=nvidia-container-runtime-experimental

baseurl=https://nvidia.github.io/nvidia-container-runtime/experimental/centos7/$basearch

repo_gpgcheck=1

gpgcheck=0

enabled=0

gpgkey=https://nvidia.github.io/nvidia-container-runtime/gpgkey

sslverify=1

sslcacert=/etc/pki/tls/certs/ca-bundle.crt


[nvidia-docker]

name=nvidia-docker

baseurl=https://nvidia.github.io/nvidia-docker/centos7/$basearch

repo_gpgcheck=1

gpgcheck=0

enabled=1

gpgkey=https://nvidia.github.io/nvidia-docker/gpgkey

sslverify=1

sslcacert=/etc/pki/tls/certs/ca-bundle.crt

[root@localhost ~]#



[root@localhost ~]# yum install nvidia-container-toolkit -y
Loaded plugins: fastestmirror, langpacks, nvidia
Loading mirror speeds from cached hostfile
 * base: mirror.kakao.com
 * epel: mirror.sabay.com.kh
 * extras: mirror.kakao.com
 * updates: mirror.kakao.com

After the installation completes, restart Docker so it picks up the NVIDIA container runtime hook:

# systemctl restart docker


Installing TensorFlow GPU


Pull the TensorFlow GPU image from Docker Hub and run it locally.


# docker pull tensorflow/tensorflow:latest-gpu-py3-jupyter

# docker run --gpus all -it --rm -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter


[root@localhost ~]# docker pull tensorflow/tensorflow:latest-gpu-py3-jupyter

latest-gpu-py3-jupyter: Pulling from tensorflow/tensorflow

7ddbc47eeb70: Pull complete

c1bbdc448b72: Pull complete


[root@localhost ~]# docker run --gpus all -it --rm -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter

If the container starts successfully, output like the following appears.


[Figure 3: Jupyter Notebook server startup output]


Open 127.0.0.1:8888 in a web browser.


To connect from a remote machine, the firewall port must be opened.


[root@localhost ~]# firewall-cmd --permanent --zone=public --add-port=8888/tcp

success

[root@localhost ~]# firewall-cmd --reload

success

[root@localhost ~]#


Open <server-ip>:8888 in a web browser.


[Figure 6]


When the container starts, the log prints a URL containing a token; copy the token (or request the full URL) in the browser and the login prompt goes away.



[Figure 7]
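Extracting the token from the startup log can also be scripted; a sketch, assuming the standard http://127.0.0.1:8888/?token=... URL format (on a real host the line would come from docker logs for the container):

```shell
#!/bin/sh
# Sketch: pull the login token out of a Jupyter startup URL.
# Sample line stands in for the container's startup log output.
log_line='http://127.0.0.1:8888/?token=abcdef1234567890'

# Delete everything up to and including "token=", keep the rest.
token=$(printf '%s\n' "$log_line" | sed -n 's/.*token=//p')
echo "token: $token"
```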




Once Jupyter Notebook is running, enter the commands below and confirm that the container detects the GPU devices correctly.



from tensorflow.python.client import device_lib

device_lib.list_local_devices()

[Figure 8]

[Figure 9]




If the installed GPU devices are listed, everything has completed successfully.


5. Sharing Data with a Docker Container


When the container exits, data created inside it is lost as well (the container is started with --rm).

Use a Docker volume (a bind mount) to share data with the host.


Create a directory on the host PC.

[root@localhost ~]# mkdir mydata


Add the -v /root/mydata:/tf/mydata option when starting the container.

[root@localhost ~]# docker run --gpus all -it --rm -p 8888:8888 -v /root/mydata:/tf/mydata tensorflow/tensorflow:latest-gpu-py3-jupyter

Reconnect from the browser and the mydata folder now appears.

Files placed in the mydata folder are not deleted when the container exits.


[Figure 10]
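The persistence comes from the bind mount: files under the host side of -v /root/mydata:/tf/mydata live on the host filesystem, so removing the container does not touch them. A small stand-alone sketch of that idea, using a temporary directory in place of /root/mydata:

```shell
#!/bin/sh
# Sketch of bind-mount persistence: the host directory outlives the
# container. A temp dir stands in for the host side of the mount.
host_dir=$(mktemp -d)

# The container writes to /tf/mydata, i.e. this host directory.
echo "trained weights" > "$host_dir/model.txt"

# ... the container exits and is removed (--rm) ...

# The file is still on the host afterwards.
content=$(cat "$host_dir/model.txt")
rm -r "$host_dir"
echo "$content"
```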



6. Setting a Password for Jupyter Notebook


There are two ways to set a password for Jupyter Notebook running inside a Docker container.


1) Set it in advance in the Dockerfile when building the Docker image


2) Change it with a shell command after the container starts



1. Setting the password at build time

# Excerpt from the Dockerfile


RUN jupyter notebook --generate-config

RUN echo '\
from IPython.lib import passwd \n\
password = passwd("your-password") \n\
c.NotebookApp.password = password' >> /root/.jupyter/jupyter_notebook_config.py
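For reference, passwd() returns a salted hash of the form sha1:<salt>:<digest>, where the digest is the SHA-1 of the passphrase concatenated with the salt. A sketch reproducing that format with standard tools (fixed salt for illustration; the real helper generates a random one):

```shell
#!/bin/sh
# Sketch: reproduce the "sha1:<salt>:<digest>" format that passwd()
# writes into jupyter_notebook_config.py. Fixed salt for illustration.
password="secret"
salt="12345678abcd"

# digest = SHA-1 of (password + salt), as a 40-char hex string
digest=$(printf '%s%s' "$password" "$salt" | sha1sum | cut -d ' ' -f 1)
hashed="sha1:$salt:$digest"
echo "$hashed"
```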

 



2. Changing the password from the Jupyter Notebook settings

$ jupyter notebook password

Enter password:  ****

Verify password: ****

[NotebookPasswordApp] Wrote hashed password to /Users/you/.jupyter/jupyter_notebook_config.json




TEL. 063-469-4551 FAX. 063-469-4560, 558 Daehak-ro, Gunsan-si, Jeollabuk-do
Department of Computer & Information Engineering, Kunsan National University

Copyright © www.leelab.co.kr. All rights reserved.