
12,449 bytes added, 7 April 2022 (Thu) 14:31

→ Errors


==== Checking dependencies ====

If the versions you use do not match, you will get nothing but errors.


The versions of Python, TensorFlow, and (if you use the GPU) CUDA and cuDNN must all match. Check the compatible combinations at the following link.

https://www.tensorflow.org/install/source_windows#tested_build_configurations
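As a rough sketch of checking these dependencies from Python, the snippet below collects the Python version, whether the interpreter is 64-bit, and, if TensorFlow imports, the CUDA and cuDNN versions the installed build was compiled against, so they can be compared with the table at the link above. It assumes <code>tf.sysconfig.get_build_info()</code> is available (TensorFlow 2.3 and later, as far as I know); on CPU-only builds the CUDA/cuDNN entries are missing and come back as <code>None</code>. The function name <code>describe_environment</code> is hypothetical.

<syntaxhighlight lang="python">
import platform
import struct


def describe_environment():
    """Collect the versions to compare against the tested build configurations."""
    env = {
        "python": platform.python_version(),
        # Pointer size in bits: 64 on a 64-bit interpreter (TensorFlow needs 64-bit Python).
        "python_64bit": struct.calcsize("P") * 8 == 64,
        "tensorflow": None,
        "cuda_version": None,
        "cudnn_version": None,
    }
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is missing or fails to import; report what we have so far.
        return env
    env["tensorflow"] = tf.__version__
    # Assumption: get_build_info() exists (TF >= 2.3); CPU-only builds omit the CUDA keys.
    build = tf.sysconfig.get_build_info()
    env["cuda_version"] = build.get("cuda_version")
    env["cudnn_version"] = build.get("cudnn_version")
    return env


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
</syntaxhighlight>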

==== Installation by OS ====

{| class="wikitable"


|Installation

|

# Check which Python version TensorFlow requires and install it.

# <code>pip install --upgrade pip</code>

#If you only use the CPU, run <code>pip install tensorflow</code>; if you use the GPU, run <code>pip install tensorflow-gpu</code> (subjectively, the GPU felt about twice as fast). Since TensorFlow 2.1 the plain <code>tensorflow</code> package also includes GPU support.

#If this step fails with <code>ERROR: Could not find a version that satisfies the requirement tensorflow</code>, your Python is not 64-bit. (TensorFlow requires 64-bit Python.)

# The stable CPU build may require a NumPy version below a certain version.

Check the TensorFlow version

Run the following in a Python console, or simply run it from a file.<syntaxhighlight lang="python">

$ python

>>> import tensorflow as tf

>>> tf.__version__

</syntaxhighlight>With a GPU-capable build, the import will probably print an error saying that something is missing.

|-

|GPU preparation

|Do this only if you will use the GPU. (If you only use the CPU, go straight to installation.)

*<code>pip install tensorflow-gpu</code>

* Install the [https://developer.nvidia.com/cuda-toolkit-archive CUDA] and [https://developer.nvidia.com/rdp/cudnn-archive cuDNN] versions that match your TensorFlow version. CUDA has a regular installer; for cuDNN, download the matching archive, extract it, and copy the contents of each extracted folder into the CUDA folder with the same name. (Dragging the extracted folders into the CUDA directory merges the files automatically.) The default path has the form C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.Y. (CUDA is NVIDIA's platform for running parallel algorithms on the GPU from various programming languages; cuDNN is NVIDIA's GPU-accelerated library of common deep-learning routines such as convolution and pooling.)

* Open Edit the system environment variables -> Advanced -> Environment Variables and check that <code>CUDA_PATH</code> exists under System variables.

Check again in a Python console as follows. The error shown earlier should now be gone.<syntaxhighlight lang="python">

$ python

>>> import tensorflow as tf

>>> tf.__version__

</syntaxhighlight>

|}

|-


|-

|Verify installation

|

|Version check<syntaxhighlight lang="python">

import tensorflow as tf

print(tf.__version__)

</syntaxhighlight>

|-

|Check CUDA version

|To make sure it was not installed incorrectly.

|In a terminal: <code>nvcc -V</code>

|-

|Check cuDNN version

|

{| class="wikitable"

!OS

!Method

|-

|Windows

|C: > Program Files > NVIDIA GPU Computing Toolkit > CUDA > v10.0 (your CUDA version) > include > cudnn.h

Open this file (or cudnn_version.h) in Notepad and search for MAJOR.

|-

|Ubuntu

|<nowiki>$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2</nowiki>

If everything is fine, this prints the major, minor, and patch-level version numbers. (From cuDNN 8 on, these are defined in cudnn_version.h instead of cudnn.h.)

|}

|

|}
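The CUDA and cuDNN checks above can also be scripted. This is a minimal sketch, assuming <code>nvcc -V</code> prints a line of the form <code>Cuda compilation tools, release X.Y, ...</code> and that the header defines <code>CUDNN_MAJOR</code>, <code>CUDNN_MINOR</code> and <code>CUDNN_PATCHLEVEL</code> as plain numbers. The helper names <code>parse_nvcc_release</code> and <code>parse_cudnn_version</code> are hypothetical.

<syntaxhighlight lang="python">
import re
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract the CUDA release (e.g. '11.1') from the text printed by `nvcc -V`."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def parse_cudnn_version(header_text):
    """Read CUDNN_MAJOR/MINOR/PATCHLEVEL from the text of cudnn.h or cudnn_version.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        if match is None:
            # cuDNN 8+ keeps these macros in cudnn_version.h, not cudnn.h.
            return None
        parts.append(match.group(1))
    return ".".join(parts)


if __name__ == "__main__":
    try:
        out = subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout
        print("CUDA:", parse_nvcc_release(out))
    except FileNotFoundError:
        print("nvcc is not on PATH")
</syntaxhighlight>

For cuDNN, read the header at the path given in the table above (for example with <code>open(path).read()</code>) and pass the text to <code>parse_cudnn_version</code>.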


(I did install it and then uninstalled it, though ;;)

|In a terminal, try <code>pip install --ignore-installed --upgrade tensorflow</code>.

|}
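A small sketch of building the reinstall command above so that it runs through the interpreter you actually use for TensorFlow. Calling pip as <code>sys.executable -m pip</code> avoids reinstalling into a different Python installation when several are present. <code>reinstall_command</code> is a hypothetical helper name, and the sketch only builds the command; it does not run it.

<syntaxhighlight lang="python">
import sys


def reinstall_command(package="tensorflow"):
    """Build the pip command from the row above, bound to the current interpreter."""
    # `sys.executable -m pip` installs into the same environment that will
    # later run `import tensorflow`, not whichever pip is first on PATH.
    return [sys.executable, "-m", "pip", "install",
            "--ignore-installed", "--upgrade", package]
</syntaxhighlight>

Pass the list to <code>subprocess.run(reinstall_command(), check=True)</code> to actually perform the reinstall.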

{| class="wikitable mw-collapsible mw-collapsed"

!Error

!Cause

!Solution

|-

|Not sure what this is... A long list of executed files is printed, starting from <code>fit</code>. A similar error seems to appear when memory runs out, but this one looks more like a dependency problem.

<pre>
--> 203 history = model.fit(train_data, epochs=2, callbacks=[early_stop]) # , validation_data=val_data
204
205 #--------------------About model testing.

c:\venvs\ai\lib\site-packages\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1182 _r=1):
1183 callbacks.on_train_batch_begin(step)
-> 1184 tmp_logs = self.train_function(iterator)
1185 if data_handler.should_sync:
1186 context.async_wait()

c:\venvs\ai\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
883
884 with OptionalXlaContext(self._jit_compile):
--> 885 result = self._call(*args, **kwds)
886
887 new_tracing_count = self.experimental_get_tracing_count()

c:\venvs\ai\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
915 # In this case we have created variables on the first call, so we run the
916 # defunned version which is guaranteed to never create variables.
--> 917 return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
918 elif self._stateful_fn is not None:
919 # Release the lock early so that multiple threads can perform the call

c:\venvs\ai\lib\site-packages\tensorflow\python\eager\function.py in __call__(self, *args, **kwargs)
3038 filtered_flat_args) = self._maybe_define_function(args, kwargs)
3039 return graph_function._call_flat(
-> 3040 filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
3041
3042 @property

c:\venvs\ai\lib\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
1962 # No tape is watching; skip to running the function.
1963 return self._build_call_outputs(self._inference_function.call(
-> 1964 ctx, args, cancellation_manager=cancellation_manager))
1965 forward_backward = self._select_forward_and_backward_functions(
1966 args,

c:\venvs\ai\lib\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager)
594 inputs=args,
595 attrs=attrs,
--> 596 ctx=ctx)
597 else:
598 outputs = execute.execute_with_cancellation(

c:\venvs\ai\lib\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
58 ctx.ensure_initialized()
59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
62 if name is not None:

UnknownError: 2 root error(s) found.
(0) Unknown: KeyError: 13695
Traceback (most recent call last):
  File "c:\venvs\ai\lib\site-packages\pandas\core\indexes\base.py", line 2898, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1032, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1039, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 13695

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\venvs\ai\lib\site-packages\tensorflow\python\ops\script_ops.py", line 249, in __call__
    ret = func(*args)
  File "c:\venvs\ai\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 645, in wrapper
    return func(*args, **kwargs)
  File "c:\venvs\ai\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 892, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))
  File "c:\venvs\ai\lib\site-packages\keras\engine\data_adapter.py", line 822, in wrapped_generator
    for data in generator_fn():
  File "c:\venvs\ai\lib\site-packages\keras\engine\data_adapter.py", line 948, in generator_fn
    yield x[i]
  File "<ipython-input-3-c2cb38462dc6>", line 181, in __getitem__
    train_x, train_y = next(self.generator)
  File "<ipython-input-3-c2cb38462dc6>", line 158, in __datagenerator__
    target_df = target_df_batch.loc[j:j + 1500 + predict_num]
  File "c:\venvs\ai\lib\site-packages\pandas\core\indexing.py", line 879, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "c:\venvs\ai\lib\site-packages\pandas\core\indexing.py", line 1088, in _getitem_axis
    return self._get_slice_axis(key, axis=axis)
  File "c:\venvs\ai\lib\site-packages\pandas\core\indexing.py", line 1123, in _get_slice_axis
    slice_obj.start, slice_obj.stop, slice_obj.step, kind="loc"
  File "c:\venvs\ai\lib\site-packages\pandas\core\indexes\base.py", line 4969, in slice_indexer
    start_slice, end_slice = self.slice_locs(start, end, step=step, kind=kind)
  File "c:\venvs\ai\lib\site-packages\pandas\core\indexes\base.py", line 5178, in slice_locs
    end_slice = self.get_slice_bound(end, "right", kind)
  File "c:\venvs\ai\lib\site-packages\pandas\core\indexes\base.py", line 5092, in get_slice_bound
    raise err
  File "c:\venvs\ai\lib\site-packages\pandas\core\indexes\base.py", line 5086, in get_slice_bound
    slc = self.get_loc(label)
  File "c:\venvs\ai\lib\site-packages\pandas\core\indexes\base.py", line 2900, in get_loc
    raise KeyError(key) from err
KeyError: 13695

[[{{node PyFunc}}]] [[IteratorGetNext]]
(1) Unknown: KeyError: 13695
Traceback (most recent call last):
  File "c:\venvs\ai\lib\site-packages\pandas\core\indexes\base.py", line 2898, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1032, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1039, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 13695

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\venvs\ai\lib\site-packages\tensorflow\python\ops\script_ops.py", line 249, in __call__
    ret = func(*args)
  File "c:\venvs\ai\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 645, in wrapper
    return func(*args, **kwargs)
  File "c:\venvs\ai\lib\site-packages\tensorflow\python\data\ops\dataset_ops.py", line 892, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))
  File "c:\venvs\ai\lib\site-packages\keras\engine\data_adapter.py", line 822, in wrapped_generator
    for data in generator_fn():
  File "c:\venvs\ai\lib\site-packages\keras\engine\data_adapter.py", line 948, in generator_fn
    yield x[i]
  File "<ipython-input-3-c2cb38462dc6>", line 181, in __getitem__
    train_x, train_y = next(self.generator)
  File "<ipython-input-3-c2cb38462dc6>", line 158, in __datagenerator__
    target_df = target_df_batch.loc[j:j + 1500 + predict_num]
  File "c:\venvs\ai\lib\site-packages\pandas\core\indexing.py", line 879, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "c:\venvs\ai\lib\site-packages\pandas\core\indexing.py", line 1088, in _getitem_axis
    return self._get_slice_axis(key, axis=axis)
  File "c:\venvs\ai\lib\site-packages\pandas\core\indexing.py", line 1123, in _get_slice_axis
    slice_obj.start, slice_obj.stop, slice_obj.step, kind="loc"
  File "c:\venvs\ai\lib\site-packages\pandas\core\indexes\base.py", line 4969, in slice_indexer
    start_slice, end_slice = self.slice_locs(start, end, step=step, kind=kind)
  File "c:\venvs\ai\lib\site-packages\pandas\core\indexes\base.py", line 5178, in slice_locs
    end_slice = self.get_slice_bound(end, "right", kind)
  File "c:\venvs\ai\lib\site-packages\pandas\core\indexes\base.py", line 5092, in get_slice_bound
    raise err
  File "c:\venvs\ai\lib\site-packages\pandas\core\indexes\base.py", line 5086, in get_slice_bound
    slc = self.get_loc(label)
  File "c:\venvs\ai\lib\site-packages\pandas\core\indexes\base.py", line 2900, in get_loc
    raise KeyError(key) from err
KeyError: 13695

[[{{node PyFunc}}]] [[IteratorGetNext]] [[Shape/_8]] 0 successful operations. 0 derived errors ignored. [Op:__inference_train_function_29572]

Function call stack: train_function -> train_function
</pre>

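|Perhaps something does not match among TensorFlow, CUDA, cuDNN, etc.? I have no idea ;;

Some say it is a GPU resource problem.

It still appears even after matching all the versions and reinstalling the GPU build of TensorFlow.

Note, though, that the root error in the traceback is a pandas <code>KeyError: 13695</code> raised inside the custom data generator, at <code>target_df = target_df_batch.loc[j:j + 1500 + predict_num]</code>: the label 13695 is not in the DataFrame index, so the window most likely runs past the rows that actually exist.

|If it is a GPU resource problem, letting TensorFlow allocate GPU memory on demand might solve it...?

For TensorFlow 2.x:

<code>import tensorflow as tf</code>

<code>config = tf.compat.v1.ConfigProto()</code>

<code>config.gpu_options.allow_growth = True</code>

<code>session = tf.compat.v1.Session(config=config)</code>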

|}
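Given that the traceback in the table above bottoms out in a pandas <code>KeyError</code> from <code>.loc[j:j + 1500 + predict_num]</code>, one possible fix, assuming the window is meant to be taken by position, is to slice with <code>.iloc</code> and only draw start positions that leave room for a whole window. Label-based <code>.loc</code> slicing raises a <code>KeyError</code> when a bound label is missing from an index that is not sorted, which positional slicing avoids. The sketch uses a plain list as a stand-in for the DataFrame; <code>valid_window_starts</code> and <code>window_len</code> are hypothetical names.

<syntaxhighlight lang="python">
def valid_window_starts(n_rows, window_len):
    """Start positions i for which rows i .. i + window_len - 1 all exist."""
    # The last valid start is n_rows - window_len; clamp to 0 if the data is too short.
    return range(max(n_rows - window_len + 1, 0))


if __name__ == "__main__":
    rows = list(range(10))  # stand-in for a DataFrame with 10 rows
    window_len = 3
    for i in valid_window_starts(len(rows), window_len):
        # With pandas this would be df.iloc[i:i + window_len] (end position exclusive).
        window = rows[i:i + window_len]
        assert len(window) == window_len
</syntaxhighlight>

In the generator, <code>j</code> would then be drawn from <code>valid_window_starts(len(target_df_batch), window_len)</code>, so no window can reach past the last row.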

=== Resource problems ===

{| class="wikitable"

!Error

!Cause

!Solution

|-

|Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized.

|With several GPUs, memory seems to be split among them, so an operation can run out of memory and trigger this error.

[Not solved yet... still collecting information...]

|Set these before importing TensorFlow:

<code>import os</code>

<code>os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"</code>

<code>os.environ["CUDA_VISIBLE_DEVICES"]="0"</code>

|}
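A sketch combining the environment variables above with the TensorFlow 2.x way of turning on memory growth, <code>tf.config.experimental.set_memory_growth</code>, which is the native counterpart of the <code>allow_growth</code> session option used in the error table. Both must take effect before TensorFlow initialises CUDA, so call this at the very top of the script. Memory growth only stops TensorFlow from reserving nearly all GPU memory up front; it does not shrink what the model itself needs. <code>configure_gpus</code> is a hypothetical helper name.

<syntaxhighlight lang="python">
import os


def configure_gpus(visible="0", memory_growth=True):
    """Choose which GPUs TensorFlow may see, then optionally enable memory growth."""
    # Number GPUs in PCI bus order (the order nvidia-smi uses) instead of CUDA's default.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = visible
    try:
        import tensorflow as tf
    except ImportError:
        return []
    gpus = tf.config.list_physical_devices("GPU")
    if memory_growth:
        for gpu in gpus:
            # Allocate GPU memory on demand; raises RuntimeError if the GPU is already in use.
            tf.config.experimental.set_memory_growth(gpu, True)
    return gpus


if __name__ == "__main__":
    print("GPUs configured:", configure_gpus("0"))
</syntaxhighlight>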

== Infrastructure (before installation) ==

{| class="wikitable"

!Error

!Cause

!Solution

|-

|importerror: could not find the dll(s) 'msvcp140_1.dll'. tensorflow requires that these dlls be installed in a directory that is named in your %path% environment variable. you may install these dlls by downloading "microsoft c++ redistributable for visual studio 2015, 2017 and 2019" for your platform from this url: <nowiki>https://support.microsoft.com/help/2977003/the-latest-supported-visual-c-downloads</nowiki>

|This error appears because the Microsoft C++ Redistributable for Visual Studio is not installed.

If you followed the installation steps above, you should not run into it.

|Installing the Microsoft C++ Redistributable for Visual Studio 2015, 2017 and 2019 fixes it... but for some reason the download page would not open, and no amount of searching turned up anything about that.

In that case, check whether you are on a school or company network. Such networks sometimes block this installation.

=> Installed it at home and came back.

|}
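To check whether the DLL named in this error is visible at all, here is a minimal sketch that scans the directories listed in <code>PATH</code>, since the error message itself says the DLL must be in a directory named in <code>%PATH%</code>. Windows also searches other locations, so a miss here is a hint rather than proof. <code>find_on_path</code> is a hypothetical helper.

<syntaxhighlight lang="python">
import os


def find_on_path(filename, path=None):
    """Return every directory listed in PATH that contains `filename`."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # Skip empty entries such as the one produced by a trailing separator.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    found = find_on_path("msvcp140_1.dll")
    print(found or "msvcp140_1.dll not found on PATH; install the Visual C++ Redistributable")
</syntaxhighlight>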

== Usage ==

{| class="wikitable"

!Error

!Cause

!Solution

|-

|I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)

|Probably caused by excessive memory usage. (The leading <code>I</code> marks this line as an INFO-level log message rather than an error in itself, so the actual failure is probably reported elsewhere in the output.)

|

|-

|tensorflow.python.framework.errors_impl.ResourceExhaustedError: failed to allocate memory

|Memory usage is too high.

|

|-

|WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 30000 batches). You may need to use the repeat() function when building your dataset.

|Occurred when training in Keras with a generator.

The '''steps_per_epoch''' value is probably too large.

|[https://foxtrotin.tistory.com/535 Link]

|}
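For the <code>steps_per_epoch</code> warning above, a sketch of the arithmetic: one pass over the data can only produce so many batches, while the warning asks for at least <code>steps_per_epoch * epochs</code> batches in total. The sketch assumes a finite list of samples; <code>batches_per_epoch</code> and <code>batches_forever</code> are hypothetical names, and for a <code>tf.data.Dataset</code> the warning itself suggests <code>repeat()</code> instead.

<syntaxhighlight lang="python">
import math


def batches_per_epoch(num_samples, batch_size, drop_remainder=False):
    """How many batches one pass over `num_samples` samples yields."""
    if drop_remainder:
        # The incomplete last batch is discarded.
        return num_samples // batch_size
    # The last, smaller batch still counts as one step.
    return math.ceil(num_samples / batch_size)


def batches_forever(samples, batch_size):
    """Yield batches endlessly, restarting at the first sample after each pass."""
    while True:
        for start in range(0, len(samples), batch_size):
            yield samples[start:start + batch_size]
</syntaxhighlight>

Passing <code>steps_per_epoch=batches_per_epoch(len(samples), batch_size)</code> to <code>fit</code> keeps each epoch to one pass over the data, and a generator that restarts like <code>batches_forever</code> keeps later epochs supplied.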

[[분류:딥러닝 라이브러리]]

Anonymous user

180.81.16.24


TensorFlow
