Development with Tensorflow, VS, and Conda


After installing Tensorflow, VS, and Conda, I get a pile of errors whenever I try to run the code:

Found 575 images belonging to 3 classes.
Found 69 images belonging to 3 classes.
2021-11-29 07:23:46.324131: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-11-29 07:23:46.991234: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1401 MB memory:  -> device: 0, name: GeForce GT 710, pci bus id: 0000:08:00.0, compute capability: 3.5
C:\Users\kamil\.conda\envs\tensorflow\lib\site-packages\keras\optimizer_v2\adam.py:105: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
  super(Adam, self).__init__(name, **kwargs)
Epoch 1/30
C:\Users\kamil\.conda\envs\tensorflow\lib\site-packages\tensorflow\python\util\dispatch.py:1096: UserWarning: "`categorical_crossentropy` received `from_logits=True`, but the `output` argument was produced by a sigmoid or softmax activation and thus does not represent logits. Was this intended?"
  return dispatch_target(*args, **kwargs)
2021-11-29 07:23:49.612822: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8100
2021-11-29 07:23:50.894726: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.08GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-11-29 07:23:51.946854: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1018.38MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-11-29 07:23:53.234559: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 729.89MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-11-29 07:23:53.563607: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 562.77MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-11-29 07:23:53.979189: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 360.53MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-11-29 07:23:53.984987: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 532.02MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-11-29 07:23:54.724332: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 790.02MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-11-29 07:23:54.878406: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 289.45MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-11-29 07:23:55.423657: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 866.56MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-11-29 07:23:59.404689: E tensorflow/stream_executor/cuda/cuda_driver.cc:1018] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-11-29 07:23:59.408237: E tensorflow/stream_executor/gpu/gpu_timer.cc:55] INTERNAL: Error destroying CUDA event: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-11-29 07:23:59.411331: E tensorflow/stream_executor/gpu/gpu_timer.cc:60] INTERNAL: Error destroying CUDA event: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-11-29 07:23:59.414816: I tensorflow/stream_executor/cuda/cuda_driver.cc:739] failed to allocate 8B (8 bytes) from device: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-11-29 07:23:59.418365: E tensorflow/stream_executor/stream.cc:4476] INTERNAL: Failed to enqueue async memset operation: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-11-29 07:23:59.422467: I tensorflow/stream_executor/cuda/cuda_driver.cc:739] failed to allocate 8B (8 bytes) from device: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-11-29 07:23:59.425739: E tensorflow/stream_executor/stream.cc:4476] INTERNAL: Failed to enqueue async memset operation: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-11-29 07:23:59.429235: E tensorflow/stream_executor/cuda/cuda_driver.cc:1163] failed to enqueue async memcpy from device to host: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated; host dst: 0xd4cd14bb90; GPU src: (nil); size: 8=0x8
2021-11-29 07:23:59.434156: I tensorflow/stream_executor/stream.cc:4442] [stream=000002486DC6B350,impl=000002486F3F0610] INTERNAL: stream did not block host until done; was already in an error state     
2021-11-29 07:23:59.437962: W tensorflow/core/kernels/gpu_utils.cc:69] Failed to check cudnn convolutions for out-of-bounds reads and writes with an error message: 'stream did not block host until done; was already in an error state'; skipping this check. This only means that we won't check cudnn for out-of-bounds reads and writes. This message will only be printed once.
2021-11-29 07:23:59.444743: F tensorflow/stream_executor/cuda/cuda_dnn.cc:213] Check failed: status == CUDNN_STATUS_SUCCESS (7 vs. 0)Failed to set cuDNN stream.

This is my first time dealing with tensorflow, and I really need help, with explanations, on how to run these examples… Thanks in advance for any help!
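
Judging by the log, two separate things seem to be going on: the bfc_allocator warnings say the GT 710's roughly 1.4 GB pool is too small for the default batch, and CUDA_ERROR_LAUNCH_TIMEOUT on Windows usually means a kernel ran longer than the display driver's watchdog (TDR) allows on a GPU that is also driving the screen. A minimal sketch of the usual mitigation, assuming TensorFlow 2.x; it has to run before anything else touches the GPU, and it is not a guaranteed fix:

import tensorflow as tf

# Sketch only, not a verified fix: let TensorFlow grow GPU memory on
# demand instead of reserving almost the whole card at startup. This
# must execute before the first op is placed on the GPU.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_memory_growth(gpus[0], True)

Reducing batch_size (and the input image size) shrinks every kernel launch as well, which helps against both the out-of-memory warnings and the watchdog timeout. Separately, the UserWarning about `from_logits=True` suggests the loss is declared as taking logits while the last layer already applies softmax; one of the two should change.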



Last edited by: katemisik (total edits: 2)

I vote for a ban.

anonymous
()
In reply to: comment by ymn

I started by running the official example on my local machine:

import matplotlib.pyplot as plt
import numpy as np
import os
import tensorflow as tf

# Download and unpack the sample cats-and-dogs dataset.
_URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=_URL, extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')

train_dir = os.path.join(PATH, 'train')
validation_dir = os.path.join(PATH, 'validation')

BATCH_SIZE = 32
IMG_SIZE = (160, 160)

# Build batched, shuffled image datasets from the extracted folders.
train_dataset = tf.keras.utils.image_dataset_from_directory(train_dir,
                                                            shuffle=True,
                                                            batch_size=BATCH_SIZE,
                                                            image_size=IMG_SIZE)


validation_dataset = tf.keras.utils.image_dataset_from_directory(validation_dir,
                                                                 shuffle=True,
                                                                 batch_size=BATCH_SIZE,
                                                                 image_size=IMG_SIZE)

class_names = train_dataset.class_names

# Show a 3x3 grid of sample training images with their class names.
plt.figure(figsize=(10, 10))
for images, labels in train_dataset.take(1):
  for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(images[i].numpy().astype("uint8"))
    plt.title(class_names[labels[i]])
    plt.axis("off")

I ran it piece by piece; at first there was no error, but at this stage the same errors appeared:

Found 2000 files belonging to 2 classes.
2021-11-29 08:31:35.379497: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-11-29 08:31:35.731534: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1401 MB memory:  -> device: 0, name: GeForce GT 710, pci bus id: 0000:08:00.0, compute capability: 3.5
Found 1000 files belonging to 2 classes.
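
One quick isolation step (my assumption, not part of the tutorial): hide the GPU and rerun the example on the CPU. TensorFlow honors CUDA_VISIBLE_DEVICES as long as it is set before the import:

import os

# Diagnostic sketch: setting CUDA_VISIBLE_DEVICES to -1 hides every GPU,
# so TensorFlow silently falls back to the CPU. Must happen before the
# tensorflow import.
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))  # expected: []

If the example then runs cleanly (if slowly), the code itself is fine and the fault is in the driver/CUDA/cuDNN pairing.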
katemisik
() topic starter

I'm not a real welder here (i.e., no expert), but I can say that the subject has big problems with compatibility between versions. Application code gets nailed down to a specific version of Keras, Keras demands the "right" version of TensorFlow within a very narrow window, and TensorFlow needs the right version of the CUDA toolkit (not always one that's compatible with your GPU, lol) plus its companion libraries.
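
A quick way to see which CUDA and cuDNN versions a particular wheel was built against (a sketch assuming TF 2.3 or newer, where tf.sysconfig.get_build_info() exists):

import tensorflow as tf

# Print the versions this TensorFlow build was compiled against, to be
# matched by hand against the installed CUDA toolkit and cuDNN.
info = tf.sysconfig.get_build_info()
print('TF version:', tf.__version__)
print('built for CUDA:', info.get('cuda_version'))
print('built for cuDNN:', info.get('cudnn_version'))

That only shows what the binary expects; the installed driver and toolkit still have to actually match it.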

In short, if it's purely to play around with, write it in Python. It's a bit dumb, but it burns far less of your time on configuration.

izzholtik ★★★
()