LINUX.ORG.RU

OpenCL does not work on AMD

On both Arch and Debian it refuses to work, for example with oclvanitygen. On NVIDIA it works.

Available OpenCL platforms:
0: [Mesa] Clover
  0: [AMD] AMD RAVEN (DRM 3.35.0, 5.4.0-4-amd64, LLVM 9.0.1)
1: [NVIDIA Corporation] NVIDIA CUDA
  0: [NVIDIA Corporation] GeForce GTX 1050
ovg -p 1 1LoR
Difficulty: 77178
Compiling kernel, can take minutes...done!
Pattern: 1LoR                                                                  
Address: 1LoR7nNLoHqpEFtFLHA2RpnhK18MYxnAKN
ovg -p 0 1LoR
Difficulty: 77178
Compiling kernel, can take minutes...failure.
clBuildProgram: CL_BUILD_PROGRAM_FAILURE
Build log:
input.cl:173:19: error: variable in constant address space must be initialized
Device: AMD RAVEN (DRM 3.35.0, 5.4.0-4-amd64, LLVM 9.0.1)
Vendor: AMD (1002)
Driver: 19.3.3
Profile: FULL_PROFILE
Version: OpenCL 1.1 Mesa 19.3.3
Max compute units: 5102667438252621832
Max workgroup size: 256
Global memory: 3221225472
Max allocation: 2254857830

The same error occurs on both Debian and Arch.

clinfo

Number of platforms                               2
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 19.3.3
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 1.2 CUDA 10.2.131
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
  Platform Extensions function suffix             NV

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD RAVEN (DRM 3.35.0, 5.4.0-4-amd64, LLVM 9.0.1)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 19.3.3
  Driver Version                                  19.3.3
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Max compute units                               8
  Max clock frequency                             1100MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              3221225472 (3GiB)
  Error Correction support                        No
  Max memory allocation                           2254857830 (2.1GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max number of constant args                     16
  Max constant buffer size                        2147483647 (2GiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16

  Platform Name                                   NVIDIA CUDA
Number of devices                                 1
  Device Name                                     GeForce GTX 1050
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 1.2 CUDA
  Driver Version                                  440.59
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 01:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               5
  Max clock frequency                             1493MHz
  Compute Capability (NV)                         6.1
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple              32
  Warp size (NV)                                  32
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              2099904512 (1.956GiB)
  Error Correction support                        No
  Max memory allocation                           524976128 (500.7MiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        245760 (240KiB)
  Global Memory cache line size                   128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            268435456 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             16384x32768 pixels
    Max 3D image size                             16384x16384x16384 pixels
    Max number of read image args                 256
    Max number of write image args                16
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     9
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties                                
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Kernel execution timeout (NV)                 No
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics


NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [MESA]
  clCreateContext(NULL, ...) [default]            Success [MESA]
  clCreateContext(NULL, ...) [other]              Success [NV]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD RAVEN (DRM 3.35.0, 5.4.0-4-amd64, LLVM 9.0.1)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD RAVEN (DRM 3.35.0, 5.4.0-4-amd64, LLVM 9.0.1)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD RAVEN (DRM 3.35.0, 5.4.0-4-amd64, LLVM 9.0.1)

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.12
  ICD loader Profile                              OpenCL 2.2
★★★★★
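A side note on the oclvanitygen device summary above: it prints `Max compute units: 5102667438252621832` for the Clover device, while clinfo reports 8 for the same GPU. The sketch below only does the arithmetic on that number. That the tool reads the 32-bit `cl_uint` result of `CL_DEVICE_MAX_COMPUTE_UNITS` into a wider variable whose upper bytes were never cleared is an assumption; the thread does not confirm it.

```python
# Value printed by oclvanitygen for the Clover device (from the log above).
reported = 5102667438252621832

# CL_DEVICE_MAX_COMPUTE_UNITS is a cl_uint, i.e. 32 bits wide.
low32 = reported & 0xFFFFFFFF
high32 = reported >> 32

print(hex(reported))  # full 64-bit pattern
print(low32)          # the part a 32-bit query would actually fill in
print(hex(high32))    # leftover upper bits (assumed to be stale memory)
```

The low 32 bits come out as 8, which matches the `Max compute units` line in clinfo, so the huge number looks like a display problem and not part of the build failure.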

Reply to: comment by buratino

That probably refers to amdgpu-pro. Officially, though, it is supported only on Ubuntu and RHEL.

For Arch there is a package in the AUR: https://aur.archlinux.org/packages/opencl-amd/

On Debian it will probably take some manual fiddling.

Midael ★★★★★ ()

You have OpenCL 1.1 on AMD and OpenCL 1.2 on NVIDIA )))

What you are building most likely requires OpenCL 2.0 or later. Have you read the oclvanitygen documentation, or should I look it up for you? )

Rx0 ()
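The version comparison above can be done mechanically. OpenCL platform and device version strings start with `OpenCL <major>.<minor>`, followed by vendor-specific text, so a small parser is enough. This is a minimal sketch: the helper name `parse_cl_version` is hypothetical, and it says nothing about which version oclvanitygen actually needs.

```python
import re

def parse_cl_version(version_string):
    """Return (major, minor) from an OpenCL version string."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"not an OpenCL version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))

# Version strings copied from the clinfo output in this thread.
versions = {
    "Clover": "OpenCL 1.1 Mesa 19.3.3",
    "NVIDIA CUDA": "OpenCL 1.2 CUDA 10.2.131",
}

for name, text in versions.items():
    print(name, parse_cl_version(text))
```

Tuples compare element by element, so `parse_cl_version(a) < parse_cl_version(b)` orders versions correctly. Note that the NVIDIA platform, where the tool works, reports 1.2 and not 2.0.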
Reply to: comment by Rx0

That did not help; there are just more errors now:

./oclvanitygen -p 1 1ooo
Difficulty: 4553521
Compiling kernel, can take minutes...failure.
clBuildProgram: CL_BUILD_PROGRAM_FAILURE
Build log:
/tmp/OCL2919T1.cl:173:19: error: variable in constant address space must be initialized
__constant bignum bn_zero;
                  ^
/tmp/OCL2919T1.cl:173:19: error: constant address space qualified variables are required to be initialized
2 errors generated.

error: Clang front-end compilation failed!
Frontend phase failed compilation.
Error: Compiling CL to IR
Device: gfx902
Vendor: Advanced Micro Devices, Inc. (1002)
Driver: 3004.6 (PAL,HSAIL)
Profile: FULL_PROFILE
Version: OpenCL 2.0 AMD-APP (3004.6)
Max compute units: 8
Max workgroup size: 256
Global memory: 2684354560
Max allocation: 912680550
Available OpenCL platforms:
0: [NVIDIA Corporation] NVIDIA CUDA
  0: [NVIDIA Corporation] GeForce GTX 1050
1: [Advanced Micro Devices, Inc.] AMD Accelerated Parallel Processing
  0: [Advanced Micro Devices, Inc.] gfx902

buratino ★★★★★ ()
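The build log points at the same source position (line 173, column 19) on both Clover with OpenCL 1.1 and AMD-APP with OpenCL 2.0, so the failure comes from the kernel source itself: `__constant bignum bn_zero;` has no initializer, and both Clang-based compilers reject that. One possible workaround is to give such declarations an explicit zero initializer before building. The sketch below does that on a source string. The helper name `add_zero_initializers` is hypothetical, and it assumes that `bignum` accepts a `{0}` brace initializer and that all zeros is the intended value of `bn_zero`; the thread shows neither the type definition nor a confirmed fix.

```python
import re

# Matches program-scope declarations such as "__constant bignum bn_zero;"
# that have a plain type and name and no initializer.
UNINITIALIZED_CONSTANT = re.compile(
    r"^(\s*__constant\s+\w+(?:\s+\w+)*\s+\w+)\s*;",
    re.MULTILINE,
)

def add_zero_initializers(kernel_source):
    """Append "= {0}" to uninitialized __constant declarations."""
    return UNINITIALIZED_CONSTANT.sub(r"\1 = {0};", kernel_source)

sample = (
    "__constant bignum bn_zero;\n"
    "__constant int already_set = 5;\n"
)
print(add_zero_initializers(sample))
```

Declarations that already have an initializer, array declarators, and pointers are deliberately left alone, since the pattern only accepts a bare name directly followed by a semicolon.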
Reply to: comment by Rx0
 yay -S piglit-git
:: There are 2 providers available for waffle:
:: Repository AUR
    1) waffle 2) waffle-git 

Enter a number (default=1): 1
==> Error: Could not find all required packages:
    glproto (Wanted by: piglit-git)
[user@myarch ~]$ yay -S piglit-git
:: There are 2 providers available for waffle:
:: Repository AUR
    1) waffle 2) waffle-git 

Enter a number (default=1): 2
==> Error: Could not find all required packages:
    glproto (Wanted by: piglit-git)

Debian still has OpenCL 1.1 for now, so I'll go check there. Hmm, the version there is rather old, 05-2018.

buratino ★★★★★ ()
Reply to: comment by buratino

On Ubuntu:

# apt search piglit
Sorting... Done
Full Text Search... Done
piglit/eoan,now 0~git20180515-62ef6b0db-1 amd64 [installed]
  Open-source test suite for OpenGL and OpenCL implementations

Or build it yourself; there is a README with build instructions.

$ git clone https://github.com/Igalia/piglit.git
Rx0 ()
Last edited by: Rx0 (1 edit in total)
Reply to: comment by buratino

I gave you a link; it has usage instructions with examples.

It's simple: run the test, then view the result:

Test OpenCL

$ piglit run quick_cl results/test-cl

View result

$ piglit summary html summary/test-cl results/test-cl

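With the commands above, the raw results go to results/test-cl and the HTML report to summary/test-cl, both relative to the directory you ran them from (your home directory, for example). The report contains an index.html.

Open it in a browser.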

Rx0 ()

I found some miner... On AMD there is not enough video memory: it needs 2 GB, but only 1 GB is allocated. Can that be reallocated somehow?

Interestingly, on NVIDIA this miner gives 13-14 MH/s with CUDA but only 1.5 MH/s with OpenCL. Is that normal?

buratino ★★★★★ ()
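The memory figures already posted in this thread can be put side by side. The thread does not say which miner this is or which limit it checks, so the sketch below assumes, purely for illustration, that it wants a single 2 GiB buffer and compares that against each device's reported maximum allocation.

```python
GIB = 1024 ** 3

# (global memory, max single allocation) in bytes, from the logs in this thread.
devices = {
    "AMD Clover (Mesa)": (3221225472, 2254857830),
    "AMD-APP (gfx902)": (2684354560, 912680550),
    "GeForce GTX 1050": (2099904512, 524976128),
}

needed = 2 * GIB  # assumption: the miner wants one 2 GiB buffer

def fits(limit_bytes, request_bytes=needed):
    """True if a request of request_bytes is within limit_bytes."""
    return request_bytes <= limit_bytes

for name, (global_mem, max_alloc) in devices.items():
    print(f"{name}: global {global_mem / GIB:.2f} GiB, "
          f"max alloc {max_alloc / GIB:.2f} GiB, "
          f"2 GiB in one buffer: {fits(max_alloc)}")
```

Under that assumption only the Clover platform could satisfy the request in one allocation; the AMD-APP driver reports a maximum single allocation below 1 GiB even though its global memory is 2.5 GiB.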
Reply to: comment by buratino

I just want to get at least something running on the Vega :) After that I'll forget about OpenCL and everything else and sleep peacefully, knowing that it works :)

Until this works, forget about everything else. )

$ piglit run quick_cl results/test-cl
[105/704] skip: 10, pass: 88, fail: 0 /-\|/-\|
Rx0 ()
Reply to: comment by Ford_Focus

Yes, your link is on topic and useful: GalliumCompute

Here is more information from the same wiki for identifying your hardware: RadeonFeature

It works on these devices:

Southern Islands

CAPE VERDE, PITCAIRN, TAHITI, OLAND, HAINAN	HD7750 - HD7970, R9 270, R9-280, R7 240, R7 250

Sea Islands

BONAIRE, KABINI, MULLINS, KAVERI, HAWAII	HD7790, R7 260, R9 290

On my machine:

$ clinfo -l
Platform #0: Clover
 `-- Device #0: AMD Radeon HD 8950 (BONAIRE, DRM 3.36.0, 5.6.0-rc2+, LLVM 9.0.0)

$ clinfo | egrep -i opencl
  Platform Version                                OpenCL 1.1 Mesa 19.2.8
  Device Version                                  OpenCL 1.1 Mesa 19.2.8
  Device OpenCL C Version                         OpenCL C 1.1 
    Run OpenCL kernels                            Yes
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Profile                              OpenCL 2.1
  • In the stable and mainline kernels OpenCL does not work on BONAIRE devices; I rewrote kfd in amdgpu to add support. I have not decided yet whether to send it to mainline: the device is getting old, and I did it out of academic interest.
Rx0 ()