Getting Started with OpenCL
Probably the most amazing thing about OpenCL is its heterogeneous nature. An OpenCL kernel can run on just about any compute device in your computer: the CPU, the GPU or even an FPGA, and it can all be orchestrated from the host with ease.
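Just to give a flavour of what a kernel actually looks like, here is a minimal, purely illustrative OpenCL C kernel (not part of the program later in this post) that adds two arrays element-wise; the same source can be compiled at run-time for whichever device you pick:

// Illustrative OpenCL C kernel: each work-item adds one element of 'a' to
// the corresponding element of 'b' and stores the result.
__kernel void vector_add(__global const float* a,
                         __global const float* b,
                         __global float* result)
{
    size_t i = get_global_id(0);   // unique index of this work-item
    result[i] = a[i] + b[i];
}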
As you may be aware, 3rd generation Intel Core (and later) processors include an integrated graphics component. From the HD 4000 onwards this compute power is not to be sniffed at and is certainly worth exploiting; however, it's not entirely clear how you access it. If, like me, you have a discrete graphics card, you may be wondering, as I did, why the Intel GPU is not accessible.
Here’s what to do.
Boot your computer into the BIOS settings and look for a section probably titled something like “System Agent”. Under this menu set:
- “Initiate Graphic Adapter” – set this to PCIe/PCI
- “iGPU Multi-Monitor” – set this to Enabled
Save your settings and re-boot.
Now visit the Intel website and download the appropriate graphics driver for your CPU, install it and re-boot once more. When you open Device Manager you should now see the integrated Intel graphics device listed alongside your discrete card.
We’re ready to start programming.
Next you are going to need an OpenCL SDK so that you have the headers to build an OpenCL program (the drivers already include a run-time). It doesn’t really matter whose you use; in my case I downloaded the Nvidia tools, which are part of the CUDA SDK. Currently the download is here but it may move at a later date.
Once installed you will need to set up your project to access the SDK. In Visual Studio 2013 (2012 is the same) open the Property Manager tab and select your build target; in my case I select “Debug | x64”. Then double-click “Microsoft.Cpp.x64.user” to open its property pages. With the property dialog open, select “VC++ Directories” and enter:
- Include Directories – $(CUDA_PATH)\include;$(IncludePath)
- Library Directories – $(CUDA_PATH)\lib\x64;$(LibraryPath)
The CUDA installer has conveniently created an environment variable called CUDA_PATH to make this nice and clean.
Now go to the “Linker”, “General” section and update:
- Additional Library Directories – $(CUDA_LIB_PATH);%(AdditionalLibraryDirectories)
Then “Linker”, “Input” and update:
- Additional Dependencies – OpenCL.lib;%(AdditionalDependencies)
Hit OK and we’re ready to go.
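If you want to be sure the header and library paths are being picked up before going any further, a few lines are enough. This is just a sketch (it assumes the set-up above and the stdafx.h precompiled header of a default console project); it simply asks the OpenCL runtime how many platforms are installed:

#include "stdafx.h"
#include <CL/cl.h>
#include <cstdio>

// Quick sanity check that <CL/cl.h> is found and OpenCL.lib links:
// ask the runtime how many OpenCL platforms are installed.
int _tmain(int argc, _TCHAR* argv[])
{
    cl_uint numOfPlatforms = 0;
    cl_int error = clGetPlatformIDs(0, NULL, &numOfPlatforms);
    if (error != CL_SUCCESS)
    {
        printf("clGetPlatformIDs failed with error %d\n", error);
        return 1;
    }
    printf("OpenCL platforms installed: %u\n", numOfPlatforms);
    return 0;
}

If this builds, links and prints a non-zero count, everything is wired up correctly.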
Here is a little program that looks for the compute devices on your system and prints out their capabilities:
#include "stdafx.h" #include <CL/cl.h> #include <memory> #include <vector> #include <iostream> void displayPlatformInfo(cl_platform_id id, cl_platform_info param_name, const char* paramNameAsStr) { cl_int error = 0; size_t paramSize = 0; error = clGetPlatformInfo(id, param_name, 0, NULL, ¶mSize); std::unique_ptr<char> moreInfo(new char[paramSize]); error = clGetPlatformInfo(id, param_name, paramSize, moreInfo.get(), NULL); if (error == CL_SUCCESS) { std::cout << paramNameAsStr << " : " << moreInfo.get() << std::endl; } } void displayDeviceDetails(cl_device_id id, cl_device_info param_name, const char* paramNameAsStr) { cl_int error = 0; size_t paramSize = 0; error = clGetDeviceInfo(id, param_name, 0, NULL, ¶mSize); if (error != CL_SUCCESS) { perror("Unable to obtain device info for param\n"); return; } /* the cl_device_info are preprocessor directives defined in cl.h */ switch (param_name) { case CL_DEVICE_TYPE: { std::unique_ptr<cl_device_type> devType(new cl_device_type[paramSize]); error = clGetDeviceInfo(id, param_name, paramSize, devType.get(), NULL); if (error != CL_SUCCESS) { perror("Unable to obtain device info for param\n"); return; } switch (*devType) { case CL_DEVICE_TYPE_CPU: printf("CPU detected\n"); break; case CL_DEVICE_TYPE_GPU: printf("GPU detected\n"); break; case CL_DEVICE_TYPE_ACCELERATOR: printf("Accelerator detected\n"); break; case CL_DEVICE_TYPE_DEFAULT: printf("default detected\n"); break; } } break; case CL_DEVICE_VENDOR_ID: case CL_DEVICE_MAX_COMPUTE_UNITS: case CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: { std::unique_ptr<cl_uint> ret(new cl_uint[paramSize]); error = clGetDeviceInfo(id, param_name, paramSize, ret.get(), NULL); if (error != CL_SUCCESS) { perror("Unable to obtain device info for param\n"); return; } switch (param_name) { case CL_DEVICE_VENDOR_ID: printf("\tVENDOR ID: 0x%x\n", *ret); break; case CL_DEVICE_MAX_COMPUTE_UNITS: printf("\tMaximum number of parallel compute units: %d\n", *ret); break; case CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: printf("\tMaximum dimensions for global/local work-item IDs: %d\n", *ret); break; } } break; case CL_DEVICE_MAX_WORK_ITEM_SIZES: { cl_uint maxWIDimensions; std::unique_ptr<size_t> ret(new size_t[paramSize]); error = clGetDeviceInfo(id, param_name, paramSize, ret.get(), NULL); error = clGetDeviceInfo(id, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, sizeof(cl_uint), &maxWIDimensions, NULL); if (error != CL_SUCCESS) { perror("Unable to obtain device info for param\n"); return; } printf("\tMaximum number of work-items in each dimension: ( "); for (cl_uint i = 0; i < maxWIDimensions; ++i) { printf("%d ", ret.get()[i]); } printf(" )\n"); } break; case CL_DEVICE_MAX_WORK_GROUP_SIZE: { std::unique_ptr<size_t> ret(new size_t[paramSize]); error = clGetDeviceInfo(id, param_name, paramSize, ret.get(), NULL); if (error != CL_SUCCESS) { perror("Unable to obtain device info for param\n"); return; } printf("\tMaximum number of work-items in a work-group: %d\n", *ret); } break; case CL_DEVICE_NAME: case CL_DEVICE_VENDOR: { std::unique_ptr<char> data(new char[48]); error = clGetDeviceInfo(id, param_name, paramSize, data.get(), NULL); if (error != CL_SUCCESS) { perror("Unable to obtain device name/vendor info for param\n"); return; } switch (param_name) { case CL_DEVICE_NAME: printf("\tDevice name is %s\n", data.get()); break; case CL_DEVICE_VENDOR: printf("\tDevice vendor is %s\n", data.get()); break; } } break; case CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: { std::unique_ptr<cl_uint> size(new cl_uint[paramSize]); error = clGetDeviceInfo(id, param_name, 
paramSize, size.get(), NULL); if (error != CL_SUCCESS) { perror("Unable to obtain device name/vendor info for param\n"); return; } printf("\tDevice global cacheline size: %d bytes\n", (*size)); break; } break; case CL_DEVICE_GLOBAL_MEM_SIZE: case CL_DEVICE_MAX_MEM_ALLOC_SIZE: { std::unique_ptr<cl_ulong> size(new cl_ulong[paramSize]); error = clGetDeviceInfo(id, param_name, paramSize, size.get(), NULL); if (error != CL_SUCCESS) { perror("Unable to obtain device name/vendor info for param\n"); return; } switch (param_name) { case CL_DEVICE_GLOBAL_MEM_SIZE: printf("\tDevice global mem: %ld mega-bytes\n", (*size) >> 20); break; case CL_DEVICE_MAX_MEM_ALLOC_SIZE: printf("\tDevice max memory allocation: %ld mega-bytes\n", (*size) >> 20); break; } } break; } //end of switch } void displayDeviceInfo(cl_platform_id id, cl_device_type dev_type) { /* OpenCL 1.1 device types */ cl_int error = 0; cl_uint numOfDevices = 0; /* Determine how many devices are connected to your platform */ error = clGetDeviceIDs(id, dev_type, 0, NULL, &numOfDevices); if (error != CL_SUCCESS) { perror("Unable to obtain any OpenCL compliant device info"); exit(1); } std::vector<cl_device_id> devices(numOfDevices, nullptr); /* Load the information about your devices into the variable 'devices' */ error = clGetDeviceIDs(id, dev_type, numOfDevices, devices.data(), NULL); if (error != CL_SUCCESS) { perror("Unable to obtain any OpenCL compliant device info"); exit(1); } printf("Number of detected OpenCL devices: %d\n", numOfDevices); /* We attempt to retrieve some information about the devices. */ for (auto device : devices) { displayDeviceDetails(device, CL_DEVICE_TYPE, "CL_DEVICE_TYPE"); displayDeviceDetails(device, CL_DEVICE_NAME, "CL_DEVICE_NAME"); displayDeviceDetails(device, CL_DEVICE_VENDOR, "CL_DEVICE_VENDOR"); displayDeviceDetails(device, CL_DEVICE_VENDOR_ID, "CL_DEVICE_VENDOR_ID"); displayDeviceDetails(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE, "CL_DEVICE_MAX_MEM_ALLOC_SIZE"); displayDeviceDetails(device, CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, "CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE"); displayDeviceDetails(device, CL_DEVICE_GLOBAL_MEM_SIZE, "CL_DEVICE_GLOBAL_MEM_SIZE"); displayDeviceDetails(device, CL_DEVICE_MAX_COMPUTE_UNITS, "CL_DEVICE_MAX_COMPUTE_UNITS"); displayDeviceDetails(device, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, "CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS"); displayDeviceDetails(device, CL_DEVICE_MAX_WORK_ITEM_SIZES, "CL_DEVICE_MAX_WORK_ITEM_SIZES"); displayDeviceDetails(device, CL_DEVICE_MAX_WORK_GROUP_SIZE, "CL_DEVICE_MAX_WORK_GROUP_SIZE"); } } int _tmain(int argc, _TCHAR* argv[]) { /* OpenCL 1.1 scalar data types */ cl_uint numOfPlatforms; cl_int error; /* Get the number of platforms */ error = clGetPlatformIDs(0, NULL, &numOfPlatforms); if (error != CL_SUCCESS) { perror("Unable to find any OpenCL platforms"); return(1); } // Allocate memory for the number of installed platforms. 
std::vector<cl_platform_id> platforms(numOfPlatforms, nullptr); printf("Number of OpenCL platforms found: %d\n", numOfPlatforms); error = clGetPlatformIDs(numOfPlatforms, platforms.data(), NULL); if (error != CL_SUCCESS) { perror("Unable to find any OpenCL platforms"); return(1); } for (auto platform : platforms) { displayPlatformInfo(platform, CL_PLATFORM_PROFILE, "CL_PLATFORM_PROFILE"); displayPlatformInfo(platform, CL_PLATFORM_VERSION, "CL_PLATFORM_VERSION"); displayPlatformInfo(platform, CL_PLATFORM_NAME, "CL_PLATFORM_NAME"); displayPlatformInfo(platform, CL_PLATFORM_VENDOR, "CL_PLATFORM_VENDOR"); displayPlatformInfo(platform, CL_PLATFORM_EXTENSIONS, "CL_PLATFORM_EXTENSIONS"); // Assume that we don't know how many devices are OpenCL compliant, we locate everything ! displayDeviceInfo(platform, CL_DEVICE_TYPE_ALL); } return 0; } |
This gives us output like this:
Number of OpenCL platforms found: 2
CL_PLATFORM_PROFILE : FULL_PROFILE
CL_PLATFORM_VERSION : OpenCL 1.1 CUDA 6.0.1
CL_PLATFORM_NAME : NVIDIA CUDA
CL_PLATFORM_VENDOR : NVIDIA Corporation
CL_PLATFORM_EXTENSIONS : cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
Number of detected OpenCL devices: 2
GPU detected
    Device name is GeForce GTX 680
    Device vendor is NVIDIA Corporation
    VENDOR ID: 0x10de
    Device max memory allocation: 512 mega-bytes
    Device global cacheline size: 128 bytes
    Device global mem: 2048 mega-bytes
    Maximum number of parallel compute units: 8
    Maximum dimensions for global/local work-item IDs: 3
    Maximum number of work-items in each dimension: ( 1024 1024 64 )
    Maximum number of work-items in a work-group: 1024
GPU detected
    Device name is GeForce GTX 680
    Device vendor is NVIDIA Corporation
    VENDOR ID: 0x10de
    Device max memory allocation: 512 mega-bytes
    Device global cacheline size: 128 bytes
    Device global mem: 2048 mega-bytes
    Maximum number of parallel compute units: 8
    Maximum dimensions for global/local work-item IDs: 3
    Maximum number of work-items in each dimension: ( 1024 1024 64 )
    Maximum number of work-items in a work-group: 1024
CL_PLATFORM_PROFILE : FULL_PROFILE
CL_PLATFORM_VERSION : OpenCL 1.2
CL_PLATFORM_NAME : Intel(R) OpenCL
CL_PLATFORM_VENDOR : Intel(R) Corporation
CL_PLATFORM_EXTENSIONS : cl_khr_fp64 cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_intel_printf cl_ext_device_fission cl_intel_exec_by_local_thread cl_khr_gl_sharing cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d11_sharing
Number of detected OpenCL devices: 1
CPU detected
    Device name is Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
    Device vendor is Intel(R) Corporation
    VENDOR ID: 0x8086
    Device max memory allocation: 8159 mega-bytes
    Device global cacheline size: 64 bytes
    Device global mem: 32639 mega-bytes
    Maximum number of parallel compute units: 8
    Maximum dimensions for global/local work-item IDs: 3
    Maximum number of work-items in each dimension: ( 1024 1024 1024 )
    Maximum number of work-items in a work-group: 1024
Lovely.
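From here the obvious next step is to pick one of those devices and build a context on it. As a rough sketch (assuming platform is one of the cl_platform_id values enumerated above), something like this is all it takes before you can start compiling kernels and creating command queues:

#include <CL/cl.h>
#include <cstdio>

// Sketch: create an OpenCL context on the first GPU device of 'platform',
// where 'platform' would be one of the platforms enumerated above.
cl_context createGpuContext(cl_platform_id platform)
{
    cl_device_id device = nullptr;
    cl_int error = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    if (error != CL_SUCCESS)
    {
        printf("No GPU device on this platform (error %d)\n", error);
        return nullptr;
    }

    // Tie the context to the platform explicitly; with both the Nvidia and
    // Intel run-times installed there is more than one to choose from.
    cl_context_properties props[] = {
        CL_CONTEXT_PLATFORM, (cl_context_properties)platform, 0
    };
    cl_context context = clCreateContext(props, 1, &device, NULL, NULL, &error);
    if (error != CL_SUCCESS)
    {
        printf("clCreateContext failed (error %d)\n", error);
        return nullptr;
    }
    return context; // caller releases with clReleaseContext(context)
}

Passing CL_CONTEXT_PLATFORM matters once more than one platform is installed, which is exactly the situation above with both the Nvidia and Intel run-times present.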