CudaMiner Mining Kernel Configuration Guide

People often get confused about the kernel launch configuration in CUDA Miner and end up entering random numbers. This guide explains what you should put in CUDA Miner's “-l” argument!

To begin with, this argument takes 3 values: first, the kernel to use for your card; second, the number of SMs (or SMXs) your card has; and third, the number of warps per SM (or SMX) your card is limited to.


BEFORE YOU READ: This guide is only valid for the newest version of cudaminer (2013-12-18)!


First value: Kernel = “-l (K)5x32”

You can easily find your card's architecture by running CUDA Miner in autotune mode (remove the “-l” argument or set it to “-l auto”) and checking what it reports.

Alternatively, you can find it manually: look up your card's compute version and pick the kernel that matches it from this link.

L: Legacy cards with compute 1.x

S: Currently compiled for compute 1.2. Was used for Kepler cards but was replaced by “K”.

F: Fermi cards with compute 2.x

K: Kepler cards with compute 3.0

T: For compute 3.5 cards such as Titan, GTX 780 and GK208-based cards

X: Experimental kernel. Currently requires compute 3.5
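As a rough sketch, the kernel choice above boils down to a lookup from compute version to kernel letter. The function name `pick_kernel` is hypothetical (it is not part of CUDA Miner), and it only returns the default letter for each card family; the alternative S kernel (for compute 1.2 cards) and the experimental X kernel are left for you to try by hand.

```python
def pick_kernel(major: int, minor: int) -> str:
    """Return the default -l kernel letter for a compute version (hypothetical helper)."""
    if major == 1:
        return "L"  # Legacy cards, compute 1.x (compute 1.2 cards may also try "S")
    if major == 2:
        return "F"  # Fermi cards, compute 2.x
    if (major, minor) == (3, 0):
        return "K"  # Kepler cards, compute 3.0
    if (major, minor) == (3, 5):
        return "T"  # Titan, GTX 780, GK208-based cards (compute 3.5)
    # Anything else is outside the list above.
    raise ValueError(f"compute {major}.{minor} is not covered by this guide")


print(pick_kernel(3, 0))  # K
print(pick_kernel(2, 1))  # F
```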


Second value: SM (or SMX) units = “-l K(5)x32”

Use this link to find how many SM (or SMX) units your card has.

If there are multiple versions of your card, use GPU-Z or NVIDIA Inspector to see your GPU's name and revision, and compare them to the ones on the wiki. You can also compare memory and core clocks.

If the number of SMs isn't listed for your card, calculate it from the number of Stream Processors per SM. On the wiki, the Stream Processor count is the first number in the “Core Config” column. Example: the GTX 660 has the Core Config “960:80:24”, i.e. 960 Stream Processors. Using the list below, divide this by 192, which gives 5 SMX.

Compute 1.0 and 1.1: 2 SFUs per unit of 8 Stream Processors.

Compute 1.2 and 1.3: 1 SFU per unit of 8 Stream Processors.

Compute 2.0: 1 SM per unit of 32 Stream Processors.

Compute 2.1: 1 SM per unit of 48 Stream Processors.

Compute 3.0 and 3.5: 1 SMX per unit of 192 Stream Processors.
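The division described above can be sketched as follows, using the ratios from the list as given in this guide. The names `UNITS_PER_GROUP` and `unit_count` are hypothetical. The Stream Processor counts in the usage lines (128 for the 9800 GTX, 480 for the GTX 570, 2688 for the GTX Titan) come from NVIDIA's published specifications rather than from this guide, so double-check them against the wiki for your exact card.

```python
# (units, stream processors) per group, as listed above:
# compute 1.x counts SFUs, 2.x counts SMs, 3.x counts SMXs.
UNITS_PER_GROUP = {
    (1, 0): (2, 8),
    (1, 1): (2, 8),
    (1, 2): (1, 8),
    (1, 3): (1, 8),
    (2, 0): (1, 32),
    (2, 1): (1, 48),
    (3, 0): (1, 192),
    (3, 5): (1, 192),
}


def unit_count(stream_processors: int, major: int, minor: int) -> int:
    """Second -l value: SFU/SM/SMX count derived from the Stream Processor count."""
    units, group = UNITS_PER_GROUP[(major, minor)]
    if stream_processors % group:
        raise ValueError(f"{stream_processors} is not a multiple of {group}")
    return stream_processors // group * units


print(unit_count(960, 3, 0))   # GTX 660   -> 5
print(unit_count(128, 1, 1))   # 9800 GTX  -> 32
print(unit_count(480, 2, 0))   # GTX 570   -> 15
print(unit_count(2688, 3, 5))  # GTX Titan -> 14
```

A Core Config whose first number is not a multiple of the group size usually means you are looking at the wrong compute version, which is why the sketch raises an error instead of rounding.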


Third value: Warps per SM (or SMX) unit = “-l K5x(32)”

Compute 1.x cards are limited to [8] warps per SFU unit.

Compute 2.x cards are limited to [16] warps per SM unit. (Double-pumped process)

Compute 3.x cards are limited to [32] warps per SMX unit. (Quad-pumped process)


FERMI USERS: Also test your values swapped to see which gives you the best results. Example: if you use “F4x16”, also test “F16x4”. As long as you stick with multiples, it’s fine.


Examples:

9800 GTX = “-l L32x8” = Legacy card (Compute 1.1), 32 Special Function Units, 8 warps per SFU

GTX 570 = “-l F15x16” = Fermi card (Compute 2.0), 15 Streaming Multiprocessors, 16 warps per SM

GTX 660 = “-l K5x32” = Kepler card (Compute 3.0), 5 Next-Gen Streaming Multiprocessors, 32 warps per SMX

GTX Titan = “-l T14x32” = Titan card (Compute 3.5), 14 Next-Gen Streaming Multiprocessors, 32 warps per SMX
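Putting the three values together, the examples above can be reproduced with a small helper that joins the kernel letter, the unit count and the warps-per-unit value from the third-value list. `launch_config` and `swapped` are hypothetical names used only for illustration; `swapped` produces the reversed variant suggested for Fermi users.

```python
# Warps per unit from the "Third value" list, keyed by compute major version.
WARPS_PER_UNIT = {1: 8, 2: 16, 3: 32}


def launch_config(kernel: str, units: int, major: int) -> str:
    """Build the -l value as <kernel><units>x<warps> (hypothetical helper)."""
    return f"{kernel}{units}x{WARPS_PER_UNIT[major]}"


def swapped(config: str) -> str:
    """Swap the two numbers, e.g. F4x16 -> F16x4, for the Fermi test."""
    kernel, numbers = config[0], config[1:]
    first, second = numbers.split("x")
    return f"{kernel}{second}x{first}"


for card, kernel, units, major in [
    ("9800 GTX", "L", 32, 1),
    ("GTX 570", "F", 15, 2),
    ("GTX 660", "K", 5, 3),
    ("GTX Titan", "T", 14, 3),
]:
    print(f"{card}: -l {launch_config(kernel, units, major)}")

print(swapped("F4x16"))  # F16x4
```

Treat the result as a starting point: the autotune and benchmark options mentioned in this guide remain the way to confirm what actually works best on your card.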


My config as example:

cudaminer -r 10 -R 30 -T 30 -H 1 -i 0 -m 1 -d 0 -l K5x32 --no-autotune --url stratum+tcp://stratum.miningpool.ofchoice:1234 -u Username.Worker -p Password


.: Notes :.

I don’t have any legacy or Fermi cards for testing, so I couldn't verify those values myself, but the SFU/SM and warp counts should make sense.

If you try them and they don’t work, use “-l auto”, or run CUDA Miner's benchmark to see the best you can get: create a new .bat file containing the line “cudaminer -D --benchmark”.

.: Tips :.

Tip 1: Cards with compute 1.2 may experience better hashrates with the “S” kernel prefix.

Tip 2: Cards with compute 2.1 and below may experience better hashrates using the 32bit version of cudaminer.

Tip 3: Cards with compute 3.x ignore the “-C” flag. Cards with compute 2.1 and below may get better hashrates with “-C 1” than with “-C 2”.

Tip 4: The “-H” flag determines how much your CPU helps your GPU. If you are not mining with both GPU and CPU, the values “0” and “1” should give you a few more kh/s. “0” is single-threaded help, “1” is multithreaded help, and “2” gives all the work to the GPU.


Thanks to:

stkris for helping me figure out how the Fermi occupancy calculation works by testing lots of numbers with his Fermi card! 🙂
