C++ TutorialBasic input/output in c++C++ AlignmentC++ Argument Dependent Name LookupC++ Arithmitic MetaprogrammingC++ ArraysC++ Atomic TypesC++ AttributesC++ autoC++ Basic Type KeywordsC++ Bit fieldsC++ Bit ManipulationC++ Bit OperatorsC++ Build SystemsC++ C incompatibilitiesC++ C++11 Memory ModelC++ Callable ObjectsC++ Classes/StructuresC++ Client server examplesC++ Common compile/linker errors (GCC)C++ Compiling and BuildingC++ Concurrency with OpenMPC++ Const CorrectnessC++ const keywordC++ Constant class member functionsC++ constexprC++ ContainersC++ Copy ElisionC++ Copying vs AssignmentC++ Curiously Recurring Template Pattern (CRTP)C++ Date and time using chrono headerC++ Debugging and Debug-prevention Tools & TechniquesC++ decltypeC++ Digit separatorsC++ EnumerationC++ ExceptionsC++ Explicit type conversionsC++ Expression templatesC++ File I/OC++ Floating Point ArithmeticC++ Flow ControlC++ Fold ExpressionsC++ Friend keywordC++ function call by value vs. call by referenceC++ Function OverloadingC++ Function Template OverloadingC++ Futures and PromisesC++ Header FilesC++ Implementation-defined behaviorC++ Inline functionsC++ Inline variablesC++ IterationC++ IteratorsC++ KeywordsC++ LambdasC++ Layout of object typesC++ Linkage specificationsC++ LiteralsC++ LoopsC++ Memory managementC++ MetaprogrammingC++ Move SemanticsC++ mutable keywordC++ MutexesC++ NamespacesC++ Non-Static Member FunctionsC++ One Definition Rule (ODR)C++ Operator OverloadingC++ operator precedenceC++ OptimizationC++ Overload resolutionC++ Parameter packsC++ Perfect ForwardingC++ Pimpl IdiomC++ PointersC++ Pointers to membersC++ PolymorphismC++ PreprocessorC++ ProfilingC++ RAII: Resource Acquisition Is InitializationC++ Random number generationC++ Recursive MutexC++ Refactoring TechniquesC++ ReferencesC++ Regular expressionsC++ Resource ManagementC++ Return Type CovarianceC++ Returning several values from a functionC++ RTTI: Run-Time Type InformationC++ Scopes

C++ Concurrency with OpenMP

From WikiOD

This topic covers the basics of concurrency in C++ using OpenMP. OpenMP is documented in more detail in the OpenMP tag.

Parallelism or concurrency implies the execution of code at the same time.

Remarks[edit | edit source]

OpenMP does not require any special headers or libraries as it is a built-in compiler feature. However, if you use any OpenMP API functions such as omp_get_thread_num(), you will need to include omp.h and its library.

OpenMP pragma statements are ignored when the OpenMP option is not enabled during compilation. You may want to refer to the compiler option in your compiler's manual.

  • GCC uses -fopenmp
  • Clang uses -fopenmp
  • MSVC uses /openmp

OpenMP: Parallel Sections[edit | edit source]

This example illustrates the basics of executing sections of code in parallel.

As OpenMP is a built-in compiler feature, it works on any supported compilers without including any libraries. You may wish to include omp.h if you want to use any of the openMP API features.

Sample Code

std::cout << "begin ";
//    This pragma statement hints the compiler that the
//    contents within the { } are to be executed in as
//    parallel sections using openMP, the compiler will
//    generate this chunk of code for parallel execution
#pragma omp parallel sections
    //    This pragma statement hints the compiler that
    //    this is a section that can be executed in parallel
    //    with other section, a single section will be executed
    //    by a single thread.
    //    Note that it is "section" as opposed to "sections" above
    #pragma omp section
        std::cout << "hello " << std::endl;
        /** Do something **/
    #pragma omp section
        std::cout << "world " << std::endl;
        /** Do something **/
//    This line will not be executed until all the
//    sections defined above terminates
std::cout << "end" << std::endl;


This example produces 2 possible outputs and is dependent on the operating system and hardware. The output also illustrates a race condition problem that would occur from such an implementation.

begin hello world end begin world hello end

OpenMP: Parallel Sections[edit | edit source]

This example shows how to execute chunks of code in parallel

std::cout << "begin ";
//    Start of parallel sections
#pragma omp parallel sections
    //    Execute these sections in parallel
    #pragma omp section
        ... do something ...
        std::cout << "hello ";
    #pragma omp section
        ... do something ...
        std::cout << "world ";
    #pragma omp section
        ... do something ...
        std::cout << "forever ";
//    end of parallel sections
std::cout << "end";


  • begin hello world forever end
  • begin world hello forever end
  • begin hello forever world end
  • begin forever hello world end

As execution order is not guaranteed, you may observe any of the above output.

OpenMP: Parallel For Loop[edit | edit source]

This example shows how to divide a loop into equal parts and execute them in parallel.

//    Splits element vector into element.size() / Thread Qty
//    and allocate that range for each thread.
#pragma omp parallel for
for    (size_t i = 0; i < element.size(); ++i)
    element[i] = ...

//    Example Allocation (100 element per thread)
//    Thread 1 : 0 ~ 99
//    Thread 2 : 100 ~ 199
//    Thread 2 : 200 ~ 299
//    ...

//    Continue process
//    Only when all threads completed their allocated
//    loop job
  • Please take extra care to not modify the size of the vector used in parallel for loops as allocated range indices doesn't update automatically.

OpenMP: Parallel Gathering / Reduction[edit | edit source]

This example illustrates a concept to perform reduction or gathering using std::vector and OpenMP.

Supposed we have a scenario where we want multiple threads to help us generate a bunch of stuff, int is used here for simplicity and can be replaced with other data types.

This is particularly useful when you need to merge results from slaves to avoid segement faults or memory access violations and do not wish to use libraries or custom sync container libraries.

//    The Master vector
//    We want a vector of results gathered from slave threads
std::vector<int> Master;    

//    Hint the compiler to parallelize this { } of code
//    with all available threads (usually the same as logical processor qty)
#pragma omp parallel
    //    In this area, you can write any code you want for each
    //    slave thread, in this case a vector to hold each of their results
    //    We don't have to worry about how many threads were spawn or if we need
    //    to repeat this declaration or not.
    std::vector<int> Slave;

    //    Tell the compiler to use all threads allocated for this parallel region
    //    to perform this loop in parts. Actual load appx = 1000000 / Thread Qty
    //    The nowait keyword tells the compiler that the slave threads don't
    //    have to wait for all other slaves to finish this for loop job
    #pragma omp for nowait
    for (size_t i = 0; i < 1000000; ++i
        /* Do something */

    //    Slaves that finished their part of the job
    //    will perform this thread by thread one at a time
    //    critical section ensures that only 0 or 1 thread performs
    //    the { } at any time
    #pragma omp critical
        //    Merge slave into master
        //    use move iterators instead, avoid copy unless
        //    you want to use it for something else after this section

//    Have fun with Master vector