1 year ago
#355021
TSE Nathan
Use multiprocessing and multithreading to draw on an image
I am having trouble using multiprocessing and threading to speed up my program.
My program reads a list of points from a file and creates a grayscale image from these points.
The problem is that I have a million points and processing takes around 1 minute. I am sure there is a way to speed it up.
Here is the code without threading:
import os
import math
import json
import time
import pandas as pd
from PIL import Image
# FUNCTIONS
def CreateDataFrame(path, columns):
    print('DataFrame creation ... ', end='')
    with open(path) as excel_file:
        lines = excel_file.read().splitlines()
    np_array = []
    for line in lines:
        np_array.append(list(map(float, line.split(' '))))
    print('done')
    return pd.DataFrame(np_array, columns=columns)

def GetOffsets(df):
    print('Getting offsets ... ', end='')
    offsets = {}
    for c in df.columns:
        offsets[c] = min(df[c])
    print('done')
    return offsets

def GetMaximums(df, offsets):
    print('Getting Maximums ... ', end='')
    max_dict = {}
    for c in df.columns:
        max_dict[c] = max(df[c]) - offsets[c]
    print('done')
    return max_dict

def CreateImage(maximums, scale=1):
    return Image.new('RGB', (int(maximums['x'] * scale) + 1, int(maximums['z'] * scale) + 1), color='black')
# MAIN
columns = ['x', 'z', 'y']
scale = 1
df = CreateDataFrame('raw data.csv', columns)
offsets = GetOffsets(df)
maximums = GetMaximums(df, offsets)
img = CreateImage(maximums, scale)
pixels = img.load()
print('Printing ... ', end='')
for i in range(len(df)):
    line = df.iloc[i]
    color = int(255 * (line['y'] - offsets['y']) / maximums['y'])
    pixels[int((maximums['x'] - (line['x'] - offsets['x'])) * scale), int((line['z'] - offsets['z']) * scale)] = (color, color, color)
print('done')
img.save(f'terrain {scale}.png')
If you are interested, this is how it works. First, I create a dataframe from the file and assign the column names. Then, I take the minimum value of each column to get my offsets. Once done, I do the same thing with the maximums (after subtracting the offsets). Thanks to those maximums, I can create an image sized by the maximum x and z values. Finally, I iterate over my dataframe, using x and z for the pixel position and the y value for the gray level.
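To make the mapping concrete, here is a small worked sketch of the same formulas on three made-up points. The helper name `to_pixel` and the sample values are hypothetical and only for illustration; the real program uses the dataframe and the Pillow pixel access object instead of plain dictionaries.

```python
def to_pixel(point, offsets, maximums, scale=1):
    """Return ((column, row), gray) for one point, using the same formulas as the loop."""
    # x is mirrored, so the largest x value lands in column 0.
    column = int((maximums['x'] - (point['x'] - offsets['x'])) * scale)
    row = int((point['z'] - offsets['z']) * scale)
    # y is rescaled to the 0..255 gray range.
    gray = int(255 * (point['y'] - offsets['y']) / maximums['y'])
    return (column, row), gray

# Hypothetical sample of three points (not taken from the real data).
points = [
    {'x': 10.0, 'z': 5.0, 'y': 100.0},
    {'x': 11.0, 'z': 5.0, 'y': 105.0},
    {'x': 12.0, 'z': 6.0, 'y': 110.0},
]
# Offsets are the per-column minimums; maximums are the ranges after offsetting.
offsets = {c: min(p[c] for p in points) for c in ('x', 'z', 'y')}
maximums = {c: max(p[c] for p in points) - offsets[c] for c in ('x', 'z', 'y')}
placed = [to_pixel(p, offsets, maximums) for p in points]
print(placed)
```

With this sample, the lowest point comes out black and the highest point white, and the image would be 3 pixels wide and 2 pixels high.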
Now I am trying to multiprocess/thread it. To do that, I added this code:
import concurrent.futures
def Process(id, start, end):
    start_time = time.time()
    # Handles rows start .. end - 1 (range() excludes end).
    for i in range(start, end):
        line = df.iloc[i]
        color = int(255 * (line['y'] - offsets['y']) / maximums['y'])
        pixels[int((maximums['x'] - (line['x'] - offsets['x'])) * scale), int((line['z'] - offsets['z']) * scale)] = (color, color, color)
    print(f"Thread {id} ends: {time.time() - start_time}s")

nb_thread = 12
df_size = len(df)
nb_full_thread = df_size // nb_thread
thread_rest = df_size % nb_thread
with concurrent.futures.ThreadPoolExecutor(max_workers=nb_thread) as executor:
    for i in range(nb_thread):
        start = i * nb_full_thread
        # range() excludes its end, so there is no "- 1" here;
        # the last thread also takes the leftover rows.
        end = (i + 1) * nb_full_thread + (thread_rest if i == nb_thread - 1 else 0)
        executor.submit(Process, i, start, end)
        print(f"Process {i} launched")
img.save(f'threading terrain {scale}.png')
Some explanations: nb_thread is the number of threads I want to create. Then, I get the number of lines in my dataframe (df_size). This is useful to determine how many lines each thread will handle. Once done, I create my threads so that they process the image, and then I save the image.
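For reference, one way to split the rows so that every row is covered exactly once, including the remainder, can be sketched as follows. The helper name `chunk_bounds` is hypothetical; it returns end-exclusive bounds that can be passed directly to `range(start, end)`.

```python
def chunk_bounds(total, nb_chunks):
    """Split range(total) into nb_chunks contiguous (start, end) pairs, end exclusive."""
    size, rest = divmod(total, nb_chunks)
    bounds = []
    start = 0
    for i in range(nb_chunks):
        # The first `rest` chunks take one extra row so nothing is left over.
        end = start + size + (1 if i < rest else 0)
        bounds.append((start, end))
        start = end
    return bounds

# Example: 10 rows split across 3 workers.
print(chunk_bounds(10, 3))
```

Because the bounds are end-exclusive, consecutive chunks share a boundary value without overlapping, and the last chunk always ends at the total number of rows.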
With ThreadPoolExecutor, the program works but takes the same amount of time as the previous version. With ProcessPoolExecutor, the for loop in the Process function does not work as expected: it stops at the first value of the range.
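One thing that may help with diagnosing the second case: `executor.submit` stores any exception raised inside the task in the returned future, and it is only re-raised when `future.result()` is called. Since my code never calls `result()`, a worker that fails would look like a loop that silently stops. Below is a minimal sketch of how to surface such errors. The `work` function is hypothetical and fails on purpose, and a thread pool is used only to keep the example short; I have not confirmed that this is the cause of my problem.

```python
import concurrent.futures

def work(start, end):
    # Hypothetical task that fails on purpose to show where the error goes.
    if start > 0:
        raise ValueError(f"chunk starting at {start} failed")
    return end - start

results, errors = [], []
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(work, s, s + 5) for s in (0, 5)]
    for future in concurrent.futures.as_completed(futures):
        try:
            # result() re-raises whatever exception the task raised.
            results.append(future.result())
        except ValueError as exc:
            errors.append(str(exc))

print(results, errors)
```

Without the `future.result()` call, the `ValueError` would never be shown anywhere.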
I don't understand those behaviours, that is why I am turning to you.
I hope this is clear enough; do not hesitate to ask if you have any questions.
python
multithreading
multiprocessing
threadpool
concurrent.futures