


Python Patterns - An Optimization Anecdote

The other day, a friend asked me a seemingly simple question: what is the best way to convert a list of integers into a string, assuming that the integers are ASCII values? For instance, the list [97, 98, 99] should be converted to the string 'abc'. Let's assume we want to write a function to do this.


The first version I came up with was totally straightforward:



def f1(list):
    string = ""
    for item in list:
        string = string + chr(item)
    return string


"That can't be the fastest way to do it," said my friend. "How about this one:"



def f2(list):
    return reduce(lambda string, item: string + chr(item), list, "")


This version performs exactly the same set of string operations as the first one, but gets rid of the for loop overhead in favor of the supposedly faster, implied loop of the reduce() function.


"Sure," I replied, "but it does so at the cost of a function call (the lambda) per list item. I bet it is slower, since a function call in Python has more overhead than a for loop."


(OK, so I had already done the comparison: f2() took 60% more time than f1(). So there :-)


"Hmm," said my friend. "I need this to be faster." "OK," I said, "how about this version:"



def f3(list):
    string = ""
    for character in map(chr, list):
        string = string + character
    return string


To our mutual surprise, f3() clocked in twice as fast as f1()! This surprised us for two reasons: first, it uses more storage (the result of map(chr, list) is another list of the same length); second, it contains two loops instead of one (the one implied by the map() function, and the for loop).


Of course, trading space for time is a well-known compromise, so the first point should not have surprised us. But how come two loops are faster than one? There are two reasons.


First, in f1() the built-in function chr() is looked up on every iteration, while in f3() it is looked up only once (when the argument to map() is evaluated). "This lookup is relatively expensive," I told my friend, "since Python's dynamic scope rules mean that the name is first looked up (unsuccessfully) in the global dictionary of the current module, and then in the dictionary of built-in functions (where it is found). Worse, unsuccessful dictionary lookups are on average a bit slower than successful ones, because of the way the hash chaining works."


The second reason why f3() is faster than f1() is that the call to chr(item), as executed by the bytecode interpreter, is probably a bit slower than when it is executed by the map() function: the bytecode interpreter must execute three bytecode instructions for each call (load 'chr', load 'item', call), while the map() function does it all in C.


This led us to consider a compromise that would not waste extra space, but would speed up the lookup of the chr() function:



def f4(list):
    string = ""
    lchr = chr  # local alias: looked up once instead of on every iteration
    for item in list:
        string = string + lchr(item)
    return string


As expected, f4() was slower than f3(), but only by 25%; it was still about 40% faster than f1(). This is because local variable lookups are much faster than global or built-in variable lookups: the Python "compiler" optimizes most function bodies so that local variables need no dictionary lookup, and a simple array indexing operation is sufficient. The relative speed of f4() compared to f1() and f3() suggests that both reasons given above for the speed of f3() contribute, but that the first reason (fewer lookups) is a bit more important. (To get more precise data on this, we would have to instrument the interpreter.)
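The difference between the two kinds of lookup can be made visible with the standard dis module. The sketch below is written for Python 3 and assumes CPython, where global and built-in names are fetched with an instruction called LOAD_GLOBAL; opcode names are an implementation detail and may differ between versions. The function names global_call, local_call and globals_loaded are made up for this illustration.

```python
import dis


def global_call(item):
    # chr is resolved by name (globals, then built-ins) every time this runs
    return chr(item)


def local_call(item, lchr=chr):
    # lchr is a local variable, bound once when the function is defined
    return lchr(item)


def globals_loaded(func):
    """Return the names that func fetches with LOAD_GLOBAL instructions."""
    return [ins.argval for ins in dis.get_instructions(func)
            if ins.opname == "LOAD_GLOBAL"]


if __name__ == "__main__":
    print(globals_loaded(global_call))
    print(globals_loaded(local_call))
```

Binding chr as a default argument is just one way of obtaining a local alias; the assignment lchr = chr used in f4() has the same effect inside the loop.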


Still, our best version, f3(), was only twice as fast as the most straightforward version, f1(). Could we do better?


I was worried that the quadratic behavior of the algorithm was killing us. So far, we had been using a list of 256 integers as test data, since that was what my friend needed the function for. But what if it were applied to a list of two thousand characters? We would be concatenating longer and longer strings, one character at a time. It is easy to see that, apart from overhead, building a string from a list of length N this way requires copying 1 + 2 + 3 + ... + (N-1) characters in total, or N*(N-1)/2, or 0.5*N**2 - 0.5*N. On top of this there are N string allocation operations, but for sufficiently large N the term containing N**2 dominates. Indeed, for a list of 2048 items, which is 8 times as long as the test list, each of these functions takes much more than 8 times as long; close to 16 times, in fact. I did not dare try a list 64 times as long.
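As a quick check of the arithmetic, the copy count can be computed directly and compared with the closed form N*(N-1)/2. This is only a sketch of the formula in the text (Python 3); the helper name chars_copied is made up here, and it counts copied characters only, not allocations or per-call overhead.

```python
def chars_copied(n):
    """Characters copied when a string of length n is built one character
    at a time: 1 + 2 + ... + (n - 1)."""
    return sum(range(1, n))


for n in (256, 2048):
    # The sum agrees with the closed form N*(N-1)/2 given in the text.
    assert chars_copied(n) == n * (n - 1) // 2
    print(n, chars_copied(n))
```

For 256 items this gives 32640 copied characters, and for 2048 items 2096128, roughly 64 times as many. That the measured slowdown was closer to 16 suggests that, at these sizes, the linear costs still account for a large share of the running time.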


There is a general technique for avoiding quadratic behavior in algorithms like this. I coded it as follows for lists of exactly 256 items:



def f5(list):
    string = ""
    for i in range(0, 256, 16):  # 0, 16, 32, 48, 64, ...
        s = ""
        for character in map(chr, list[i:i+16]):
            s = s + character
        string = string + s
    return string


Unfortunately, for a list of 256 items this version ran a bit slower than f3() (though within 20%). Since writing a general version would only slow it down further, we did not bother to pursue this path (except that we also compared it with a variant that did not use map(), which of course was slower again).
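For illustration, a general version of the chunking idea might look like the sketch below (Python 3). It is not the variant that was timed for this essay; the name f5_general and the chunk parameter are assumptions made for the example.

```python
def f5_general(codes, chunk=16):
    """Build the string in chunks so that most concatenations are short.

    Works for any list length, at the price of a len() call and an extra
    parameter compared with the fixed-size f5().
    """
    string = ""
    for i in range(0, len(codes), chunk):
        s = ""
        for character in map(chr, codes[i:i + chunk]):
            s = s + character      # short concatenations within a chunk
        string = string + s        # one long concatenation per chunk
    return string


if __name__ == "__main__":
    print(f5_general([97, 98, 99]))
```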


Finally, I tried a radically different approach: use only implied loops. Notice that the whole operation can be described as follows: apply chr() to each list item, then concatenate the resulting characters. We were already using an implied loop for the first part: map(). Fortunately, the string module contains some string concatenation functions that are implemented in C. In particular, string.joinfields(list_of_strings, delimiter) concatenates a list of strings, placing the given delimiter between each pair of strings. Nothing stops us from concatenating a list of characters (which are just strings of length one in Python), using the empty string as the delimiter. Lo and behold:



import string

def f6(list):
    return string.joinfields(map(chr, list), "")


This function ran four to five times as fast as our fastest contender so far, f3(). Moreover, it does not have the quadratic behavior of the earlier versions.
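The string.joinfields() function does not exist in Python 3; the join() method of string objects is the closest equivalent. A minimal Python 3 sketch of the same idea, with the parameter renamed to codes to avoid shadowing the built-in list:

```python
def f6(codes):
    """Apply chr() with an implied loop, then join with an empty delimiter."""
    return "".join(map(chr, codes))


if __name__ == "__main__":
    print(f6([97, 98, 99]))
```

The timings quoted in this essay were taken on a much older interpreter, so the speed ratios should not be assumed to carry over to this version.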


And the winner is...


The next day, I remembered an odd corner of Python: the array module. It happens to have an operation that creates an array of one-byte integers from a list of Python integers, and every array can be written to a file or converted to a string as a binary data structure. Here is our function implemented using these operations:



import array

def f7(list):
    return array.array('b', list).tostring()
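On Python 3 this one-liner needs small changes: the tostring() method is gone from newer releases in favor of tobytes(), and the result is a bytes object that has to be decoded to obtain a text string. The sketch below assumes the 'B' (unsigned byte) typecode and Latin-1 decoding, so that every value from 0 to 255 maps to the same character that chr() would give; f7_bytes is a made-up name for a variant that skips the array module entirely.

```python
import array


def f7(codes):
    """Pack the integers into an array of unsigned bytes, then decode."""
    return array.array("B", codes).tobytes().decode("latin-1")


def f7_bytes(codes):
    """Same result using the built-in bytes type instead of an array."""
    return bytes(codes).decode("latin-1")


if __name__ == "__main__":
    print(f7([97, 98, 99]), f7_bytes([97, 98, 99]))
```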


f7() is about three times as fast as f6(), or 12 to 15 times as fast as f3()! It also uses less intermediate storage: it allocates only two objects of N bytes (plus fixed overhead), while f6() begins by allocating a list of N items, which usually costs 4N bytes (8N bytes on a 64-bit machine), and that is assuming the character objects are shared with similar objects elsewhere in the program (as with small integers, Python caches strings of length one in most cases).


"Stop," said my friend, "before you get into negative execution times. This is fast enough for my program." I agreed, though I had wanted to try one more approach: write the whole function in C. This could minimize the storage requirements (it would allocate a string of length N right away) and save a few instructions in the C code that I knew were present in the array module because of its generality (it supports integer widths of 1, 2, and 4 bytes). However, it would not be able to avoid extracting the items from the list one at a time and extracting the C integer from each of them. Both are fairly costly operations in the Python-C API, so I expected at most a modest speed-up compared to f7(). Given the effort of writing and testing an extension (compared to whipping up a couple of lines of Python), as well as the dependency on a non-standard Python extension, I decided not to pursue this option...

Conclusion


If you feel the need for speed, go for built-in functions: you cannot beat a loop written in C. Check the library manual for a built-in function that does what you want. If there isn't one, here are some guidelines for loop optimization:


* Rule number one: only optimize where there is a proven speed bottleneck. Only optimize the innermost loop. (This rule is independent of Python, but it does not hurt to repeat it, since it can save a lot of work. :-)

* Small is beautiful. Given the high overhead of bytecode instructions and variable lookups, it rarely pays off to add extra tests to save a little bit of work.

* Use built-in operations. An implied loop in map() is faster than an explicit for loop; a while loop with an explicit loop counter is even slower.

* Avoid calling functions written in Python in your inner loop. This includes lambdas. In-lining the inner loop can save a lot of time.

* Local variables are faster than globals; if you use a global constant in a loop, copy it to a local variable before the loop. And in Python, function names (global or built-in) are also global constants!

* Try to use map(), filter() or reduce() to replace an explicit for loop, but only if you can use a built-in function: map() with a built-in function beats a for loop, but a for loop with in-line code beats map() with a lambda function!

* Check your algorithms for quadratic behavior. But notice that a more complex algorithm only pays off for large N; for small N, the complexity does not pay off. In our case, 256 turned out to be small enough that the simpler version was still the fastest. Your mileage may vary; this is worth investigating.

* And last but not least: collect data. Python's excellent profile module can quickly show the bottleneck in your code. If you are comparing different versions of an algorithm, test them in a tight loop using the time.clock() function.
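The guideline about map() and lambda is easy to test on your own interpreter. The following Python 3 sketch uses the standard timeit module to compare three ways of applying chr() to a list; the function names, the CODES test data and the repeat parameter are all made up for this example, and no particular ordering of the results is claimed here, since it may differ from the one observed when this essay was written.

```python
import timeit

CODES = list(range(256))  # same size as the test list used in this essay


def with_map_builtin(codes):
    # implied loop, built-in function
    return list(map(chr, codes))


def with_inline_loop(codes):
    # explicit for loop with the work written in-line
    out = []
    for item in codes:
        out.append(chr(item))
    return out


def with_map_lambda(codes):
    # implied loop, but one Python-level function call per item
    return list(map(lambda item: chr(item), codes))


def compare(repeat=200):
    """Return {function name: seconds for `repeat` calls on CODES}."""
    results = {}
    for func in (with_map_builtin, with_inline_loop, with_map_lambda):
        results[func.__name__] = timeit.timeit(lambda: func(CODES),
                                               number=repeat)
    return results


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(name, round(seconds, 4))
```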


By the way, here is the timing function that I used. It calls a function f n*10 times with argument a, and prints the function name followed by the time it took, rounded to milliseconds. The 10 repeated calls are there to minimize the loop overhead of the timing function itself. You could go even further and make 100 calls... Also note that the expression range(n) is evaluated outside the timed section, which is another trick to minimize the overhead introduced by the timing function. If you are worried about this overhead, you can calibrate it by calling the timing function with a do-nothing function.



import time

def timing(f, n, a):
    print f.__name__,
    r = range(n)  # built outside the timed section
    t1 = time.clock()
    for i in r:
        f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a)
    t2 = time.clock()
    print round(t2-t1, 3)
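The listing above is written for the Python of its day. In current Python 3 releases print is a function and time.clock() is no longer available, so a port might look like the sketch below. Using time.perf_counter() as the replacement clock and returning the elapsed time in addition to printing it are choices made for this example, not part of the original function.

```python
import time


def timing(f, n, a):
    """Call f(a) n*10 times; print and return the elapsed seconds."""
    r = range(n)  # created outside the timed section
    t1 = time.perf_counter()
    for i in r:
        f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a)
    t2 = time.perf_counter()
    elapsed = round(t2 - t1, 3)
    print(f.__name__, elapsed)
    return elapsed


if __name__ == "__main__":
    timing(len, 1000, "abc")  # any one-argument callable will do
```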


Epilogue


A few days later, my friend was back with another question: how do you do the reverse operation, that is, create a list of ASCII codes from a string? Oh no, here we go again, it flashed through my mind...


But this time it was relatively painless. There were two candidates, the obvious one:



def g1(string):
    return map(ord, string)


And a little less obvious:



import array

def g2(string):
    return array.array('b', string).tolist()


Timing these showed that g2() is about five times as fast as g1(). There is a catch, though: g2() returns integers in the range -128..127, while g1() returns integers in the range 0..255. If you need the positive integers, g1() is going to be faster than any postprocessing you could do on the result of g2(). (Note: since this essay was written, the 'B' typecode for unsigned bytes has been added to the array module, so there is no longer any reason to prefer g1().)
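The signed versus unsigned catch is easy to see with a byte value above 127. The sketch below is for Python 3, where the array constructor accepts a bytes object rather than a text string; the names g2_signed and g2_unsigned are made up here to distinguish the two typecodes.

```python
import array


def g1(text):
    """Code points of a text string, via an implied loop over ord()."""
    return list(map(ord, text))


def g2_signed(data):
    """'b' typecode: each byte is read as a signed value, -128..127."""
    return array.array("b", data).tolist()


def g2_unsigned(data):
    """'B' typecode: each byte is read as an unsigned value, 0..255."""
    return array.array("B", data).tolist()


if __name__ == "__main__":
    print(g1("abc"))
    print(g2_signed(b"\xff\x80"), g2_unsigned(b"\xff\x80"))
```

For plain ASCII input all three agree; they only differ once a byte has its high bit set.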