Performance of += vs. append on Python lists (bonus: += vs. extend)

Posted by epicdev Archive : 2012. 4. 6. 06:23

References

http://stackoverflow.com/questions/725782/in-python-what-is-the-difference-between-append-and

http://markandclick.com/1/post/2012/01/python-list-append-vs.html

In Python, you can add an item to a list with either append or +=.

The two differ subtly, however, so take care which one you use when performance matters.

>>> import timeit
>>> timeit.Timer('s.append("text")', 's = []').timeit()
0.1486750112744442
>>> timeit.Timer('s += ["text"]', 's = []').timeit()
0.2929660228172146

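The result is not surprising once you look at what each statement does.

For a list, += does not create a new list for s: it extends s in place, much as extend would. What it does have to do on every iteration is build the temporary one-element list ["text"] and then merge it into s, which is extra work.

append, by contrast, adds the single item to the existing list directly, with no temporary list involved, so it is faster (roughly twice as fast in the measurement above).

So using += in place of append to add a single item is inefficient.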
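A quick way to see what each statement does to the list object is to compare object identity before and after. The sketch below is written for Python 3 (the session above is Python 2), and the helper names `grows_in_place`, `with_append`, `with_iadd` and `with_add` are made up for illustration. It suggests that append and += both leave the original list object in place, while `s = s + [...]` produces a new list.

```python
def grows_in_place(operation):
    """Return True if `operation` hands back the very same list object it was given."""
    original = []
    result = operation(original)
    return result is original


def with_append(s):
    s.append("text")      # mutates s; no temporary list is built
    return s


def with_iadd(s):
    s += ["text"]         # builds a temporary ["text"], then extends s in place
    return s


def with_add(s):
    s = s + ["text"]      # builds a brand-new list and rebinds the local name
    return s


if __name__ == "__main__":
    for fn in (with_append, with_iadd, with_add):
        print(fn.__name__, grows_in_place(fn))
```

The extra cost of += for a single item therefore comes from the temporary list and the merge, not from copying the whole of s.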

For a closer look, disassembling the Python bytecode gives the following.

>>> import dis
>>> dis.dis(compile('s = []; s.append("text")', '', 'exec'))
  1           0 BUILD_LIST               0
              3 STORE_NAME               0 (s)
              6 LOAD_NAME                0 (s)
              9 LOAD_ATTR                1 (append)
             12 LOAD_CONST               0 ('text')
             15 CALL_FUNCTION            1
             18 POP_TOP
             19 LOAD_CONST               1 (None)
             22 RETURN_VALUE
>>> dis.dis(compile('s = []; s += ["text"]', '', 'exec'))
  1           0 BUILD_LIST               0
              3 STORE_NAME               0 (s)
              6 LOAD_NAME                0 (s)
              9 LOAD_CONST               0 ('text')
             12 BUILD_LIST               1
             15 INPLACE_ADD
             16 STORE_NAME               0 (s)
             19 LOAD_CONST               1 (None)
             22 RETURN_VALUE

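The append version uses LOAD_ATTR + CALL_FUNCTION (followed by POP_TOP to discard the result),

while the += version uses BUILD_LIST + INPLACE_ADD + STORE_NAME.

The bytecode alone does not say which is faster, but together with the timings above it suggests that building the temporary list and performing the in-place add costs more than the attribute lookup and method call, so append comes out ahead.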
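Opcode names and offsets differ between CPython versions (the listing above comes from Python 2), so the following is only a sketch for a current Python 3 interpreter. It uses `dis.get_instructions` to collect opcode names; the helper name `opnames` is an assumption for illustration. The one property it looks at is that the += form is expected to contain one more BUILD_LIST than the append form, for the temporary `["text"]`.

```python
import dis


def opnames(source):
    """Return the opcode names CPython emits for `source`, in order."""
    code = compile(source, "<example>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]


append_ops = opnames('s = []; s.append("text")')
iadd_ops = opnames('s = []; s += ["text"]')

if __name__ == "__main__":
    print("append:", append_ops)
    print("+=    :", iadd_ops)
    # Compare how many lists each form builds.
    print(iadd_ops.count("BUILD_LIST"), "vs", append_ops.count("BUILD_LIST"))
```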

The LOAD_ATTR lookup can also be done once, ahead of time, to make append faster still.

>>> timeit.Timer('app("text")', 's = []; app = s.append').timeit()
0.11080928452759053

When the same method is called repeatedly, binding it to a name like this can make the code a little more efficient.
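To repeat the three measurements on your own machine, a small harness along these lines may help. It is a sketch for Python 3; the function name `time_variants` and the iteration count are assumptions, and no expected output is given because the numbers (and possibly even the ranking) depend on the interpreter version and hardware.

```python
import timeit


def time_variants(number=100_000):
    """Time three ways of adding one item to a list; returns seconds per variant."""
    variants = {
        "append": ('s.append("text")', "s = []"),
        "+=": ('s += ["text"]', "s = []"),
        "bound append": ('app("text")', "s = []; app = s.append"),
    }
    return {
        name: timeit.timeit(stmt, setup=setup, number=number)
        for name, (stmt, setup) in variants.items()
    }


if __name__ == "__main__":
    for name, seconds in time_variants().items():
        print(f"{name:>12}: {seconds:.4f} s")
```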

Conclusion:

To add a single item to a list, use append rather than +=.

Use += or extend only when concatenating lists.

(For the speed difference between += and extend when concatenating lists, see the post below. In short, += is reported there to be faster than extend.)

http://stackoverflow.com/questions/4176980/is-extend-faster-than
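As a rough illustration of the concatenation case, the sketch below (Python 3, with made-up variable names) checks that += and extend produce the same list and both modify the list in place, then times each. The timing lines print whatever your interpreter measures; no particular result is claimed here.

```python
import timeit

a = [1, 2, 3]
b = [1, 2, 3]
tail = [4, 5]

alias_a, alias_b = a, b
a += tail          # in-place concatenation via the += operator
b.extend(tail)     # in-place concatenation via a method call

same_result = (a == b == [1, 2, 3, 4, 5])
still_same_objects = (a is alias_a) and (b is alias_b)

if __name__ == "__main__":
    print(same_result, still_same_objects)
    setup = "s = []; tail = list(range(10))"
    print("+=    :", timeit.timeit("s += tail", setup=setup, number=100_000))
    print("extend:", timeit.timeit("s.extend(tail)", setup=setup, number=100_000))
```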


Making a GET request in Python with httplib and urllib

Posted by epicdev Archive : 2012. 2. 5. 15:22

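# -*- coding: utf-8 -*-
# Python 2 example: httplib and urllib were reorganized in Python 3.

import httplib
import urllib

# URL-encode the query (the value is Korean text, hence the coding line above).
params = urllib.urlencode({'query': '네이버 검색 테스트'})

# Open a connection and send a GET request with the encoded parameters.
conn = httplib.HTTPConnection("search.naver.com")
conn.request('GET', '/search.naver?' + params)

# Print the status line, then read and print the response body.
response = conn.getresponse()
print response.status, response.reason

data = response.read()
print data
conn.close()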
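For readers on Python 3, where `httplib` became `http.client` and `urlencode` moved to `urllib.parse`, a roughly equivalent sketch follows. The function names `build_path` and `fetch` and the 10-second timeout are assumptions added for illustration; the host and path are the ones used in the snippet above. The network call itself is left as a comment so the block can be loaded without connectivity.

```python
import http.client
import urllib.parse


def build_path(query):
    """Return the request path with the query URL-encoded (UTF-8 by default)."""
    params = urllib.parse.urlencode({"query": query})
    return "/search.naver?" + params


def fetch(host, path):
    """Send a GET request and return (status, reason, body as bytes)."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.reason, response.read()
    finally:
        conn.close()   # always release the socket, even if the request fails


# Example call (performs a real network request):
# status, reason, body = fetch("search.naver.com", build_path("네이버 검색 테스트"))
# print(status, reason)
```

Note that `response.read()` returns bytes in Python 3, so the body needs decoding before it can be treated as text.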


Myths about Python's indentation

Posted by epicdev Archive : 2011. 12. 3. 15:25

Source: http://www.secnetix.de/olli/Python/block_indentation.hawk

Python: Myths about Indentation

Note: Lines beginning with ">>>" and "..." indicate input to Python (these are the default prompts of the interactive interpreter). Everything else is output from Python.

There are quite a few prejudices and myths about Python's indentation rules among people who don't really know Python. I'll try to address a few of these concerns on this page.

"Whitespace is significant in Python source code."

No, not in general. Only the indentation level of your statements is significant (i.e. the whitespace at the very left of your statements). Everywhere else, whitespace is not significant and can be used as you like, just like in any other language. You can also insert empty lines that contain nothing (or only arbitrary whitespace) anywhere.

Also, the exact amount of indentation doesn't matter at all, but only the relative indentation of nested blocks (relative to each other).
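One way to check the point about relative indentation is to parse the same program with two different indentation widths and compare the resulting syntax trees. This is a sketch for Python 3; the variable and function names are made up for illustration.

```python
import ast

# Same program, indented by 2 and 4 columns...
two_spaces = "if True:\n  x = 1\n  if x:\n    y = 2\n"
# ...and by 8 and 9 columns. Only "deeper than the enclosing block" matters.
eight_then_nine = "if True:\n        x = 1\n        if x:\n         y = 2\n"


def structure(source):
    """Dump the syntax tree without line/column information."""
    return ast.dump(ast.parse(source))


same_program = structure(two_spaces) == structure(eight_then_nine)

if __name__ == "__main__":
    print(same_program)
```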

Furthermore, the indentation level is ignored when you use explicit or implicit continuation lines. For example, you can split a list across multiple lines, and the indentation is completely insignificant. So, if you want, you can do things like this:

>>> foo = [
...            'some string',
...         'another string',
...           'short string'
... ]
>>> print foo

['some string', 'another string', 'short string']

>>> bar = 'this is ' \
...       'one long string ' \
...           'that is split ' \
...     'across multiple lines'
>>> print bar

this is one long string that is split across multiple lines
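The same two examples can be checked non-interactively. The sketch below (Python 3, hence no print statement; the variable names `ragged_list`, `ragged_string` and `namespace` are assumptions) executes the irregularly indented sources and inspects the resulting values.

```python
# Implicit continuation inside brackets: indentation of the inner lines is ignored.
ragged_list = (
    "foo = [\n"
    "           'some string',\n"
    "        'another string',\n"
    "          'short string'\n"
    "]\n"
)

# Explicit continuation with a trailing backslash: indentation is ignored as well.
ragged_string = (
    "bar = 'this is ' \\\n"
    "      'one long string ' \\\n"
    "          'that is split ' \\\n"
    "    'across multiple lines'\n"
)

namespace = {}
exec(ragged_list, namespace)
exec(ragged_string, namespace)

if __name__ == "__main__":
    print(namespace["foo"])
    print(namespace["bar"])
```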

"Python forces me to use a certain indentation style."

Yes and no. First of all, you can write the inner block all on one line if you like, therefore not having to care about indentation at all. The following three versions of an "if" statement are all valid and do exactly the same thing (output omitted for brevity):

>>> if 1 + 1 == 2:
...     print "foo"
...     print "bar"
...     x = 42

>>> if 1 + 1 == 2:
...     print "foo"; print "bar"; x = 42

>>> if 1 + 1 == 2: print "foo"; print "bar"; x = 42

Of course, most of the time you will want to write the blocks in separate lines (like the first version above), but sometimes you have a bunch of similar "if" statements which can be conveniently written on one line each.
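To confirm that the three layouts behave identically, one can run each and compare the output and the final value of x. This is a sketch in Python 3 syntax (print as a function); the helper name `run` and the list `versions` are made up for illustration.

```python
import contextlib
import io

versions = [
    'if 1 + 1 == 2:\n    print("foo")\n    print("bar")\n    x = 42\n',
    'if 1 + 1 == 2:\n    print("foo"); print("bar"); x = 42\n',
    'if 1 + 1 == 2: print("foo"); print("bar"); x = 42\n',
]


def run(source):
    """Execute `source` and return (captured stdout, final value of x)."""
    namespace = {}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, namespace)
    return buffer.getvalue(), namespace["x"]


results = [run(v) for v in versions]

if __name__ == "__main__":
    print(results[0] == results[1] == results[2])
```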

If you decide to write the block on separate lines, then yes, Python forces you to obey its indentation rules, which simply means: The enclosed block (that's two "print" statements and one assignment in the above example) has to be indented more than the "if" statement itself. That's it. And frankly, would you really want to indent it in any other way? I don't think so.

So the conclusion is: Python forces you to use indentation that you would have used anyway, unless you wanted to obfuscate the structure of the program. In other words: Python does not allow you to obfuscate the structure of a program by using bogus indentation. In my opinion, that's a very good thing.

Have you ever seen code like this in C or C++?

/* Warning: bogus C code! */

if (some condition)
        if (another condition)
                do_something(fancy);
else
        this_sucks(badluck);

Either the indentation is wrong, or the program is buggy, because an "else" always applies to the nearest "if", unless you use braces. This is an essential problem in C and C++. Of course, you could resort to always using braces, no matter what, but that's tiresome and bloats the source code, and it doesn't prevent you from accidentally obfuscating the code by still having the wrong indentation. (And that's just a very simple example. In practice, C code can be much more complex.)

In Python, the above problems can never occur, because indentation levels and logical block structure are always consistent. The program always does what you expect when you look at the indentation.
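A small sketch of the Python counterpart: the position of "else" decides which "if" it belongs to, so the two functions below behave differently even though they differ only in indentation. The function names and return strings are invented for this illustration.

```python
def else_belongs_to_inner(a, b):
    result = "nothing ran"
    if a:
        if b:
            result = "do_something"
        else:               # aligned with the inner "if"
            result = "else branch"
    return result


def else_belongs_to_outer(a, b):
    result = "nothing ran"
    if a:
        if b:
            result = "do_something"
    else:                   # aligned with the outer "if"
        result = "else branch"
    return result


if __name__ == "__main__":
    print(else_belongs_to_inner(True, False))   # else branch
    print(else_belongs_to_outer(True, False))   # nothing ran
```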

Quoting the famous book writer Bruce Eckel:

Because blocks are denoted by indentation in Python, indentation is uniform in Python programs. And indentation is meaningful to us as readers. So because we have consistent code formatting, I can read somebody else's code and I'm not constantly tripping over, "Oh, I see. They're putting their curly braces here or there." I don't have to think about that.

"You cannot safely mix tabs and spaces in Python."

That's right, and you don't want that. To be exact, you cannot safely mix tabs and spaces in C either: While it doesn't make a difference to the compiler, it can make a big difference to humans looking at the code. If you move a piece of C source to an editor with different tabstops, it will all look wrong (and possibly behave differently than it looks at first sight). You can easily introduce well-hidden bugs in code that has been mangled that way. That's why mixing tabs and spaces in C isn't really "safe" either. Also see the "bogus C code" example above.

Therefore, it is generally a good idea not to mix tabs and spaces for indentation. If you use tabs only or spaces only, you're fine.

Furthermore, it can be a good idea to avoid tabs altogether, because the semantics of tabs are not very well-defined in the computer world, and they can be displayed completely differently on different types of systems and editors. Also, tabs often get destroyed or wrongly converted during copy-and-paste operations, or when a piece of source code is inserted into a web page or other kind of markup code.

Most good editors support transparent translation of tabs, automatic indent and dedent. That is, when you press the tab key, the editor will insert enough spaces (not actual tab characters!) to get you to the next position which is a multiple of eight (or four, or whatever you prefer), and some other key (usually Backspace) will get you back to the previous indentation level.

In other words, it's behaving like you would expect a tab key to do, but still maintaining portability by using spaces in the file only. This is convenient and safe.

Having said that -- If you know what you're doing, you can of course use tabs and spaces to your liking, and then use tools like "expand" (on UNIX machines, for example) before giving the source to others. If you use tab characters, Python assumes that tab stops are eight positions apart.
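The eight-column assumption describes the Python of the article's era. Python 3 is stricter: it refuses source whose meaning would depend on the width of a tab. The sketch below (the helper name `try_compile` is an assumption) compiles a block that mixes one tab with eight spaces, then the same text after `str.expandtabs(8)`, which mimics what the UNIX "expand" tool does with its default tab stops.

```python
# The first body line is indented with one tab, the second with eight spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"


def try_compile(source):
    """Return 'ok' if `source` compiles, else the name of the exception raised."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:          # TabError is a subclass of SyntaxError
        return type(exc).__name__
    return "ok"


as_written = try_compile(mixed)
after_expand = try_compile(mixed.expandtabs(8))   # tabs replaced by spaces

if __name__ == "__main__":
    print(as_written, after_expand)
```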

"I just don't like it."

That's perfectly OK; you're free to dislike it (and you're probably not alone). Granted, the fact that indentation is used to indicate the block structure might be regarded as uncommon and as taking some getting used to, but it does have a lot of advantages, and you get used to it very quickly when you seriously start programming in Python.

Having said that, you can use keywords to indicate the end of a block (instead of indentation), such as "endif". These are not really Python keywords, but there is a tool that comes with Python which converts code using "end" keywords to correct indentation and removes those keywords. It can be used as a pre-processor to the Python compiler. However, no real Python programmer uses it, of course.
[Update] It seems this tool has been removed from recent versions of Python. Probably because nobody really used it.

"How does the compiler parse the indentation?"

The parsing is well-defined and quite simple. Basically, changes to the indentation level are inserted as tokens into the token stream.

The lexical analyzer (tokenizer) uses a stack to store indentation levels. At the beginning, the stack contains just the value 0, which is the leftmost position. Whenever a nested block begins, the new indentation level is pushed on the stack, and an "INDENT" token is inserted into the token stream which is passed to the parser. There can never be more than one "INDENT" token in a row.

When a line is encountered with a smaller indentation level, values are popped from the stack until a value is on top which is equal to the new indentation level (if none is found, a syntax error occurs). For each value popped, a "DEDENT" token is generated. Obviously, there can be multiple "DEDENT" tokens in a row.

At the end of the source code, "DEDENT" tokens are generated for each indentation level left on the stack, until just the 0 is left.
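The stack procedure described above can be sketched in a few lines. This is a simplified illustration, not CPython's actual tokenizer: it assumes indentation with spaces only, ignores continuation lines and comments, and the function name `indentation_tokens` is made up.

```python
def indentation_tokens(lines):
    """Yield-like helper: return (token name, stack after the token) pairs."""
    stack = [0]                      # leftmost position is always on the stack
    out = []
    for line in lines:
        stripped = line.lstrip(" ")
        if not stripped.strip():     # blank lines do not affect indentation
            continue
        width = len(line) - len(stripped)
        if width > stack[-1]:        # deeper: push once, emit a single INDENT
            stack.append(width)
            out.append(("INDENT", list(stack)))
        else:
            while width < stack[-1]: # shallower: pop until a match is found
                stack.pop()
                out.append(("DEDENT", list(stack)))
            if width != stack[-1]:   # no matching outer level on the stack
                raise IndentationError("unindent does not match any outer level")
    while len(stack) > 1:            # end of input: close every open block
        stack.pop()
        out.append(("DEDENT", list(stack)))
    return out


if __name__ == "__main__":
    sample = ["if a:", "    if b:", "        x = 1", "y = 2"]
    for name, stack in indentation_tokens(sample):
        print(name, stack)
```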

Look at the following piece of sample code:

>>> if foo:
...     if bar:
...         x = 42
... else:
...   print foo
...

In the following table, you can see the tokens produced on the left, and the indentation stack on the right.

<if> <foo> <:>                    [0]
<INDENT> <if> <bar> <:>           [0, 4]
<INDENT> <x> <=> <42>             [0, 4, 8]
<DEDENT> <DEDENT> <else> <:>      [0]
<INDENT> <print> <foo>            [0, 2]
<DEDENT>                          [0]

Note that after the lexical analysis (before parsing starts), there is no whitespace left in the list of tokens (except possibly within string literals, of course). In other words, the indentation is handled by the lexer, not by the parser.
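The standard library's `tokenize` module exposes these tokens, so the sequence can be inspected directly. The sketch below uses Python 3 syntax for the sample (print as a function); the helper name `block_tokens` is an assumption for illustration.

```python
import io
import tokenize

source = (
    "if foo:\n"
    "    if bar:\n"
    "        x = 42\n"
    "else:\n"
    "  print(foo)\n"
)


def block_tokens(text):
    """Return the names of the INDENT/DEDENT tokens in source order."""
    tokens = tokenize.generate_tokens(io.StringIO(text).readline)
    return [
        tokenize.tok_name[tok.type]
        for tok in tokens
        if tok.type in (tokenize.INDENT, tokenize.DEDENT)
    ]


layout = block_tokens(source)

if __name__ == "__main__":
    print(layout)
```

The source is only tokenized, never executed, so the undefined names foo and bar do not matter here.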

The parser then simply handles the "INDENT" and "DEDENT" tokens as block delimiters -- exactly like curly braces are handled by a C compiler.

The above example is intentionally simple. There are more things to it, such as continuation lines. They are well-defined, too, and you can read about them in the Python Language Reference if you're interested, which includes a complete formal grammar of the language.
