The Daily Click ::. Forums ::. Non-Klik Coding Help ::. Optimising Code - Python this time
 

Post Reply  Post Oekaki 
 

Posted By Message

jamesh



Registered
  28/02/2012 15:24:25
Points
  381
25th June, 2012 at 25/06/2012 11:41:20 -

Hello people! I'm working on an anagram checker, using the scrabble dictionary (pretty much a clone of something like this, http://www.wineverygame.com/ but without the wildcards, or prefix/suffix boxes).

My problem is that it works, but it's waaaay too slow. I think the problem is that it checks each possible combination of letters over the dictionary each time (the dictionary is 17,000 words long, so if e.g. you try to find anagrams of 'HELLO' it is doing 31*17,000 different checks, which is obviously inefficient - longer words add orders of magnitude)

I just can't think of another way to do it - has anyone got any suggestions? I've thought of only visiting the sections of the dictionary that start with letters in the word, but this seems like a bit of a kludge and I don't know currently how I'd implement that. Here is my code:

import itertools


f=open('scrabs2.txt', 'r') # Here we are loading the Dictionary
inmem=f.readlines() # from a text file into memory.
f.close # And closing the file

wordsort=sorted(raw_input('> ').upper()) # Taking input, making it uppercase, and making it into an alphabetically sorted list

out_list = []
out_list_2 = []
for i in range(1, len(wordsort)+1): # this is using something called itertools to create a list
out_list.extend(itertools.combinations(wordsort, i)) # of all possible combinations of the letters in the word.

out_list=[list(t) for t in out_list]

#The above and below lines are changing the data between a couple of formats, it's kind of redundant

for i in out_list:
out_list_2.append(''.join(i))

full_list=[]

for line in inmem: #OK, so here is where I think the problem is
currentstring=(line.upper()).rstrip() #This loop is iterating loads of times
currentsort=sorted(list(currentstring)) #Sorting each dictionary word into alphabetical order and
currentsort=str(''.join(currentsort)) #Comparing it to each possible combination of the original word
for i in out_list_2:
if currentsort==i:
full_list.append(currentstring)

full_list=list(set(full_list)) #This bit's just sorting and displaying the final list,
full_list=sorted(full_list, key=len) #It seems to take time though :s
full_list.reverse()

for i in full_list:
print i


So, TL;DR is that if anyone's got any suggestions, I'd be interested to hear them, as I'm kind of new to this whole thing

 
n/a

jamesh



Registered
  28/02/2012 15:24:25
Points
  381
25th June, 2012 at 25/06/2012 11:46:59 -

OK writing this all down has helped me think it out a little.

1) should I perhaps read directly from the file instead of transferring the data into a variable and closing it? Does anyone know if this would be faster/slower?

2) I've realised I'm not taking the duplicates out of the possible combinations beforehand, which would probably cut down on a lot of checking. duh!

 
n/a

Cecilectomy

noPE

Registered
  19/03/2005
Points
  305

Has Donated, Thank You!VIP MemberWeekly Picture Me This Winner!Cardboard BoxGhostbuster!Pokemon Ball!ComputerBox RedSanta HatSnowman
I am an April Fool
25th June, 2012 at 25/06/2012 20:38:45 -

Only looking in the sections starting with letters contained in the search is an obvious optimization.

If you know anything about data structures you might want to look into hashtables, binary trees, graphs, etc.

Concurrency is also always a good investment if done properly.

 
n/a

jamesh



Registered
  28/02/2012 15:24:25
Points
  381
25th June, 2012 at 25/06/2012 21:25:44 -

Hi the Cecilizer, can you briefly explain what concurrency is to me?

James

 
n/a

Cecilectomy

noPE

Registered
  19/03/2005
Points
  305

Has Donated, Thank You!VIP MemberWeekly Picture Me This Winner!Cardboard BoxGhostbuster!Pokemon Ball!ComputerBox RedSanta HatSnowman
I am an April Fool
25th June, 2012 at 25/06/2012 22:03:04 -

http://en.wikipedia.org/wiki/Concurrency_%28computer_science%29

In the most basic sense, it's using threads, to perform multiple tasks at the same time. Spinning off multiple threads to search the dictionary (shared resource) can reduce the time. Concurrency is a fairly difficult thing to do properly and efficiently though. Python does it well though, if you can learn how.

 
n/a
   

Post Reply



 



Advertisement

Worth A Click