Posted By
|
Message
|
jamesh
Registered 28/02/2012 15:24:25
Points 381
|
25th June, 2012 at 25/06/2012 11:41:20 -
Hello people! I'm working on an anagram checker, using the scrabble dictionary (pretty much a clone of something like this, http://www.wineverygame.com/ but without the wildcards, or prefix/suffix boxes).
My problem is that it works, but it's waaaay too slow. I think the problem is that it checks each possible combination of letters over the dictionary each time (the dictionary is 17,000 words long, so if e.g. you try to find anagrams of 'HELLO' it is doing 31*17,000 different checks, which is obviously inefficient - longer words add orders of magnitude)
I just can't think of another way to do it - has anyone got any suggestions? I've thought of only visiting the sections of the dictionary that start with letters in the word, but this seems like a bit of a kludge and I don't know currently how I'd implement that. Here is my code:
import itertools
f=open('scrabs2.txt', 'r') # Here we are loading the Dictionary
inmem=f.readlines() # from a text file into memory.
f.close # And closing the file
wordsort=sorted(raw_input('> ').upper()) # Taking input, making it uppercase, and making it into an alphabetically sorted list
out_list = []
out_list_2 = []
for i in range(1, len(wordsort)+1): # this is using something called itertools to create a list
out_list.extend(itertools.combinations(wordsort, i)) # of all possible combinations of the letters in the word.
out_list=[list(t) for t in out_list]
#The above and below lines are changing the data between a couple of formats, it's kind of redundant
for i in out_list:
out_list_2.append(''.join(i))
full_list=[]
for line in inmem: #OK, so here is where I think the problem is
currentstring=(line.upper()).rstrip() #This loop is iterating loads of times
currentsort=sorted(list(currentstring)) #Sorting each dictionary word into alphabetical order and
currentsort=str(''.join(currentsort)) #Comparing it to each possible combination of the original word
for i in out_list_2:
if currentsort==i:
full_list.append(currentstring)
full_list=list(set(full_list)) #This bit's just sorting and displaying the final list,
full_list=sorted(full_list, key=len) #It seems to take time though :s
full_list.reverse()
for i in full_list:
print i
So, TL;DR is that if anyone's got any suggestions, I'd be interested to hear them, as I'm kind of new to this whole thing
n/a
|
jamesh
Registered 28/02/2012 15:24:25
Points 381
|
25th June, 2012 at 25/06/2012 11:46:59 -
OK writing this all down has helped me think it out a little.
1) should I perhaps read directly from the file instead of transferring the data into a variable and closing it? Does anyone know if this would be faster/slower?
2) I've realised I'm not taking the duplicates out of the possible combinations beforehand, which would probably cut down on a lot of checking. duh!
n/a
|
Cecilectomy noPE
Registered 19/03/2005
Points 305
|
25th June, 2012 at 25/06/2012 20:38:45 -
Only looking in the sections starting with letters contained in the search is an obvious optimization.
If you know anything about data structures you might want to look into hashtables, binary trees, graphs, etc.
Concurrency is also always a good investment if done properly.
n/a
|
jamesh
Registered 28/02/2012 15:24:25
Points 381
|
25th June, 2012 at 25/06/2012 21:25:44 -
Hi the Cecilizer, can you briefly explain what concurrency is to me?
James
n/a
|
Cecilectomy noPE
Registered 19/03/2005
Points 305
|
25th June, 2012 at 25/06/2012 22:03:04 -
http://en.wikipedia.org/wiki/Concurrency_%28computer_science%29
In the most basic sense, it's using threads, to perform multiple tasks at the same time. Spinning off multiple threads to search the dictionary (shared resource) can reduce the time. Concurrency is a fairly difficult thing to do properly and efficiently though. Python does it well though, if you can learn how.
n/a
|
|
|