Duke Researchers Pit Computer Against Human Crossword Puzzle Players
From Duke News
By Monte Basgall
DURHAM, N.C. (April 19, 1999)- After The New York Times crossword puzzle editor Will Shortz proclaimed that machines could never match humans in solving crossword puzzles, Duke University computer scientist Michael Littman said his team "rose to the challenge."
Littman, Duke doctoral students Greg Keim and Noam Shazeer, and other students and faculty designed "Proverb," a crossword solving program that achieved an average of 95.3 percent words correct and 98.1 percent letters correct when given a sample of 370 crosswords. The puzzles, chosen randomly from The New York Times and several other sources, were each solved in under 15 minutes.
"Computer honor was being questioned," added Littman, an assistant professor of computer science, in an interview.
Littman’s team described these results in a research paper to be presented in July at the annual conference of the American Association of Artificial Intelligence in Orlando, Fla.
In past years, the conference has included competitions pitting computers against humans in games such as chess, backgammon, and Scrabble. "And we thought crossword puzzles would be a very exciting addition," Littman said.
Proverb has already been tested against human opponents, tackling the puzzles used in the 22nd Annual American Crossword Puzzle Tournament held March 12-14 in Stamford, Conn. and directed by Shortz.
The competition included 254 human competitors. Had Proverb actually competed, its score computed according to a formula that combines accuracy and speed, would have put it in 147th place.
"On one puzzle, Proverb had the 20th highest score," said Littman. "But on another, particularly nasty puzzle, Proverb only got three words correct and came in 251st place.
"However, this tough puzzle used the trick that all the clues were ‘Spoonerized’ -- for example, instead of the clue ‘Nome is here’ for ALASKA, the puzzle used ‘Home is near’," said Littman. "All 78 clues were transformed in the same way. When these ‘errors’ were corrected by hand, Proverb was able to solve the puzzle perfectly, putting it solidly in the top half of competitors at the tournament."
Shortz expressed admiration for Proverb’s results, posting them on the tournament’s web site with the comment: "The results were so interesting (in fact, so amazing) that I printed them out on large sheets of paper and posted them, along with Michael’s analysis, after each round at the event."
Shortz, who provides the clues on a National Public Radio Sunday puzzle contest, besides serving as the Times’ crossword puzzle editor, first issued his challenge to computer designers in 1997 after world champion chess master Garry Kasparov was defeated by an IBM chess playing computer program, Deep Blue [www.chess.ibm.com].
Shortz contended that crossword puzzles, unlike chess, draw on particular human skills and thought processes that can be inaccessible to computing machines.
"Is the computer going to be able to solve the clues involving puns and wordplay? I don't think so," Shortz wrote in an introduction to a volume of The New York Times daily crossword puzzles. He gave examples of clues he felt computers would miss, such as "Event that produces big bucks" referring to RODEO; "Pumpkin-colored" translating as ORANGE; or "It might have quarters downtown" meaning METER.
Computers loaded with the most encyclopedic databases of facts, including volumes of solutions to past crossword puzzles, would be apt to miss clues like those, Shortz suggested.
"So if you were one who lamented the loss of the human to the computer in chess, don't despair," Shortz wrote. "In a much more wide ranging and, frankly, complex game like crosswords, we humans still rate just fine."
In their research paper, Littman’s team acknowledged that Proverb is still "not competitive with the very best human solvers." But "the level of success we achieved would probably not have been possible five years ago," they added. "We depended on extremely fast computers with vast memory and disk storage, and used tremendous amounts of data in machine-readable form."
"Perhaps the time is ripe to use these resources to attack other problems previously deemed too challenging for AI (artificial intelligence)."
The name Proverb comes from "probabilistic cruciverbalist," meaning a crossword solver based on probability theory. It uses several different and independently communicating programs, written in four different computer languages: Java, Perl, C and C++.
"Just like with computerized chess playing, we ended up not mimicking the way humans do things," Littman said. "When a human solves a crossword puzzle, there is this back and forth process. The human looks at a clue, writes in a potential answer, then looks at the information in the crossing grid to see what other words might fit. So it’s back and forth between the grid and the clues.
Proverb initially analyzes the word and letter clues separately from the across-and-down puzzle grids on which the answers must be arranged. Then it embarks on a detailed search for the best solutions that also fit the grid layouts.
A Proverb component called the "Coordinator" separates clues from grids, then sends copies of the clues to 30 different "Expert Modules" running on as many as 14 different computers.
As Littman described them, "a module is a piece of code that takes a clue as input and spits out a list of candidates as output. It is a lot like a World Wide Web search engine," he said, "where for every typed-in query the engine then lists documents that might be relevant.
"Each module is a different large database, some crammed with facts on subjects like music, geography, movies, writers, quotations, and synonyms. Other modules store information on word relationships.
"Also importantly, a database of around 400,000 crossword clues was used in the design of the system," Littman said. "This is the equivalent of memorizing 14 years of daily puzzles. Since many clues recur, this database was critical to Proverb’s success."
Besides storing facts, the 30 modules also go through the clue lists and evaluate the probability that items in their databases could be solutions to the clues. Candidate solutions are then re-evaluated by another Proverb component, "Merger," which balances out differences in the way various modules weigh probabilities.
At the end of the process, a Proverb component called "Solver" searches the output from "Merger" for the best solutions that also fit each puzzle’s grid arrangements.
Littman said he and Keim conceived of the idea for a crossword solving computer system during a plane flight conversation when both were returning from an August 1998 artificial intelligence conference in Madison, Wisc. They proceeded to organize a seminar course at Duke that met weekly last fall, with about 10 students contributing modules.
"I think humans have an advantage that our computer system doesn’t: this sense of absolute certainty that they know the right answer," he added. "The computer never has that level of certainty. It doesn't really understand English, or the way that the world works.
"All it can do is find statistical relationships between words. But as the Duke team showed, these relationships are sufficiently powerful for solving crosswords at a competitive level."
Read more about PROVERB [pdf]