21 January 2013 @ 02:40 pm
Too Big to Solve?  
Not my tagline, but a good description of the Mystery Hunt that just happened. One line of dialogue after last year's Hunt, which I led with in my wrap-up, was the question of how soon is too soon for a Hunt to end. I said that, in this era of a few competitive teams growing ever larger to get over the winning hurdle, constructors aiming bigger was a mistake. The Hunt ending after 36 hours (midnight Saturday) is fine if that stretches the solving experience over the whole weekend for everyone else. I won't comment generally on this year's effort, but it seems a great example to point back to of too much ambition by too many people toward the further militarization of Hunt size, so that by 2025 the team "The whole of new USA" can go after the coin against "USSReunited" for at least a month. The sense of "puzzle" versus "grindy work" is also a discussion I have every year, and I won't repeat myself here. I've felt since 2008 that the Mystery Hunt is far from an event I'd regularly attend in person, although I'm glad to have finally been onsite to play with Team Luck, with whom I've been a "free agent" for three years now.

I had a relatively good solving year as these things go, but it was mostly demoralizing personally. I soloed Palmer's Portals, for example, but after essentially solving 8/10ths of it I spent many more hours needing to tweak a very small and underconstrained set of things to get from that hard-work state to a finished state. At some stage I told the team "I'm going to solve Portals and the Feynman meta and then go sleep," and I met this goal, but in many times the expected time. I led the solve of both Danny Ocean (with zebraboy stating the most necessary last bit to get my work over the cliff) and Richard Feynman (with Jasters). I obviously co-solved lots of the logic puzzles and other puzzles, and gave finishing help to a range of things too. I think I did this best for "Kid Crossword" once, after he had spent a lot of time mastering the hard steps of a crossword/scrabble puzzle -- he could, impressively fast, write out the set of steps I wanted him to take -- and while the follow-up steps were not obvious, I led the killing of the beast. This was too often the feel for these puzzles, and my assassination rate was far lower than I wanted. My Sunday was spent earning 3 puzzle answers by actually going to an event, and then falsely believing the power to buy some answers would let me finish solving the Indiana Jones mini-metas -- where I had already mostly soloed Adventure 2's snakes with 5/8 answers, but then killed myself dead on #1/Ouroboros for the rest of the day, solving, as many solvers will say in hindsight, the puzzle it was meant to be in one of a dozen ways and not the puzzle it was. Let me state here, as I did for hours with my team, that the phrase "I'm not cut out for this" is horrible flavor. It implies both "cut this out" and, read a different way, "don't cut this out."
This makes you want to cut it out, which takes a lot of time, but also not to invest too much time in cutting it out, so as to save the wasted time of doing a task you are being told not to do. Other wordings are far safer, and an implied negative inside a positive is, in my opinion, one of the five worst flavor failure modes. Puzzle editing and flavor text is an art, and it is certainly the biggest variable from year to year and from constructing team to constructing team.

So yeah, Mystery Hunt happened. And there was the usual share of overwhelmingly incredible aha moments. Endgame seemed very fun, and I wish all teams could do just that for the weekend, or at least a lot more things like that. More of that, and more sleep, would both have been good choices this year. If only the puzzles had solved on schedule.

ETA: And as I added far below around comment #300, as a solver who was frustrated by this Hunt yet had fun in it, I do want to thank everyone on Sages for the incredible effort they put in. Making a Mystery Hunt is a gift to all solvers whether it matches expectations or not, and since it is a mostly thankless job, I want the constructors and editors and software engineers and graphic designers and cooks and phone center workers and everyone else to know I appreciated all you did over the last weekend to give us several days together for puzzling.

Further, as I was asked to write a larger piece elsewhere that has given me personally a lot more attention as the face of the criticism, and as I use the phrase "my team" a lot in general since solving forms this kind of bond, I want to be very clear: since Bombers broke up after 2009, I have been a free agent. I have solved recently with Team Luck but am not a core part of their leadership, and the opinions I state are my own. I intend to form my own team next year to go after the coin again, and if you have a problem with anything I have said anywhere on the internets, please hate me for it. I believe my posts have offered constructive criticism, but even what I have said is without all the facts of what went on inside Sages, so I could easily be speaking from ignorance a lot of the time.

EFTA: Thanks to tablesaw for pointing out this chronological feature of posts. If you want to see all the additions to this post in time-sorted order, go here: http://motris.livejournal.com/181790.html?view=flat. We're on page 14 at the moment.
Jenny G: Crazy Puzzlehahathor on January 23rd, 2013 02:14 am (UTC)
FWIW, I came in in the morning reasonably well rested, and when I saw the flavor text it seemed very clear to me that it meant you shouldn't cut the snakes out. As it did to pretty much everyone else on Palindrome.

I enjoyed the hunt, other than being forced to sniff & ingest such a large amount of horseradish and white pepper. As did about half a dozen others on the team - then we ended up buying the answer just so we wouldn't have to deal with the damned spices any more.
Derek KismanDerek Kisman on January 23rd, 2013 10:30 am (UTC)
Ok, so it wasn't just Thomas. Sorry about that; our testsolvers didn't think that way. Indeed, I think in general our Manic Sages testsolvers just turned out to be too damn good at solving Manic Sages puzzles. Perhaps we think too much alike.
motrismotris on January 23rd, 2013 02:07 pm (UTC)
I at least wrote, as in my original message above, that I recognized it could be cut out but was sloppy so we did cut out a few. But the first person set to cutting gave up on the task after 2 snakes. I gave up after cutting the next 4. It was taking too long to cut and not seeming to actually do anything.

Obviously, if I had a self-consistent notion of what the 1/2's were doing, or instructions to the full puzzle, I would do the thing this puzzle was. But instead I did the set of things this puzzle should -- nay must -- be in the "not cut out" world. Only when we had all 8 answers and were pushed by hints to PA and NJ did our team have any chance here.
(Anonymous) on January 24th, 2013 03:10 am (UTC)
I'm proud to say Luck actually had NC done before the PA and NJ hints came in. I don't think you were in the room when this happened, though. Our problem was assuming the shape of the remaining snake was supposed to give us the answer. We stared at that thing for over an hour, and Nathan finally asked for a hint.

Alison
owens888owens888 on January 24th, 2013 03:59 am (UTC)
We cut the snakes out, but since the other snakes' shapes were important, we thought the last snake's shape was important too. We kept wanting it to be Maine.
Andrewbrokenwndw on January 23rd, 2013 07:34 pm (UTC)
Not to be too pointed about it, but I am very curious: were you on a one-pass or a two-pass test standard? We found that from time to time we would stumble on the one person in creation who happened to be able to solve a puzzle, and only on the second test pass would we figure out that the puzzle was not actually solvable in general.
Dave "Novalis" Turnernovalis on January 23rd, 2013 07:52 pm (UTC)
Actually, I wanted to ask about what your testing process was. I have written before about the importance of testing, and I'm curious whether testing was the problem with this hunt.

(I think someone else on our team was going to describe our process, but if that doesn't happen, I'll probably make a post about it).
Dr. C. Scott Ananiancananian on January 23rd, 2013 10:10 pm (UTC)
This was a lesson we learned the hard way (same as Sages) on the Matrix hunt, which I implored Codex to remember when we were writing the hunt: your test solvers are going to be above-average solvers of puzzles your team writes. ACME had a large number of linguists, and wrote a lot of linguistics puzzles that we breezed through and then had to suffer through watching teams fail to solve. Similarly, ACME wrote an I-Ching puzzle that every one of our test solvers immediately recognized as such -- very few other teams did.

Two factors here: one is that you tend to write puzzles that your team has expertise in; the second is that you tend to select test solvers you think are "appropriate" for the puzzle (which solving teams can't always do). Codex made a strong effort to randomly select test solvers to protect against the latter effect.
Andrewbrokenwndw on January 23rd, 2013 10:40 pm (UTC)
...we did? :-)

(I do know I tried to give Emily puzzles I knew she would enjoy, but that's because she couldn't solve them all and I wanted to make sure she was having fun. I don't know how much targeting happened after that step, because beyond that it was her domain.)
Emilydumble on January 24th, 2013 08:40 am (UTC)
We did make sure that puzzles got to appropriate testsolvers eventually, but in a way intended to minimize the effect cscott is talking about.

In the case of puzzles where the required skill/knowledge was obvious (e.g. logic puzzles that were clearly logic puzzles), I would send them immediately to appropriate testsolvers. In the case of puzzles where the requirements weren't obvious (e.g. Course 7E, which was not obviously about Starcraft), it would go to random testsolvers. If they correctly identified the required subject knowledge, I would help them find more appropriate testsolvers to hand it off to.
AJDdr_whom on January 24th, 2013 12:40 am (UTC)
Certainly Plant didn't "tend to select" testsolvers it thought were appropriate. okosut or jcberk can correct me if I'm wrong here, but my understanding is that the first step of testsolving was always random assignment to a solver, and only if a randomly-assigned testsolver said 'looks like this is about linguistics, I don't know anything about linguistics,' say, would we deliberately target the puzzle at a linguist testsolver.
jcberk on January 24th, 2013 06:19 pm (UTC)
There may have been a few cases where we targeted immediately, but that was generally highly specialized puzzles where the specialty was obvious (e.g. http://web.mit.edu/puzzle/www/06/puzzles/washington/convocation/). We used random assignment in general, partly as a function of being a big team and wanting everyone to get a fair shot at newly testable puzzles. Even that didn't work perfectly when a few people were power testsolvers and saw just about everything, but in general we had good diversity of testers.
fclbrokle on January 24th, 2013 08:20 am (UTC)
Sages let solvers look through a few different puzzles and pick which one was appealing to them, to mimic how they would choose during hunt. The overall point --- that we would be better at solving our own puzzles, and that we needed to watch how long we took to solve --- appears to be the critical insight that we failed to get.
ze top blurberry: driftingztbb on January 24th, 2013 10:19 pm (UTC)
Dan, I think it runs deeper than that. I have a very hard time imagining your test-solvers reporting that they had fun identifying 263 clips for the dissonant song crossword, and similarly for many other puzzles that had this sort of unusually burdensome element. Or thinking that the multiple layers of aha! necessary to finish so many of the puzzles were a good idea. But if they actually did, then what Manic Sages test-solvers consider fun is significantly out of sync with what the majority of other solvers at the Hunt consider fun. It really is not just that Sages misjudged the number of puzzles per hour that a team would solve; they misjudged what people would consider a reasonable or fair puzzle, and what solvers would enjoy spending their time on. If Sages ever run the Hunt again (assuming you stay together), people aren't going to trust you until they hear some acknowledgment of that.

I think the reason you see so many angry people is that these choices run pretty well counter to the consensus that coalesced after the problems with Matrix and Time Bandits. We learned these lessons the hard way, and have been making serious attempts to pass the accumulated wisdom down from writing team to writing team so that these kinds of issues stay in the past. I still don't have much of a grip on how such a regression could have happened this year, especially since I've solved so many great puzzles written by people on Sages before, but it's more than just not putting a stopwatch on test-solving runs.
Dr. C. Scott Ananiancananian on January 24th, 2013 11:15 pm (UTC)
Hm, well. The test solvers *did* think the 263*2 lookup puzzle was fun (see elsewhere in the comments here). But I think there was only one test solving group for this puzzle (Sages, correct me if I'm wrong). Codex (and Metaphysical Plant?) insisted on two different test solving groups to pass the final completed puzzle, because we knew we had some outliers among the team (esp super-solvers!).

Some of the general issues might just be the test solvers' inexperience with the hunt: it takes a lot of puzzles before you start to know what a good puzzle "feels like". Some of it may have been social -- being too nice to puzzles written by your friends (Codex test solvers were some cranky bastards!). But in this particular case the single test solve group from Sages really did like the 263-ID puzzle.

(And as mentioned elsewhere in the thread, it's not a bad puzzle idea, just lacking in how it was edited and presented. There are lots of good puzzle editors on this thread, and we could probably come up with a dozen different ways to improve it. Just one: if there were a small 25-or-so-clue puzzle before the ginormous one, which worked the same way, then people would have an accessible way to figure out how the puzzle worked and test hypotheses before diving into the deep end with the rest of the clues -- although it would probably have been better just to reduce the size of the crossword, since the number of cells grows as n^2.)

Anyway, I don't personally feel that it's necessary to threaten to not "trust" the Sages until they enact some ritual mea culpa. In my experience teams learn from their mistakes and write better hunts the second time they win. (Don't let me down, Kappa Sig.)

But I share your frustration that a certain body of hunt lore does not appear to be successfully transmitted between teams from year to year. Perhaps a single hand-off meeting after wrapup is not sufficient? I feel that long threads like this one are not flame-fests so much as previous authors' earnest attempts to ensure that the lessons they've learned are not lost, forgotten, or ignored.

I plan to write a blog post sometime soon discussing the various hunt "traditions"/lessons I think are important. Briefly: "two-pass standard for test solving"; "Multiple teams are expected to find the coin"/"the hunt's not over when someone's won"/"hunt HQ will remain open after the coin is found"; "puzzles should be measured by how fun they are, not how mindblowingly clever"; "complete solutions will be posted before wrapup" (expect nothing to be done after the hunt; good solutions are also vital to the editing/fact-check process); "complete software for the hunt will be posted after the hunt" (so we can build on each other's work); "events are orthogonal to puzzles" (they don't give "answers", they provide some other currency); "free answer currency" (a means to work around stuck puzzles, which you hope never to have to use); "a coin for every winning team member". Some nice innovations this year which I'd like to see continued: practice/partial runarounds available for all teams, so that you can see some of the cool things built for the final runaround even if you don't reach it; a centralized hint system activated once the final set of competitive teams is clear.

Codex also re-used MPP's "deterministic unlock" scheme, where it doesn't matter *which* puzzles you've unlocked, just how many/when, so that puzzles are always unlocked in the same order for all teams. This does seem to simplify testing a lot. I'm not willing to add this to my "you really must do this" list, but I'd encourage teams to think very hard before going back to a maze/grid-structured hunt (like Matrix and Time Bandits, among others) where you need to answer puzzle A in order to access puzzle B.
AJDdr_whom on January 24th, 2013 11:58 pm (UTC)
At Plant's post-Hunt brunch (which, this year, was not actually post-Hunt, but never mind) we talked about the possibility of putting up a Guide To Running The Hunt similar to what you describe, maybe somewhere on Plant webspace, but with contributions from all recent running teams. (Including arguments for and against some of the points you mention that there's not necessarily universal agreement on, like orthogonal events and multiple copies of the coin.)

I agree with you in favoring the "deterministic unlock" scheme over map-based unlocking, by the way, but it seems unfair to attribute map-based unlocking just to Matrix and Time Bandits when it was also used in well-regarded Hunts like Normalville and SPIES (...though it was just about the worst thing about SPIES, and its failure there is one of the reasons that I soured on it and that we opted for deterministic unlocking in Mario). And it's kind of you to attribute deterministic unlocking to Plant, but I think it originated with Hell Hunt in '07?
Dr. C. Scott Ananiancananian on January 25th, 2013 12:24 am (UTC)
I didn't mean to single out Matrix and Time Bandits, they were just the first two map-based hunts which leapt to mind. And Plant gave us a solid argument for deterministic unlock, which directly influenced Codex's hunt.

I agree that not all of my suggested points are uncontroversial. But there are solid arguments for doing things "that way", and hunts structured "that way" have turned out well. For new hunt-writing teams, I'd suggest that they be guided by best practices for their first hunt, and if they are going to break "the rules" at least choose a small number of rules to break their first time out. Innovate selectively.
Thouis R. JonesThouis R. Jones on January 25th, 2013 05:27 pm (UTC)
jcberk has some documents that could seed this effort (most from the Setec/Plant transition in 2006). All these comments should also probably be condensed into it.
(Anonymous) on January 25th, 2013 02:00 am (UTC)
"Codex (and Metaphysical Plant?) insisted on two different test solving groups to pass the final completed puzzle,"

jcberk and okosut will be able to answer in more detail, but I think the following should be roughly right. Two separate test solves were meant to be the normal situation; I would guess that upwards of 75% of our puzzles met this standard. We (or at least I) were always open to a very challenging but very elegant puzzle getting only one testsolve, if there was no way to make it easier without destroying the elegance.

Towards the end, we started getting desperate to finish enough puzzles and admitted a few where the one solution was obtained in a ragged way: one solver making partial progress and then handing that progress off to another, or a correct solve that pointed out typos counting without anyone re-solving the typo-corrected version. IIRC, Fascinating Kids, Execution Grounds, Recombination, Rocky Horror, and Powder Monkey were of this form. These were some of our least popular puzzles; I would advise writing teams not to allow this unless they have no alternative.

If Codex managed to truly get 2 solves for every puzzle I am very impressed. They certainly produced very clean and elegant puzzles.

David S.
nameelectricshadow4 on January 25th, 2013 02:26 am (UTC)
The puzzles I remember definitely not getting 2 successful testsolves for 2012 were O Blessed Day and Undiscovered Underground. There were possibly a few others that didn't from time constraints.
jcberk on January 25th, 2013 05:53 am (UTC)
We definitely wanted two solves for everything but didn't quite achieve that, particularly as we hit late December. And 2011 was a little less fully vetted than 2006, as more people had jobs and we were somewhat less paranoid - and that was reflected in a few more issues. Both times the puzzles that didn't get two clean, unhinted test-solves were likely to be ones that didn't get solved or weren't widely liked.
Andrewbrokenwndw on January 25th, 2013 10:01 am (UTC)
electricshadow4 is correct above, at least as my hazy memory indicates. There were also a few puzzles, as time grew short, where we decided that the puzzle was so clearly easy and clean that we would skip the second test for time reasons. But we treated this a little like a nuclear launch; Emily and I discussed each case and both of us had to turn the key to do it.
ze top blurberry: driftingztbb on January 25th, 2013 02:23 am (UTC)
I don't mean it as a threat. I've spoken with multiple people who independently have gotten the sense that Sages still believe the issue was purely one of length, not editorial judgment about what makes a puzzle fun and solvable, and unless that changes would plan to stay away next time. From what I've heard and read, I think it's reasonable to be concerned that the right message hasn't gotten through yet; so here I am, I guess, being a role model for being cranky about puzzles written by your friends. And to be clear, since I haven't been so positive, I think there was a ton of really beautiful, ingenious stuff in this hunt, and it's just unfortunate that the editing didn't allow it to shine through.

A few things not on your list of best practices, mostly related to editing:
-- to quote noahspuzzlelj, "At most one aha per puzzle. Zero is OK."
-- if a puzzle (or clue) almost works, it's broken
-- test-solve puzzles in final form (with the final layout) whenever possible.
noahspuzzlelj on January 25th, 2013 04:07 am (UTC)
On the trust factor, it's worth noting that in 2006 a number of people didn't show up to solve our hunt because 2004 not only made people not trust Kappa Sig, but in fact made them not trust all young teams.
Adam R. Wood: bangzotmeister on January 25th, 2013 04:27 am (UTC)
"Anyway, I don't personally feel that it's necessary to threaten to not 'trust' the Sages until they enact some ritual mea culpa."

??!

I had nothing to do with this branch of conversation and yet I still take exception to this. If you can't even respect someone else using the WORD 'trust', how are we to feel any other way? Non-mutual trust is ill-placed trust; if you cannot comprehend doubt as being anything other than dishonest manipulation...