21 January 2013 @ 02:40 pm
Too Big to Solve?  
Not my tagline, but a good description for the Mystery Hunt that just happened. One line of dialogue after last year's Hunt that I led with in my wrap-up was a question of when is too soon for a Hunt to end. I said, in this era of a few competitive teams trying to grow to get over the winning hurdle, constructors aiming bigger was a mistake. The Hunt ending after 36 hours (Midnight Saturday) is fine if that makes the solving experience stretch over the weekend for everyone else. I won't comment generally on this year's effort but it seems a great example to point back to of too much ambition by too many people towards the further militarization of the size of Hunt so that by 2025 the team "The whole of new USA" can go after the coin against "USSReunited" for at least a month. The sense of "puzzle" versus "grindy work" is also a discussion I have every year and I don't choose to repeat myself. I've felt since 2008 that the Mystery Hunt is far from an event I'd regularly attend in person although I'm glad to have finally been onsite to play with Team Luck with whom I've been a "free agent" now for three years.

I had a good solving year as things go relatively, but it was mostly demoralizing personally. I soloed Palmer's Portals, for example, but spent many hours after basically solving 8/10ths with a need to tweak a very small and underconstrained set of things to get from that hard work state to a finished state. At some stage I told the team "I'm going to solve Portals and the Feynman meta and then go sleep" and I met this goal, but it took many times longer than I expected when I made the statement. I led the solve of both Danny Ocean (with zebraboy stating the most necessary last bit to get my work over the cliff) and Richard Feynman (with Jasters). I obviously co-solved lots of the logic puzzles and other puzzles, and gave various finishing help to a range of things too. I think I did this best for "Kid Crossword" once when he had spent a lot of time mastering the hard steps of a crossword/scrabble puzzle -- and could impressively quickly rewrite the set of steps I wanted him to do about the puzzle -- and the follow-up steps were not obvious but I led the killing of the beast. This was too often the feel for these puzzles, and my assassination rate was far lower than I wanted. My Sunday was spent earning 3 puzzle answers by actually going to an event, and then falsely believing the power to buy some answers would let me finish solving the Indiana Jones mini-metas -- where I had already mostly soloed Adventure 2's snakes with 5/8 answers, but then killed myself dead on #1/Ouroboros for the rest of the day, spending so long solving, as many solvers will say in hindsight, the puzzle that was meant to be in one of a dozen ways and not the puzzle it was. Let me state here, as I did for hours with my team: the phrase "I'm not cut out for this" is horrible flavor. It implies both cut this out and, in a different way, also don't cut this out.
This makes you want to cut it out, which takes a lot of time, but also to not invest too much time in cutting it out, so as to save the wasted time of doing a task you are being told not to do. Other wordings are far safer, and implied negatives within positives are one of the five worst flavor failure modes in my opinion. Puzzle editing and flavor text is an art and is certainly the biggest variable from year to year and constructing team to constructing team.

So yeah, Mystery Hunt happened. And there were the usual share of overwhelmingly incredible Aha moments. Endgame seemed very fun and I wish all teams could do just that for the weekend or at least a lot more things like that. More of that, and more sleep, would have both been some good choices this year. If only the puzzles solved on schedule.

ETA: And as I added far below around comment #300, as a solver who was both frustrated yet had fun in this Hunt, I do want to thank everyone on Sages for the incredible effort they put in. Making a Mystery Hunt is a gift for all solvers whether it matches expectations or not, and as a mostly thankless job I do want the constructors and editors and software engineers and graphic designers and cooks and phone center workers and everyone else to know I appreciated all you did over the last weekend to give us several days together for puzzling.

Further, as I was asked to write a larger piece elsewhere that has given me personally a lot more attention as the face of the criticism, and as I use the phrase "My team" a lot in general as solving forms this kind of bond, I want to be very clear: since Bombers broke up after 2009 I have been a free agent. I have solved recently with Team Luck but am not a core part of their leadership and these opinions I state are my own. I intend to form my own team next year to go after the coin again, and if you have a problem with what I have said anywhere on the internets, please hate me for it. I believe in my posts I have been offering constructive criticism, but even what I have said is without all the facts of what went on inside Sages so I could easily be speaking from ignorance a lot of the time.

EFTA: Thanks to tablesaw for pointing out this chronologic feature of posts. If you want to see all the additions to this post in time sorted order, go here http://motris.livejournal.com/181790.html?view=flat. We're on page 14 at the moment.
ze top blurberry: drifting (ztbb) on January 24th, 2013 10:19 pm (UTC)
Dan, I think it runs deeper than that. I have a very hard time imagining your test-solvers reporting that they had fun identifying 263 clips for the dissonant song crossword, and similarly for many other puzzles that had this sort of unusually burdensome element. Or thinking that the multiple layers of aha! necessary to finish so many of the puzzles were a good idea. But if they actually did, then what Manic Sages test-solvers consider to be fun is significantly out of sync with what the majority of other solvers at the Hunt consider to be fun. It really is not just that Sages misjudged the number of puzzles per hour that a team would solve, they misjudged what people would consider a reasonable or fair puzzle, and what solvers would enjoy spending their time on; if Sages ever run the hunt again (assuming you stay together) then people aren't going to trust you until they hear some acknowledgment of that.

I think the reason you see so many angry people is that these choices run pretty well counter to the consensus that coalesced after the problems with Matrix and Time Bandits. We learned these lessons the hard way, and have been making serious attempts to pass the accumulated wisdom down from writing team to writing team so that these kinds of issues stay in the past. I still don't have much of a grip on how such a regression could have happened this year, especially since I've solved so many great puzzles written by people on Sages before, but it's more than just not putting a stopwatch on test-solving runs.
Dr. C. Scott Ananian (cananian) on January 24th, 2013 11:15 pm (UTC)
Hm, well. The test solvers *did* think the 263*2 lookup puzzle was fun (see elsewhere in the comments here). But I think there was only one test solving group for this puzzle (Sages, correct me if I'm wrong). Codex (and Metaphysical Plant?) insisted on two different test solving groups to pass the final completed puzzle, because we knew we had some outliers among the team (esp super-solvers!).

Some of the general issues might just be the test solvers' inexperience with the hunt: it takes a lot of puzzles before you start to know what a good puzzle "feels like". Some of it may have been social -- being too nice to puzzles written by your friends (Codex test solvers were some cranky bastards!). But in this particular case the single test solve group from Sages really did like the 263-ID puzzle.

(And as mentioned elsewhere in the thread, it's not a bad puzzle idea, just lacking in how it was edited and presented. There are lots of good puzzle editors on this thread, and we could probably come up with a dozen different ways to improve it. Just one: if there was a small 25-or-so clue puzzle before the ginormous one, which worked the same way, then people would have an accessible way to figure out how the puzzle worked and test hypotheses before diving in the deep end with the rest of the clues -- although probably better would have been just to reduce the size of the crossword, since the number of cells grows as n^2.)

Anyway, I don't personally feel that it's necessary to threaten to not "trust" the Sages until they enact some ritual mea culpa. In my experience teams learn from their mistakes and write better hunts the second time they win. (Don't let me down, Kappa Sig.)

But I share your frustration that a certain body of hunt lore appears not to be being successfully transmitted between teams from year to year. Perhaps a single hand-off meeting after wrapup is not sufficient? I feel that long threads like this one are not flame-fests so much as previous authors' earnest attempts to ensure that the lessons they've learned are not lost/forgotten/ignored.

I plan to write a blog post sometime soon discussing the various hunt "traditions"/lessons I think are important. Briefly: "two pass standard for test solving"; "Multiple teams are expected to find the coin"/"the hunt's not over when someone's won"/"hunt HQ will remain open after the coin is found"; "puzzles should be measured by how fun they are, not how mindblowingly clever"; "complete solutions will be posted before wrapup" (expect nothing to be done after the hunt; good solutions are also vital to the editing/fact check process); "complete software for the hunt will be posted after the hunt" (so we can build on each others' work); "events are orthogonal from puzzles" (they don't give "answers", they provide some other currency); "free answer currency" (a means to work around stuck puzzles; hope you never have to use); "a coin for every winning team member". Some nice innovations this year which I'd like to see continued: practice/partial runarounds available for all teams, so that you can see some of the cool things built for final runaround even if you don't reach it; centralized hint system activated once the final set of competitive teams is clear.

Codex also re-used MPP's "deterministic unlock" scheme, where it doesn't matter *which* puzzles you've unlocked, just how many/when, so that puzzles are always unlocked in the same order for all teams. This does seem to simplify testing a lot. I'm not willing to add this to my "you really must do this" list, but I'd encourage teams to think very hard before going back to a maze/grid-structured hunt (like Matrix and Time Bandits, among others) where you need to answer puzzle A in order to access puzzle B.
AJD (dr_whom) on January 24th, 2013 11:58 pm (UTC)
At Plant's post-Hunt brunch (which, this year, was not actually post-Hunt, but never mind) we talked about the possibility of putting up a Guide To Running The Hunt similar to what you describe, maybe somewhere on Plant webspace, but with contributions from all recent running teams. (Including arguments for and against some of the points you mention that there's not necessarily universal agreement on, like orthogonal events and multiple copies of the coin.)

I agree with you in favoring the "deterministic unlock" scheme over map-based unlocking, by the way, but it seems unfair to attribute map-based unlocking just to Matrix and Time Bandits when it was also used in well-regarded Hunts like Normalville and SPIES (...though it was just about the worst thing about SPIES, and its failure there is one of the reasons that I soured on it and that we opted for deterministic unlocking in Mario). And it's kind of you to attribute deterministic unlocking to Plant, but I think it originated with Hell Hunt in '07?
Dr. C. Scott Ananian (cananian) on January 25th, 2013 12:24 am (UTC)
I didn't mean to single out Matrix and Time Bandits, they were just the first two map-based hunts which leapt to mind. And Plant gave us a solid argument for deterministic unlock, which directly influenced Codex's hunt.

I agree that not all of my suggested points are uncontroversial. But there are solid arguments for doing things "that way", and hunts structured "that way" have turned out well. For new hunt-writing teams, I'd suggest that they be guided by best practices for their first hunt, and if they are going to break "the rules" at least choose a small number of rules to break their first time out. Innovate selectively.
Cat (redcat9) on January 25th, 2013 02:00 pm (UTC)
I think sharing the "why" of something is extremely important and the most likely to get lost over time. For example, we had a lot of reasons for making events orthogonal to puzzles in Mario (all having to do with trying to make it easier for teams to choose how to enjoy Hunt), but the opinions were mixed, and there could well be better solutions to the same underlying issues. It's too easy to see one solution, decide it's bad, and not go back to the reasons it was implemented in the first place.

That said, I suggested a collective archive of "stuff we learned or did wrong" because of one very stupid thing we learned during SPIES--DO NOT have your "bring us food" puzzle extremely late in Hunt. You will end up with no food, except at exactly the moment you want to be cleaning up and sleeping. That's the kind of thing that no one's going to think to pass on, but no one should have to learn twice.
Thouis R. Jones on January 25th, 2013 05:27 pm (UTC)
jcberk has some documents that could seed this effort (most from the Setec/Plant transition in 2006). All these comments should also probably be condensed into it.
(Anonymous) on January 25th, 2013 02:00 am (UTC)
"Codex (and Metaphysical Plant?) insisted on two different test solving groups to pass the final completed puzzle,"

jcberk and okosut will be able to answer in more detail, but I think the following should be roughly right. 2 separate test solves was meant to be the normal situation. I would guess that upwards of 75% of our puzzles met this standard. We (or at least I) were always open to a puzzle which was very challenging, but very elegant, getting only one testsolve, if there was no way to make it easier without destroying the elegance.

Towards the end, we started getting desperate to finish enough puzzles and admitted a few puzzles where the one solution was obtained in a ragged way, with one solver making partial progress and then handing that progress off to another, or allowing a correct solve that pointed out typos without re-solving the typo-corrected version. IIRC, Fascinating Kids, Execution Grounds, Recombination, Rocky Horror and Powder Monkey were of this form. These were some of our least popular puzzles; I would advise writing teams not to allow this unless they have no alternative.

If Codex managed to truly get 2 solves for every puzzle I am very impressed. They certainly produced very clean and elegant puzzles.

David S.
electricshadow4 on January 25th, 2013 02:26 am (UTC)
The puzzles I remember definitely not getting 2 successful testsolves for 2012 were O Blessed Day and Undiscovered Underground. There were possibly a few others that didn't from time constraints.
jcberk on January 25th, 2013 05:53 am (UTC)
We definitely wanted two solves for everything but didn't quite achieve that, particularly as we hit late December. And 2011 was a little less fully vetted than 2006, as more people had jobs and we were somewhat less paranoid - and that was reflected in a few more issues. Both times the puzzles that didn't get two clean, unhinted test-solves were likely to be ones that didn't get solved or weren't widely liked.
Andrew (brokenwndw) on January 25th, 2013 10:01 am (UTC)
electricshadow4 is correct above, at least as my hazy memory indicates. There were also a few puzzles, as time grew short, where we decided that the puzzle was so clearly easy and clean that we would skip the second test for time reasons. But we treated this a little like a nuclear launch; Emily and I discussed each case and both of us had to turn the key to do it.
ze top blurberry: drifting (ztbb) on January 25th, 2013 02:23 am (UTC)
I don't mean it as a threat. I've spoken with multiple people who independently have gotten the sense that Sages still believe the issue was purely one of length, not editorial judgment about what makes a puzzle fun and solvable, and who, unless that changes, would plan to stay away next time. From what I've heard and read, I think it's reasonable to be concerned that the right message hasn't gotten through yet; so here I am, I guess, being a role model for being cranky about puzzles written by your friends. And to be clear, since I haven't been so positive, I think there was a ton of really beautiful, ingenious stuff in this hunt, and it's just unfortunate that the editing didn't allow it to shine through.

A few things not on your list of best practices, mostly related to editing:
-- to quote noahspuzzlelj, "At most one aha per puzzle. Zero is OK."
-- if a puzzle (or clue) almost works, it's broken
-- test-solve puzzles in final form (with the final layout) whenever possible.
Dr. C. Scott Ananian (cananian) on January 25th, 2013 03:40 am (UTC)
I totally agree with your additions. I'd stretch the aha limit to two at most, though -- only if both ahas are beautiful. Ie, [contrived example] (1) "it's not a fish puzzle, it's a scooby-doo puzzle!", [some scooby-doo identification follows] (2) "what do fish and scooby-doo have in common? goldfish! it's really a puzzle about snack foods!"

Note that the second aha in this case is not, "now how am I going to get an answer out of all these scooby-doo episodes?". Second ahas are only legit if they are beautiful. Extraction rarely qualifies as beautiful.

I feel like I should also add, "answers will be recognizable" to the list of best practices. Usually that's not an issue, but this hunt really stretched the boundary between "clue phrase" and "answer" to the breaking point.
noahspuzzlelj on January 25th, 2013 05:59 pm (UTC)
I'm pretty sure I was paraphrasing from the advice Setec gave us in 2006.
noahspuzzlelj on January 25th, 2013 04:07 am (UTC)
On the trust factor, it's worth noting that in 2006 a number of people didn't show up to solve our hunt because 2004 not only made people not trust Kappa Sig, but in fact made them not trust all young teams.
Adam R. Wood: bang (zotmeister) on January 25th, 2013 04:27 am (UTC)
"Anyway, I don't personally feel that it's necessary to threaten to not 'trust' the Sages until they enact some ritual mea culpa."


I had nothing to do with this branch of conversation and yet I still take exception to this. If you can't even respect someone else using the WORD 'trust', how are we to feel any other way? Non-mutual trust is ill-placed trust; if you cannot comprehend doubt as being anything other than dishonest manipulation...