21 January 2013 @ 02:40 pm
Too Big to Solve?  
Not my tagline, but a good description for the Mystery Hunt that just happened. One line of discussion after last year's Hunt, which I led with in my wrap-up, was the question of when is too soon for a Hunt to end. I said that, in this era of a few competitive teams trying to grow to get over the winning hurdle, constructors aiming bigger was a mistake. The Hunt ending after 36 hours (Midnight Saturday) is fine if that makes the solving experience stretch over the weekend for everyone else. I won't comment generally on this year's effort, but it seems a great example to point back to of too much ambition by too many people, pushing the further militarization of Hunt size so that by 2025 the team "The whole of new USA" can go after the coin against "USSReunited" for at least a month. The sense of "puzzle" versus "grindy work" is also a discussion I have every year, and I won't repeat myself here. I've felt since 2008 that the Mystery Hunt is far from an event I'd regularly attend in person, although I'm glad to have finally been onsite to play with Team Luck, with whom I've been a "free agent" now for three years.

I had a good solving year, relatively speaking, but it was mostly demoralizing personally. I soloed Palmer's Portals, for example, but after basically solving 8/10ths of it I spent many hours tweaking a very small and underconstrained set of things to get from that hard-won state to a finished one. At some stage I told the team "I'm going to solve Portals and the Feynman meta and then go sleep," and I met this goal, but it took many times longer than I expected when I made the statement. I led the solve of both Danny Ocean (with zebraboy stating the most necessary last bit to get my work over the cliff) and Richard Feynman (with Jasters). I obviously co-solved lots of the logic puzzles and other puzzles, and gave various finishing help to a range of things too. I think I did this best for "Kid Crossword" once, when he had spent a lot of time mastering the hard steps of a crossword/scrabble puzzle -- and could impressively quickly rewrite the set of steps I wanted him to do about the puzzle -- and the follow-up steps were not obvious, but I led the killing of the beast. This was too often the feel for these puzzles, and my assassination rate was far lower than I wanted. My Sunday was spent earning 3 puzzle answers by actually going to an event, and then falsely believing the power to buy some answers would let me finish solving the Indiana Jones mini-metas -- where I had already mostly soloed Adventure 2's snakes with 5/8 answers, but then killed myself dead on #1/Ouroboros for the rest of the day, spending far too long solving, as many solvers will say in hindsight, the puzzle it seemed meant to be in one of a dozen ways and not the puzzle it was. Let me state here, as I did for hours with my team, that the phrase "I'm not cut out for this" is horrible flavor. It implies both "cut this out" and, in a different way, "don't cut this out." This makes you want to cut it out, which takes a lot of time, but also to not invest too much time in cutting it out, so as to save the wasted time of doing a task you are being told not to do. Other wordings are far safer, and implied negatives within positives are one of the five worst flavor failure modes in my opinion. Puzzle editing and flavor text are an art, and certainly the biggest variable from year to year and constructing team to constructing team.

So yeah, Mystery Hunt happened. And there was the usual share of overwhelmingly incredible Aha moments. Endgame seemed very fun and I wish all teams could do just that for the weekend, or at least a lot more things like that. More of that, and more sleep, would both have been good choices this year. If only the puzzles had been solved on schedule.

ETA: And as I added far below around comment #300, as a solver who was frustrated yet had fun in this Hunt, I do want to thank everyone on Sages for the incredible effort they put in. Making a Mystery Hunt is a gift for all solvers whether it matches expectations or not, and as it is a mostly thankless job, I do want the constructors and editors and software engineers and graphic designers and cooks and phone center workers and everyone else to know I appreciated all you did over the last weekend to give us several days together for puzzling.

Further, as I was asked to write a larger piece elsewhere that has given me personally a lot more attention as the face of the criticism, and as I use the phrase "My team" a lot in general as solving forms this kind of bond, I want to be very clear: since Bombers broke up after 2009 I have been a free agent. I have solved recently with Team Luck but am not a core part of their leadership and these opinions I state are my own. I intend to form my own team next year to go after the coin again, and if you have a problem with what I have said anywhere on the internets, please hate me for it. I believe in my posts I have been offering constructive criticism, but even what I have said is without all the facts of what went on inside Sages so I could easily be speaking from ignorance a lot of the time.

EFTA: Thanks to tablesaw for pointing out this chronological feature of posts. If you want to see all the additions to this post in time-sorted order, go here: http://motris.livejournal.com/181790.html?view=flat. We're on page 14 at the moment.
 
 
 
Catherine (cmouse) on January 23rd, 2013 11:31 pm (UTC)
I mean this is a huge question and believe me I have more models than you would believe on hunt length. This is an unbelievably complex question.

We (SagesHQ) all built models in deciding hunt length and I actually built solving models for every top or middle team. When we finish compiling and releasing the data you'll see that Death and Mayhem (Death from Above and Electric Mayhem) ended up nearly 200 people strong and super competitive. You can see the change in solving rates and projected solving rates from either team alone.

Puzzle solving rates have been increasing in the past few years and hunts have been shortening. Teams are getting bigger and *technology* is improving in a startlingly nonlinear rate. I know a lot of people wish that hunt wouldn't end in the early hours of Sunday (we targeted 3pm on Sunday). That's a big gap between when 2012's hunt ended and our target hunt time.
lunchboy on January 24th, 2013 12:09 am (UTC)
A 200-person team solved more puzzles than other teams and was in competition for finding the coin? This isn't altogether surprising. When you state: "Puzzle solving rates have been increasing in the past few years and hunts have been shortening. Teams are getting bigger and *technology* is improving in a startlingly nonlinear rate," you're buying into an assumption that the optimal team size is 150 or more people. Teams got bigger because they wanted a better chance of winning, and a bigger team meant (possibly) solving faster. If a large team has an inbuilt solving edge, I don't see that it follows that the Hunt has to escalate its length and difficulty to cater to the teams that increased their size. This screws smaller teams over two ways: they have less chance to win in the first place because they're up against megateams, and then future Hunts are not even approachable for them because they're so overwhelmed with puzzles.

(Also, the 2012 Hunt *did* end on Sunday afternoon. We did not stop running it just because Sages found the coin.)
Andrew (brokenwndw) on January 24th, 2013 12:35 am (UTC)
What are you talking about? Our hunt ended at 3 PM on Sunday, just like the 2011 hunt!

...and to take my tongue slightly out of my cheek, what I mean here is that I really liked the transition in attitude, in 2011-2012, from "the hunt 'ends' when the coin is found" to "the hunt 'ends' when HQ closes, well after the coin is found". I want lots of teams to solve 90% of the puzzles and see endgame. I want even small teams to have a fighting chance at solving a full-strength round. And none of these things are possible if you assume the hunt has to last all weekend for the winners.

I would be curious as to how your models worked. I am slightly ashamed to admit that our models consisted of "well, our puzzles are similar in difficulty and cleanliness to the 2011 puzzles, and there are about 0.9x as many, so it should be 0.9x as long." But it did work! :-)
Andrew (brokenwndw) on January 24th, 2013 12:43 am (UTC)
Actually, I should correct myself and note that affpuzz did some pretty spiffy math when we were designing our unlock system. But during meta design our "model" consisted of what I described above.
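
As an illustration only, the back-of-the-envelope estimate described above (similar puzzle difficulty, about 0.9x as many puzzles, so about 0.9x as long) can be sketched in a few lines of Python. The function name and the 50-hour reference length are hypothetical placeholders, not figures from either hunt.

```python
def scaled_hunt_length(reference_hours: float, puzzle_ratio: float) -> float:
    """Scale a reference hunt's length by the ratio of puzzle counts.

    Assumes puzzles are of comparable difficulty and cleanliness, so that
    total solving time grows in proportion to the number of puzzles.
    """
    if reference_hours <= 0 or puzzle_ratio <= 0:
        raise ValueError("both inputs must be positive")
    return reference_hours * puzzle_ratio


# Hypothetical reference length of 50 hours; 0.9 is the ratio quoted above.
estimate = scaled_hunt_length(50.0, 0.9)
print(f"Estimated length: {estimate:.1f} hours")
```

A proportional estimate like this ignores sleep, team size, and unlock structure, so it is probably only useful for getting the rough order of magnitude.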
Dr. C. Scott Ananian (cananian) on January 24th, 2013 04:21 am (UTC)
Incidentally Metaphysical Plant did a *much* better job than either Codex or Sages at modeling their hunt, and I believe their models turned out spot-on (or at least as close as possible given the ineffable nature of hunting). They recorded solution times during test solving for every puzzle and used these to drive their model. I wanted to do the same for Codex (Plant gave us a detailed description of their model), but we couldn't actually afford the bureaucratic overhead needed to record person-hours for every test solve session. So Alan came up with a different model based on 'typical' solve rates.

But anyway: [Atlas Shrugged], go talk to Metaphysical Plant. They were on to something.
Oliver (okosut) on January 24th, 2013 08:54 pm (UTC)
I think you're giving us a little too much credit. We certainly did try to record how much time people had spent testsolving. You can never know for sure, because someone could look at a puzzle on Monday, not think about it again until Friday, and then suddenly have the right idea. Were they working on it in the intermediate time? Not to mention the fact that many of our solves were done by the same person, and without being sleep deprived, and so on.

Anyway, Andrew took all that data and tried to parse it in a way that could be modelled to predict how hunt would go (this was a huge amount of work, as I understand it). The model was pretty far off, actually; as I recall it predicted hunt ending Saturday afternoon (actual end was 6am Sunday). We were pretty sure this would be wrong, partly for the above reasons and partly because the model didn't include people sleeping (not enough data, I think). Still, it was useful for gauging the approximate shape of when puzzles would be unlocked, and getting the right order of magnitude for hunt length.

Point is, predicting how long hunt will last is really, really hard.
Dr. C. Scott Ananian (cananian) on January 24th, 2013 09:07 pm (UTC)
In my book, aiming for a Saturday afternoon end with such a model is exactly the right thing. The error sources were all in the "too short" direction. Once you hand-wave to account for sleep deprivation, solver inefficiencies, and the fact that your team is always better at your own puzzles, "Saturday afternoon" works out to a "sometime Sunday morning" end. Which nailed it.

My point is, if you're working with a model like this you can't be certain when the hunt ends (of course) but you might be less likely to make order-of-magnitude mistakes.
noahspuzzlelj on January 25th, 2013 03:59 am (UTC)
Another fun source of error is that your best testsolvers can solve more puzzles during testsolving (which runs over several months) than they can during hunt when their time is limited.
zandperl on January 24th, 2013 01:43 am (UTC)
How do you define "middle team"?

FWIW the Hunt did end at 3pm Sunday for Grand Unified Theory of Love: one of our team goals is to have a spotless room after the Hunt, so to make sure everyone chips in on the cleanup we always start packing up at 3pm Sunday. A few individuals continued to work Sunday night and Monday morning, but it was not as a team.
Catherine (cmouse) on January 24th, 2013 02:19 am (UTC)
Yay! Thank you for cleaning up.