[00:00:29] of course, logging patches work better if jenkins actually merges them
[00:10:46] what do you think jenkins is, your butler!?
[00:16:26] bd808: things look calm so I'll head home
[00:16:34] back online within an hour or so
[00:16:43] sounds good tgr
[00:17:00] thanks to jumping on all the little things I threw at you today
[00:18:36] hello, anyone around that can help me with the new bot login system?
[00:18:38] #wikimedia-tech
[00:20:21] thanks ori
[00:36:51] anomie: MusikAnimal is having problems getting logged in with a new bot password. He left before we debugged very far.
[00:46:06] tgr|away: keep an eye out for MusikAnimal in #wikimedia-tech when you're online later
[01:19:36] bd808: are we not logging authmanager events into logstash? I'm trying to find events from the authmanager channel, and getting nothing..
[01:20:08] * bd808 checks config
[01:20:43] It's not in wmgMonologChannels.. I'm not sure if that means it goes in automatically or not
[01:21:03] that means we don't log it anywhere
[01:22:05] csteipp: I think you want CentralAuth, not authmanager
[01:22:41] I'm trying to see events from: LoggerFactory::getInstance( 'authmanager' )->info( 'Account creation attempt', array( 'event' => 'accountcreation', 'status' => $status, ) );
[01:23:06] hmmm... yeah that's not logged
[01:23:19] we can add it easily
[01:24:16] Yeah, I might do that. Just add the channel to InitializeSettings, right?
[01:24:24] that looks like it would be a pretty loud channel
[01:25:44] Do I need to work with someone in ops to make sure we can handle it?
[01:26:14] we can always shut it off it is too nasty
[01:26:20] Or, I can just instrument what I'm looking for with getStats()->increment... but that seemed like a waste
[01:26:46] Cool. I might ping you tomorrow when I'm able to work on this.
[01:27:11] sounds good. it's an easy config change
[01:34:10] bd808: that goes into https://grafana.wikimedia.org/dashboard/db/authentication-metrics
[01:34:35] oh! right!
[01:35:57] on that topic, API login failures are going up
[01:36:19] not crazy high, so could be a single bot
[01:36:44] well, we know Huggle3 is busted
[01:37:05] and we had some other broken bots before right?
[01:37:42] oh yeah, have to wait for all huggle users to update, right?
[01:38:00] I tend to forget not all applications are centrally hosted
[01:39:13] I don't think there's even a working build yet
[01:39:59] at a glance mediawiki::gateway looks reasonable
[01:40:14] and it doesn't look like it would store cookies permanently
[01:40:59] you can't use a bot password with pywikibot because it tries to validate the username before using it
[01:41:20] but I guess pywikibot also has all the kit needed for OAuth
[01:43:23] yeah, pwb worked fine with the new owner-only oauth thing last time I tried
[01:44:09] it's a bit awkward to configure because usernames are keyed by wiki names but OAuth tokens are keyed by domains
[01:44:39] and cookies are stored in a separate file which is easier to lock down but OAuth tokens are in the config file
[01:44:55] so it has its raw edges but usable
[01:53:36] I'm off for a while tgr. ping me or send a text if something comes up that you want help with
[01:53:53] ack
[06:49:07] hi TimStarling
[06:50:28] was a decision ever reached about whether we're going to do the mass conversion to [arrays, like, this]? I have the phpcs patch for it ready (https://gerrit.wikimedia.org/r/#/c/269612/), but there was some disagreement on https://gerrit.wikimedia.org/r/#/c/269745/
[07:51:51] no, there wasn't a decision
[07:55:59] do you think I should start a discussion about it? or can we just do it...?
[07:58:54] legoktm: just. do. it.
[07:59:05] it's array syntax!!!111oneoneelevenone
[08:02:31] someone needs to merge my phpcs change then! :)
[08:07:06] which one?
[08:07:23] i'll look at it tomorrow morning if it hasn't been merged by then
[08:15:25] ori: https://gerrit.wikimedia.org/r/#/c/269612/
[11:32:02] notice of a bot that get WrongToken error since wmf.13 https://phabricator.wikimedia.org/T126724
[15:00:05] anomie: good morning! how is session manager behaving ? :-}
[15:01:03] hashar: No problem reports so far besides a couple of programs with broken token handling that got confused by the inclusion of "+\" in the login token now. Which probably means it's time to roll it back to be safe ;)
[15:06:22] anomie: to rephrase: the login token got changed and bots have to adjust isn't it ?
[15:06:40] if it is only a few bots, I am wondering whether it is worth rollbacking the whole session manager
[15:06:56] hashar: Bots that were just shoving the token into the post data without urlencoding it will have to start doing the correct thing.
[15:07:01] * anomie was joking about rolling back
[15:07:12] * hashar had an heart attack
[15:07:26] would have been a pity to rollback once again for "just a few bots" :-}
[15:07:43] ostriches told us the .13 deploy has been pretty much uneventfull
[15:07:59] the various CI madness for the last two days probably caused headaches for deploy / fix etc though :-((
[15:08:09] it is all fixed now (we have doubled the number of slaves)
[15:15:01] bd808, tgr: I see a spike in the NeedToken graph today. ZKBot this time, looks like another pywikibot. Notified the operator at https://www.wikidata.org/wiki/User_talk:Zaher.Kadour#ZKBot_appears_to_be_malfunctioning
[15:55:47] anomie: ouch. is that whole bump one bot?
[15:55:54] bd808: Yeah.
[15:56:14] tight loops with no abort condition FTW!
[15:56:52] My huggle patch got merged. I haven't looked to see if there's a new build released yet or not
[15:56:58] It's probably the same pywikibot code that was Rezabot last time.
[16:02:40] labs under maintenance. CI jobs might not trigger anymore due to lack of instances to run them in.
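The 15:06 remark about bots "shoving the token into the post data without urlencoding it" can be illustrated with a short Python sketch. It assumes a made-up login token ending in the "+\" suffix mentioned at 15:01 and uses the standard action=login parameters (lgname, lgpassword, lgtoken); it only builds request bodies and does not contact any wiki. Pasting the raw token in means the server decodes "+" as a space, while urllib.parse.urlencode escapes it as %2B and the backslash as %5C, so the token survives the round trip.

```python
from urllib.parse import parse_qs, urlencode

# Made-up token for illustration only; it ends in the "+\" suffix discussed above.
TOKEN = "0123456789abcdef0123456789abcdef+\\"


def naive_body(token: str) -> str:
    # Broken: the token is pasted into the form data without escaping.
    return "action=login&format=json&lgtoken=" + token


def encoded_body(username: str, password: str, token: str) -> str:
    # Correct: urlencode escapes "+" as %2B and "\" as %5C.
    return urlencode({
        "action": "login",
        "format": "json",
        "lgname": username,
        "lgpassword": password,
        "lgtoken": token,
    })


def token_as_server_sees_it(body: str) -> str:
    # parse_qs decodes form data the way a server would ("+" becomes a space).
    return parse_qs(body)["lgtoken"][0]


if __name__ == "__main__":
    print(repr(token_as_server_sees_it(naive_body(TOKEN))))
    print(repr(token_as_server_sees_it(encoded_body("ExampleBot@mybot", "secret", TOKEN))))
```

The naive body arrives with a space where the "+" was, which is the kind of mismatch that would show up as a token error; the encoded body round-trips unchanged.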
[16:02:57] anomie: now STiki is broken too -- https://en.wikipedia.org/wiki/Wikipedia_talk:STiki#Login_problem
[16:03:08] so many tools that need love
[16:04:38] I saw that. Tried to compile it from source to see about it, but it threw a bunch of compile errors.
[16:05:05] Huggle 3.1.19 is out with the fix -- https://github.com/huggle/huggle3-qt-lx/releases/tag/3.1.19
[16:05:23] Shitty XML bugs
[16:05:46] I'm being poked to update/rewrite the AWB login code
[16:05:52] I wonder if I should just shift it to JSON at the same time
[16:06:41] yes, of course you should :)
[16:08:22] The number of things that halt on warnings is a bit crazy too
[16:17:55] STiki is encoding the token, so that's not its problem -- https://github.com/westand/STiki/blob/a27fafe67b3101287c03397e1f5db2877d60bed9/mediawiki_api/api_post.java#L113
[16:18:37] bd808: The "MediaWiki.site.users.rate" statistic is still lower than most previous weeks , although it's oddly a close match for Christmas and New Year's weeks. https://grafana.wikimedia.org/dashboard/db/authentication-metrics doesn't seem to show any noticeable drop in account creations.
[16:20:09] this curve fitting from tgr still looks good, but the 42 day offset is the same that you are mentioning -- https://graphite.wikimedia.org/render/?width=1000&height=600&from=-5days&target=timeShift%28MediaWiki.site.users.rate%2C%2242d%22%29&target=MediaWiki.site.users.rate
[16:21:46] so it's above, below or right on depending on which past week you match with
[16:22:02] the graph is above the first rollout week
[16:30:58] anomie: STiki had this fix on 2016-01-22 -- https://github.com/westand/STiki/commit/1ea869221ad6a207a041b79befe2e6e96c00dc63
[16:31:13] I wonder if that build works and people just haven't updated?
[16:31:21] * bd808 can't get it to build from source either
[16:39:49] I think the newest version of STiki works
[16:40:10] I got to the point where it told me I didn't have enough on wiki rights to use it at least
[16:41:49] I think it checks the rights before actually logging in.
[16:43:18] boo
[16:59:10] bd808: ... I have a guess as to what's going on. It's iterating depth-first through the XML result, so it sees the api→warnings→login tag thinking it's api→login, then https://github.com/westand/STiki/blob/master/mediawiki_api/api_xml_login.java#L50 sets result = null so the next line blows up trying to call .toUpperCase() on a null.
[16:59:36] The author just posted on the talk page that he has a fix
[17:00:07] https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:STiki&diff=704629856&oldid=704629090
[17:01:25] * anomie disassembles the java for the new version
[17:01:41] Yeah, looks like he added "result != null" checks in there.
[17:01:53] s/disassembles/decompiles/
[17:02:20] Yay people not using something sane like xpath...
[17:02:51] boo people randomly hacking SAX parsers without understanding what they are really doing
[17:18:34] Nikerabbit: Remind me, what's the appropriate way to change a message key so translatewiki picks it up correctly. Specifically, the "mwoauth-grant-checkuser" message in Extension:WikimediaMessages needs to be renamed to "grant-checkuser" thanks to Ida2b6861.
[17:19:11] anomie: only change key, not content, and CC Raymond for the patch
[17:19:29] and only for english and qqq
[17:19:40] (qqq if banana linter complains)
[17:19:57] It doesn't have a qqq at the moment, so we'll see what happens. Thanks!
[17:22:02] Nikerabbit: What's Raymond's gerrit name?
[17:22:09] Raimond
[17:45:22] anomie, tgr: should I ask nicely if we can deploy https://gerrit.wikimedia.org/r/#/c/270240/ today?
[17:45:43] bd808: Probably wouldn't hurt to do so.
[17:46:23] As long as we keep in mind we have no idea what the base rate is.
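The STiki failure diagnosed at 16:59 (a depth-first walk that matches api→warnings→login before the real api→login) can be reproduced in a few lines. STiki itself is Java with a hand-written SAX handler; the sketch below swaps in Python's xml.etree.ElementTree, and the XML is a simplified, hypothetical response rather than a captured one. The buggy version takes the first login element in document order, which is the warning and has no result attribute; the fixed version asks only for the direct child of api, in the spirit of the XPath suggestion at 17:02, and checks for a missing attribute instead of calling a method on null.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical response: a warning element named "login"
# appears before the real top-level login result.
RESPONSE = """<api>
  <warnings>
    <login>Some warning text</login>
  </warnings>
  <login result="Success" lgusername="ExampleBot" />
</api>"""


def login_result_buggy(xml_text):
    root = ET.fromstring(xml_text)
    # iter() walks depth-first in document order, so the first match is
    # api/warnings/login, which carries no "result" attribute.
    node = next(root.iter("login"))
    return node.get("result")  # None for the response above


def login_result_fixed(xml_text):
    root = ET.fromstring(xml_text)
    # Only consider a login element that is a direct child of <api>.
    node = root.find("./login")
    if node is None:
        raise ValueError("no top-level login element in response")
    result = node.get("result")
    if result is None:
        raise ValueError("login element has no result attribute")
    return result.upper()
```

In Java, the analogue of the buggy path is calling .toUpperCase() on the null that the first version returns.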
[17:48:43] agreed
[17:49:18] Looks like petan is cleaning up lots of huggle auth code now -- https://github.com/huggle/huggle3-qt-lx/commit/406c2a68a977b2b18606a87e34f9e20031cccf52
[17:49:22] \o/
[20:37:52] Anyone want to merge the net_smtp bump patch to drop PHP 4 support? :P
[20:38:21] pear/mail even
[20:38:22] ffs
[20:51:56] bd808, tgr: For the new session-ip log channel, did we forget to update the configuration to have it actually be logged anywhere?
[20:52:22] we probably did :/
[20:52:35] although the main log file should still have it
[20:53:16] only on mw1017
[20:53:40] we don't log channels that aren't explictly named
[20:53:53] so yes we need a config change to log session-ip
[20:54:13] I got a bit distracted by breaking all the wikis followed by my 1:1 with Toby
[20:54:57] can one of you prep the InitializeSettings patch to enable the session-ip channel?
[20:55:06] doing
[20:55:13] should I also tweak the limits?
[20:56:06] it would be nice if sync-file would accept multiple files as arguments
[20:56:09] dunno. the current limits should give us some intial data right?
[20:56:27] some, certainly
[20:56:50] would your inclination be to lower or raise them intially?
[20:57:58] I don't really have one, just asking :)
[20:58:34] 5 per sessions seems a bit high unless there are internal IPs that pollute things
[20:59:39] I thought of a user with two devices passing some kind of network boundary, but thinking about it that does not feel too realistic
[20:59:53] The current limits will give us anything >1 at the info level, which gets logged (I think?). Further tweaking without any idea what the base rate might be to differentiate info vs warning seems pointless.
[21:00:03] 10 per user... no clue
[21:00:43] I put warning level in the patch, I can change that to info
[21:02:34] * anomie notes that unless we're ignoring Tool Labs, AnomieBOT is probably going to easily hit the per-user count, and the per-session too once it starts using OAuth so all 7 threads share one session.
[21:03:01] err, s/threads/processes/
[21:03:26] do they run on different vms?
[21:04:52] Tool Labs has a pile of different VMs for the grid engine. https://tools.wmflabs.org/?status has a summary of what jobs are running where.
[21:15:55] we can filter out tool labs ips if that becomes too spammy
[21:16:20] they can be probably filtered out in logstash too
[21:16:41] tgr: config change LGTM. can you or anomie take care of syncing? I need to eat and I already broke all the wikis once today
[21:17:26] tgr: You want to do it, or do you want me to?
[21:17:32] what, no "I broke Wikipedia twice on the same day" T-shirt?
[21:17:43] I'll do it
[21:17:47] ok
[21:18:47] bd808: I was tempted to put "Give Bryan the T-shirt" at https://wikitech.wikimedia.org/wiki/Incident_documentation/20160212-AllWikisOutage#Actionables, but you made the shirts ;)
[21:33:08] https://logstash.wikimedia.org/#dashboard/temp/AVLXZjJkptxhN1XaE6CO
[21:34:32] .... "Same session used from 123 IPs", looks like all in the same /23. I think that answers the question about whether IP-hopping ISP proxies still exist.
[21:39:37] "2A03:2880:3010:7FF5:FACE:B00C:0:1" -- might need to be adjusted for ipv6
[21:40:59] Like to only care about the /64
[21:41:39] my laptop is active on 7 ipv6 addresses in its /64 right now
[21:41:45] do you think it's worth the effort to log the max IP distance, or the minimum block size that covers all ips, so logstash can sort based on that?
[21:42:22] that might be nice
[21:42:35] as long as it's relatively cheap to compute
[21:42:54] also would be nice to flag v4, v6 and mixed I think
[21:44:29] convert to binary and find the length of the common prefix?
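The two ideas raised between 21:40 and 21:44 (treat IPv6 addresses in the same /64 as one, and measure spread as the length of the common binary prefix) can be sketched with Python's ipaddress module. This is an illustration under those assumptions, not the session-ip channel's actual code. OR-ing together the XOR of every address against the first one yields every bit that varies anywhere in the set; the common prefix is the address width minus the position of the highest such bit, which is also the smallest CIDR block covering all the addresses. Mixed IPv4/IPv6 input returns None, which can double as the "mixed" flag suggested at 21:42.

```python
import ipaddress


def normalize(ip: str) -> str:
    """Collapse an IPv6 address to its /64; return IPv4 addresses unchanged."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 6:
        return str(ipaddress.ip_network(f"{addr}/64", strict=False))
    return str(addr)


def common_prefix_len(ips):
    """Longest common bit prefix of the given addresses, i.e. the prefix
    length of the smallest block covering all of them.
    Returns None when IPv4 and IPv6 addresses are mixed."""
    addrs = [ipaddress.ip_address(ip) for ip in ips]
    if len({a.version for a in addrs}) != 1:
        return None
    width = addrs[0].max_prefixlen  # 32 for IPv4, 128 for IPv6
    first = int(addrs[0])
    differing = 0
    for a in addrs[1:]:
        differing |= int(a) ^ first  # accumulate every bit that varies
    return width - differing.bit_length()
```

For example, 192.0.2.1 and 192.0.3.1 share a 23-bit prefix, so they fit in one /23 like the 123-IP session at 21:34; computing this is a few integer operations per address.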
[21:44:37] that's pretty trivial perf-wise
[21:45:00] not sure if it's the best way to measure "distance"
[21:46:35] Looking into one of the ones hitting enwiki with around 11 IPs, I spot-checked the IPs and they're mostly blocked as open proxies. So either it's tor, or some weird kinda-like-tor that uses open proxies.
[21:50:13] Wow, that data is fascinating...
[21:52:55] Kangaroot is the new AOL...
[22:07:50] [13:39:38] "2A03:2880:3010:7FF5:FACE:B00C:0:1" -- might need to be adjusted for ipv6 <-- I hope that IP goes back to Facebook.
[22:08:31] bd808: there's a bug somewhere about treating all IPv6 in the same /64 as the same
[22:08:41] legoktm: 2a03:2880::/29, IE-FACEBOOK-201100822, Facebook Ireland Ltd
[22:09:02] legoktm, just go ahead with your change!!! I'll merge it then strategically merge a couple of changes on top of it to make it non-revertable. AND THEN WE CAN ALL GO DRINK MANGO LASSI!!!
[22:09:04] ha :D
[22:09:23] ^ to anomie
[22:10:38] * anomie wants to go get food, but "anomie going to get dinner on Friday" has historically been the trigger for something blowing up and SessionManager getting rolled back
[22:11:23] I think it's safe now, anomie :)
[22:11:23] anomie: obvioulsy you need to fast until Tuesday
[22:11:50] MaxSem: lol mango lassi sounds good
[22:12:35] I was going to wait for Daniel to respond before actually doing it
[22:14:34] it's friday night for him
[22:14:56] just surprise him
[22:15:08] tgr: are you available to review https://gerrit.wikimedia.org/r/270417 ? this makes NetSpeed=B only strip srcset, rather than strip srcset + qlow. I'd really like Jon K's assessment to be based on that.
[22:15:38] there will be pages that are cached with both, but as long as we tell him to log in, it shouldn't matter
[22:17:00] or jdlrobson ^
[22:17:48] im fine with that
[22:18:13] jdlrobson: CR?
[22:18:15] ori: well netspeed cookie could have a third value...
[22:18:33] unless we're sure we're gonna discard this
[22:18:42] I think so
[22:19:22] ori: did you test it? I can't be bothered to fiddle with my mediawiki instance right now to try it out
[22:19:27] yes
[22:19:36] i'm doing lots of mobilefrontend surgery ;-)
[22:20:41] I did
[22:21:29] thanks
[22:22:12] Attempt at a dashboard for multiple ips -- https://logstash.wikimedia.org/#/dashboard/elasticsearch/session-ip
[22:25:00] "Same session used from 105 IPs" wtf?
[22:25:03] cool
[22:25:27] is't an anon session and the ips are in the same class B
[22:26:38] 105 ips within 10 minutes is an impressive amount even if you get a new IP with each request
[22:29:00] New winner "Same session used from 302 IPs"
[22:29:44] another anon and class B
[22:29:44] we should probably add an anon/logged in flag because even if this is a bug, which it very likely isn't, there is very little security impact to mixing up anon sessions
[22:29:53] yeah
[22:30:26] I think the ips list should just be a list too instead of a hash. It will be nicer to do things with in logstash
[22:31:34] yeah, that was just an oversight
[22:33:38] I found some more magic in the dashboard too. "top" lists are now based on max(count) in time window
[22:34:17] AnomieBOT has a lowly 7 ips vs teh 34 ip leader
[22:35:35] If we fix the ips value to be an array then we can do top ip too
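The log-record changes proposed at the end of the conversation can be combined in a small hypothetical Python model: an anon/logged-in flag (22:29), the ips value as a list rather than a hash (22:30), info level for anything above one IP and warning above a limit (discussed around 20:59). The class name, field names and the limit of 5 are illustrative assumptions based on the numbers floated in the conversation; this is not MediaWiki's SessionManager implementation.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

# Hypothetical limit, echoing the "5 per session" figure discussed above.
SESSION_IP_WARN_LIMIT = 5


@dataclass
class SessionIpTracker:
    """Collects the IPs seen for one session (illustrative model only)."""
    session_id: str
    is_anon: bool
    ips: Set[str] = field(default_factory=set)

    def record(self, ip: str) -> None:
        self.ips.add(ip)

    def log_entry(self) -> Optional[Tuple[str, dict]]:
        """Return (level, context) once more than one IP has been seen."""
        count = len(self.ips)
        if count <= 1:
            return None
        level = "warning" if count > SESSION_IP_WARN_LIMIT else "info"
        context = {
            "message": f"Same session used from {count} IPs",
            "session": self.session_id,
            # Lets anon sessions, which carry little security risk, be filtered out.
            "anon": self.is_anon,
            # A list (a JSON array once serialized) rather than a hash, so
            # logstash can aggregate on individual IPs.
            "ips": sorted(self.ips),
        }
        return level, context
```

Keeping the level decision separate from the context makes it easy to adjust the threshold once the base rate is known, which was the open question at 20:59.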