[08:49:29] qchris: hola, staging seems to be kaput, can you ssh into it?
[10:21:46] so how's everything looking nuria/qchris?
[10:22:05] milimetric: I haven't heard anything else from nuria.
[10:22:08] The backup works well
[10:22:12] woohoo!
[10:22:13] :)
[10:22:19] there is only 1 thing
[10:22:27] and that is the redis restore
[10:22:42] you can see dev and staging
[10:22:48] i just restored
[10:22:52] dev from staging
[10:23:02] ok, checking
[10:23:04] also deployed master to both.. ran alembic
[10:23:07] yara yara
[10:23:36] I also made a bunch of reports on staging so size of redis db
[10:23:38] was 3 G
[10:23:43] sorry 3M
[10:25:09] i'll check the db directly, the authorization problems keep me from being able to test in the UI
[10:25:54] no you can
[10:25:59] if you trick oauth
[10:26:07] just in the callback url
[10:26:12] put dev instead of staging
[10:26:19] (in the page where you get a 500)
[10:26:31] huh
[10:26:51] oauth will error with a 500
[10:26:53] in dev
[10:27:01] and try to send you to staging url
[10:27:08] and you will get a 500
[10:27:11] funny :)
[10:27:19] yeah, it works
[10:27:23] but if in that page you "type" dev url and reload
[10:27:27] the callback will work
[10:27:35] makes sense?
[10:27:57] let me turn redis off and overwrite the db again
[10:28:04] i think i forgot to shut it down
[10:31:38] ^milimetric: now redis works
[10:31:44] checking
[10:31:51] let me try to make new reports to make sure everything is in order
[10:33:54] mmm.. there is an error on queue log regarding redis.. looking
[10:34:40] ah no wait, i did not turn off the queue
[10:34:41] looks like just a disconnect fluke due to probably you restarting it
[10:34:47] i bet that is what it is
[10:34:54] let's do it again
[10:38:48] ya, working.
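The restore order that came out of this exchange (shut down the queue and redis, overwrite the db file, bring everything back up) can be sketched roughly as below. This is only an illustration under stated assumptions: `restore_redis_dump`, `stop_services` and `start_services` are hypothetical names, the file paths are placeholders, and how the services are actually stopped on staging is not shown in the log. The ordering matters because a running redis can write its in-memory data back to the dump file when it shuts down, which would clobber a file copied in underneath it.

```python
import shutil
from pathlib import Path
from typing import Callable


def restore_redis_dump(
    backup_rdb: Path,
    live_rdb: Path,
    stop_services: Callable[[], None],
    start_services: Callable[[], None],
) -> None:
    """Replace the live redis dump file with one taken from a backup.

    stop_services / start_services are hypothetical callables supplied by
    the caller; they are expected to stop and start both the queue workers
    and redis itself.
    """
    # Stop everything that talks to redis, and redis itself, before
    # touching the file, so nothing rewrites it after the copy.
    stop_services()
    try:
        # Overwrite the live dump with the backed-up one.
        shutil.copy2(backup_rdb, live_rdb)
    finally:
        # Bring the services back up even if the copy failed.
        start_services()
```

Passing the stop/start steps in as callables keeps the sketch independent of whatever init system the box uses.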
[10:39:10] ^qchris, milimetric
[10:39:21] so i have tested the backup a bunch today
[10:39:44] I thought there were issues with redis but no, it works fine
[10:39:46] i was watching the log
[10:39:49] looking good to me
[10:39:59] i will document how to do the restore here: https://www.mediawiki.org/wiki/Analytics/Wikimetrics/FAQ
[10:40:33] if that sounds good
[10:41:01] time it takes to do backups seems pretty flaky though
[10:41:34] with a 3M redis file it oscillated from 3 secs to less than 1
[10:42:04] :)
[10:42:17] anything within like 10 hours sounds good to me nuria
[10:42:42] ya, but the hourly one, though
[10:43:38] the history on backups and wikimetrics is, I lost everyone's reports from the beginning of time once (a few months' worth of them) and literally not a single person even cared
[10:44:07] as long as the database is relatively stable (which it is because we have daily backups now)
[10:44:10] we're good
[10:44:15] ya, but when we have a dashboard updated daily it might matter a little more...
[10:44:41] yeah, re-populating the public files would be a pain, but those are backed up too
[10:45:07] but you're right, and it seems like the new system is pretty fast
[10:46:26] ya, it is. i tested locks by hand, putting sleeps
[10:47:16] but other than documenting the procedure I think it is ready to go
[10:47:54] I installed an rdb client in staging to look at redis file if : https://github.com/sripathikrishnan/redis-rdb-tools
[10:48:38] ah, cool tool
[10:56:35] ok, so I'm going to mark this resolved since it was tested in staging and we'll have to wait for andrew to merge it
[10:57:47] we can resolve once andrew merges it no?
[10:59:19] um, as far as I'm concerned if it works in staging it's ok to close it
[10:59:40] it'd be nice to have a "deployed" status in bugzilla
[11:00:03] but since we don't deploy to production before our sprint is over, we have to resolve before we deploy
[11:00:40] so, I see two more easy pickings that we can probably finish today
[11:00:49] https://bugzilla.wikimedia.org/show_bug.cgi?id=66439
[11:00:55] https://bugzilla.wikimedia.org/show_bug.cgi?id=66290
[11:01:56] ah, I see that you merged this one nuria: https://gerrit.wikimedia.org/r/#/c/138031/
[11:02:09] cool, did you happen to test it in staging? (I will if not, no prob)
[11:02:28] i tested it a bunch locally on fri
[11:02:38] but no, i did not test it on staging
[11:02:44] k, i'll do that quick
[11:03:56] nuria you have some local changes to wikimetrics/configurables on staging, any reason to keep?
[11:04:07] it's the error logging
[11:04:12] from apache
[11:04:30] if you remove it you will have no errors
[11:04:31] yep
[11:04:39] ok, keeping
[11:05:14] oh, staging's at latest already :)
[11:14:01] ya, and dev
[11:14:09] i had deployed to both
[11:14:32] qchris, milimetric: rough documentation on backup restore: https://www.mediawiki.org/wiki/Analytics/Wikimetrics/FAQ#Restore_from_backup
[11:14:59] * milimetric reads
[11:18:49] It seems the wikimetrics directory is moved aside, but then, the cp puts backup's wikimetrics directory into the "public" directory.
[11:19:16] Should we only move the public directory and then cp that part from the backup there, or
[11:19:42] move the whole wikimetrics directory (as current code on wiki) and then bring wikimetrics as a whole into place again before
[11:19:45] yeah
[11:19:52] first one
[11:19:53] bringing the backed up files into the public directory.
[11:19:54] just public dir
[11:20:02] Think so too.
[11:21:21] ok, refresh
[11:21:49] The crontab lists geoip update. Do we need it? (looks like entry from labs default crontab)
[11:22:16] yes, no, i'll remove
[11:22:19] Should "cp -r ~/backup/var/lib/ /var/lib/wikimetrics/public/"
[11:22:26] be "cp -r ~/backup/var/lib/public /var/lib/wikimetrics/public/"
[11:22:36] (public appended to source)
[11:22:49] that looks like just the home directory where someone would untar the backup
[11:22:55] but is it mentioned above?
[11:23:27] yes, it looks like ~/backup is the place the tar file has been extracted to, but
[11:23:37] should we copy
[11:23:39] ~/backup/var/lib/
[11:23:46] onto /var/lib/wikimetrics/public/
[11:23:52] or instead copy from
[11:23:56] ~/backup/var/lib/public
[11:24:20] (So which is the correct source for the copy ... ~/backup/var/lib/ or ~/backup/var/lib/public )
[11:25:01] i don't know how the backup stores the folder structure
[11:25:13] it looks like an assumption not handled here
[11:25:37] it seems to me it would be ~/backup/var/lib/wikimetrics/public if anything
[11:25:57] Thought so too.
[11:26:02] ok, i'll change to that
[11:26:32] Thanks ... (if only wikis allowed concurrent editing :-) )
[11:26:35] I will be off for the next couple hours, will be online before stand up
[11:27:25] k. Have fun nuria!
[11:27:31] Thanks for testing restoring.
[11:27:31] ok, cool, see you then
[11:27:33] yea!
[11:27:34] awesome
[11:27:46] i'll improve my other patches, hopefully we can merge those later
[11:27:59] qchris: does it look good now?
[11:28:06] I know the docs say to stop apache ... but thinking about it, should we stop wikimetrics-web too?
[11:28:17] feel free to edit from now on, I'm working on a patch
[11:28:24] milimetric: Definitely. There are enough pointers so we can handle restore.
[11:28:28] wikimetrics-web wouldn't be on in prod
[11:28:36] Oh. Ok.
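The public-directory part of the restore settled on above (move only the live public directory aside, then copy the backed-up one into its place) might look something like the sketch below. It assumes the tarball was extracted so that the files sit under `<backup_root>/var/lib/wikimetrics/public`, which the discussion itself treats as a guess about how the backup stores its folder structure. `restore_public_dir` and the `.old` suffix are hypothetical names introduced for illustration, not anything from the wiki docs.

```python
import shutil
from pathlib import Path


def restore_public_dir(backup_root: Path, live_public: Path) -> Path:
    """Swap the live public directory for the copy inside an extracted backup.

    Returns the path the previous live directory was moved to.
    """
    # Assumed layout of the extracted backup (not confirmed in the log).
    source = backup_root / "var" / "lib" / "wikimetrics" / "public"
    if not source.is_dir():
        raise FileNotFoundError(f"no public directory in backup: {source}")

    # Hypothetical naming scheme for the moved-aside copy.
    aside = live_public.with_name(live_public.name + ".old")
    if aside.exists():
        raise FileExistsError(f"refusing to overwrite {aside}")

    # Move only the public directory aside, not the whole wikimetrics tree.
    if live_public.exists():
        shutil.move(str(live_public), str(aside))

    # copytree creates live_public fresh from the *contents* of source, so
    # there is no accidental nesting such as public/public or public/lib.
    shutil.copytree(source, live_public)
    return aside
```

With the paths from the discussion this would be called as `restore_public_dir(Path.home() / "backup", Path("/var/lib/wikimetrics/public"))`, after stopping whatever serves the files.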
[11:29:04] the puppet basically either sets up apache or an upstart script that starts the flask dev server
[11:29:40] (PS3) Milimetric: Don't show invalid cohorts on report screen [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/140823 (https://bugzilla.wikimedia.org/66290)
[11:29:42] Right. Thanks.
[11:33:06] (CR) Milimetric: "addressed the bad pattern in the latest patch, thanks" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/140823 (https://bugzilla.wikimedia.org/66290) (owner: Milimetric)
[11:33:13] (PS2) Milimetric: Ensure wiki cohorts work [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/140830 (https://bugzilla.wikimedia.org/66290)
[13:32:52] Qchris, milimetric thanks for editing the backup docs
[13:33:09] thanks for writing them :)
[13:34:10] nuria: Reading logs I saw you pinged me about staging while I was offline.
[13:34:29] feel free to re-ping when I get online, as I may not see the offline pings
[13:34:31] I think I clogged up staging scheduling reports.. which...
[13:34:40] ahem .. is kind of worrisome
[13:34:55] but it might be that staging (that has debug=true)
[13:34:58] In this case, I guess the issue was resolved afterwards. If not, please let me know.
[13:35:11] was outputting too much to stdout
[13:35:23] so ya, it resolved, no worries
[13:37:55] it's true that we haven't load tested wikimetrics though
[13:38:08] we did some basic "will it upload 300,000 users? yes it will"
[13:38:28] but, yeah, we can do complaint driven performance analysis
[13:39:11] nuria: did you see my fix for the other patch you reviewed?
[13:39:42] no, not yet. will do
[13:39:55] ya cohort uploads are not going to be a problem
[13:40:15] reports with tons of output might, but we can test that easily enough
[13:41:24] that's definitely a problem, and one of the reasons is this line in aggregate report that logs the output to file
[13:41:43] but now that we have proper backup, we could remove that
[13:41:47] and just not have redundant log backup
[13:42:37] that actually sounds very good
[13:43:04] i'll make a patch as soon as backup system is in prod
[14:22:33] if anyone mystified by our hadoop cluster's weird reducing problem is around, there's a query that is about to choke and reset the reducers
[14:22:39] tailing the log files right now could be fascinating
[15:20:20] (PS4) Nuria: Don't show invalid cohorts on report screen [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/140823 (https://bugzilla.wikimedia.org/66290) (owner: Milimetric)
[15:22:04] holy crap
[15:22:07] milimetric!
[15:22:21] a hive query with the same structure as the ones that kept failing /just worked/
[15:22:22] Ironholds!
[15:22:23] :)
[15:22:35] that's cool
[15:23:03] also, aren't you in a VW bus going cross country or something?
[15:23:14] yup!
[15:23:18] except it's a Toyota
[15:23:31] you join the Wandering Minstrels on halfak's couch
[15:23:35] in his lovely, lovely house
[15:23:44] seriously, his house is the best argument for moving to minneapolis ever
[15:23:46] (CR) Nuria: [C: 2] Don't show invalid cohorts on report screen [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/140823 (https://bugzilla.wikimedia.org/66290) (owner: Milimetric)
[15:23:55] (Merged) jenkins-bot: Don't show invalid cohorts on report screen [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/140823 (https://bugzilla.wikimedia.org/66290) (owner: Milimetric)
[15:24:00] and given that we went to an eatery that does good fish and chopped herring, and I'm a British Jew, that's saying something
[15:24:21] Also, waterfall.
[15:24:23] "Come to Minneapolis for the chopped herring, stay for the pretty houses that don't require you to sell both kidneys"
[15:24:28] the waterfall is hella-gorgeous, tis true
[15:25:51] (PS3) Nuria: Ensure wiki cohorts work [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/140830 (https://bugzilla.wikimedia.org/66290) (owner: Milimetric)
[15:33:36] :) yeah, Philly is definitely a one-kidney kind of town, but I miss the midwest's total lack of kidney sacrifice
[17:00:15] qchris_meeting: seems that google is having network issues for me
[17:00:29] Darn :-(
[17:01:16] it works ok for you though?
[17:01:22] Yes.
[17:01:33] No robot voice or anything.
[17:01:40] Video is smooth too.
[17:04:57] milimetric: Anything that I can do to help you connect/troubleshoot?
[17:05:16] I'll certainly relay that you have connection problems, if they want to talk to you.
[17:05:28] Oh .... there goes milimetric :-/
[17:26:15] (CR) MarkTraceur: [C: -1] Separate the opt-in/opt-out actions into their own tab/graph (3 comments) [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/140888 (owner: Gilles)
[17:31:08] (CR) MarkTraceur: [C: 2] Calculate opt-in/opt-out totals [analytics/multimedia] - https://gerrit.wikimedia.org/r/140889 (owner: Gilles)
[17:31:14] (Merged) jenkins-bot: Calculate opt-in/opt-out totals [analytics/multimedia] - https://gerrit.wikimedia.org/r/140889 (owner: Gilles)
[17:33:15] (CR) MarkTraceur: [C: 2] Add opt-out and opt-in totals to graph [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/140890 (owner: Gilles)
[17:34:20] (CR) MarkTraceur: [C: -1] Display opt-out totals for each wiki on a global graph (1 comment) [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/140892 (owner: Gilles)
[17:35:21] (CR) MarkTraceur: "Sorry, am noob" (1 comment) [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/140888 (owner: Gilles)
[17:37:15] marktraceur, it's okay, we love you anyway
[17:39:58] Heh
[22:27:04] (PS2) Milimetric: [WIP] Add script to bulk insert cohorts or reports [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/141077 (https://bugzilla.wikimedia.org/65946)
[22:27:06] (CR) jenkins-bot: [V: -1] [WIP] Add script to bulk insert cohorts or reports [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/141077 (https://bugzilla.wikimedia.org/65946) (owner: Milimetric)
[22:27:43] :(
[22:27:52] stupid jenkins mumble mumble
[22:27:53] :)
[22:28:27] (PS2) Milimetric: Remove useless scripts [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/141076
[22:28:32] (PS3) Milimetric: [WIP] Add script to bulk insert cohorts or reports [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/141077 (https://bugzilla.wikimedia.org/65946)
[22:36:19] I mean <3 jenkins :)