[00:04:37] Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Performance-Team, Patch-For-Review, WMF-deploy-2016-03-08_(1.27.0-wmf.16): Convert WikimediaEvents statsv.js to use sendBeacon - https://phabricator.wikimedia.org/T112843#2125029 (Krinkle) Open>Resolved
[00:53:02] madhuvishy: i ran some more queries and the difference is quite amazing:
[00:53:04] s1-analytics-slave: 17.28 sec, 12.91 sec, 13.79 sec, 18.41 sec
[00:53:12] analytics-store: 5 min 2.29 sec, 6 min 31.98 sec, 5 min 29.11 sec, 4 min 58.37 sec, 10 min 54.13 sec
[00:53:16] woah
[00:53:17] hmmm
[00:53:27] something must be up
[00:53:44] (same query for ten different hours in MobileWebSectionUsage_14321266 )
[00:54:59] hmmm
[07:04:45] * elukey dances after reading - varnishkafka works
[08:41:17] headsup, I'll reboot stat1002/stat1003 in 20 minutes
[09:05:11] moritzm: I'll start the hadoop cluster afterwards
[09:07:25] ok
[09:10:57] stat1002 back up with new kernel, stat1003 next
[09:13:52] and stat1003 also back up
[09:28:20] joal: o/ let me know when you are ready for the hadoop restarts
[09:29:47] \o elukey :)
[09:29:50] Ready :)
[09:30:11] going to grab a coffee :)
[09:30:18] elukey: We need to stop puppet on an1027 and stop camus, and then finally wait for the last run to finish :)
[09:30:28] sure elukey
[09:30:44] ah nice I'll do it now!
[09:30:59] #libreidea is looking for voluntary devs to help us on idea creation tools - join us! - libreidea.org
[09:31:16] elukey: for the three instances of camus (webrequest, eventlogging and mediawiki)
[09:32:45] !log disabled pippet on analytics1027 and stopped camus via crontab
[09:32:57] aahhahah non pippet
[09:33:00] sigh
[09:33:55] joal: I disabled all the camus jobs, there was one also for event bus
[09:34:05] better safe than sorry
[09:34:06] :D
[09:34:23] elukey: good call :)
[09:34:28] Forgot about that one
[09:34:42] anyhow, the next step is to monitor Yarn and then wait right? (/me learning from the hadoop master)
[09:36:07] correct elukey :)
[09:36:28] elukey: you can go and grab your coffee ;)
[10:08:02] elukey: We are good to go !
[10:09:58] joal: I was checking yarn but I am a bit puzzled
[10:10:07] I can't find any running jobs and the scheduler looks weird
[10:10:15] ?
[10:10:23] No running job is normal :)
[10:10:38] Scheduler is good for me, what happens?
[10:10:58] mmm can't see any colors, progression bars, etc.. all is grey. Do you see running jobs?
[10:11:08] No, it's expected :)
[10:11:23] Grey means empty- which is what we want
[10:12:09] but no process at all running? My oozie job for example, or other ones that are not camus, etc..
[10:12:15] maybe I am missing something
[10:12:37] everything (except for your oozie job) has dependency on camus imported data
[10:12:47] And I suspended you oozie job in order for it not to fail
[10:12:58] Except also user jobs, but currently none
[10:13:19] ahhhhhhhhhhh
[10:13:38] How did you suspend the oozie job? Manually via oozie cli?
[10:13:53] Via hue, but can be done either
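
A minimal sketch of the oozie CLI route joal says is equally valid, for anyone following along; the coordinator id and Oozie URL below are placeholders, not values from this session.

    # Suspend a coordinator before maintenance, resume it afterwards.
    # OOZIE_URL and the job id are hypothetical placeholders.
    export OOZIE_URL=http://analytics1027.eqiad.wmnet:11000/oozie
    oozie job -suspend 0001234-160316000000000-oozie-oozi-C
    # ... maintenance happens here ...
    oozie job -resume 0001234-160316000000000-oozie-oozi-C
    oozie job -info 0001234-160316000000000-oozie-oozi-C   # expect status RUNNING again
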
[10:15:40] okok I need to learn Hue then, thanks :)
[10:15:42] anyhowww
[10:15:48] starting with the reboots
[10:15:52] k
[10:15:54] :)
[10:20:21] (PS1) Mforns: Add labels to the hierarchy sunburst [analytics/dashiki] - https://gerrit.wikimedia.org/r/277733 (https://phabricator.wikimedia.org/T124296)
[10:26:01] (PS2) Mforns: Add labels to the hierarchy sunburst [analytics/dashiki] - https://gerrit.wikimedia.org/r/277733 (https://phabricator.wikimedia.org/T124296)
[10:35:21] Analytics, DBA: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#2125686 (jcrespo) a:jcrespo>None @Aklapper I am not working on this, but people that said they are not going to work on this (Analytics: Radar) keeps assigning it to me. Please mediate.
[10:36:47] Analytics-Kanban: Add AQS endpoint for uniques - https://phabricator.wikimedia.org/T129518#2125696 (JAllemandou)
[10:37:01] Analytics-Kanban: Add AQS endpoint for uniques - https://phabricator.wikimedia.org/T129518#2107979 (JAllemandou) a:JAllemandou
[10:37:39] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Remove autoincrement id from tables [5 pts] - https://phabricator.wikimedia.org/T87661#2125703 (jcrespo)
[10:37:41] Analytics-EventLogging, Analytics-Kanban, DBA, Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} - https://phabricator.wikimedia.org/T125135#2125700 (jcrespo) Open>Resolved a:jcrespo>None @aklapper I am not working on this, but people keep assigning it to me....
[10:38:33] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Remove autoincrement id from tables [5 pts] - https://phabricator.wikimedia.org/T87661#996313 (jcrespo)
[10:38:35] Analytics-EventLogging, Analytics-Kanban, DBA, Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} - https://phabricator.wikimedia.org/T125135#2125704 (jcrespo) Resolved>Open
[10:40:55] (PS3) Joal: Add cassandra unique devices load job [analytics/refinery] - https://gerrit.wikimedia.org/r/277507 (https://phabricator.wikimedia.org/T129519)
[10:42:11] ---^ oooozieeeee
[10:42:30] Ohhhhh yeah :)
[10:47:22] joal: analytics103* rebooted (stopped daemons then reboot - journal node 1035 too)
[10:47:30] k
[10:48:47] everything seems good, I'll proceed with the other ones
[11:08:48] (PS1) Joal: Correct pageviews endpoint doc spec [analytics/aqs] - https://gerrit.wikimedia.org/r/277740
[11:14:52] (PS4) Joal: Add cassandra unique devices load job [analytics/refinery] - https://gerrit.wikimedia.org/r/277507 (https://phabricator.wikimedia.org/T129519)
[11:28:45] (PS1) Mforns: Highlight greatest leaf of hierarchy by default [analytics/dashiki] - https://gerrit.wikimedia.org/r/277746 (https://phabricator.wikimedia.org/T124296)
[11:31:37] (CR) Mforns: Highlight greatest leaf of hierarchy by default (3 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/277746 (https://phabricator.wikimedia.org/T124296) (owner: Mforns)
[11:35:53] joal: all the nodes have been rebooted, services are up and running. Now I need to proceed with 1001/2 doing the failovers
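
The failovers !logged next would look roughly like this on the command line, assuming manually triggered (non-ZKFC) HA, which the later "I really want automatic failover" comment implies; the HA service ids nn1/nn2 and rm1/rm2 are placeholders for the cluster's real ones.

    # Move the active HDFS namenode and YARN resourcemanager, then verify.
    sudo -u hdfs hdfs haadmin -failover nn1 nn2        # nn2 = analytics1002 here
    sudo -u hdfs hdfs haadmin -getServiceState nn2     # expect "active"
    sudo -u yarn yarn rmadmin -transitionToStandby rm1
    sudo -u yarn yarn rmadmin -transitionToActive rm2
    # ...reboot the now-standby master, then fail back the same way.
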
[11:36:14] ok elukey, please do :)
[11:42:44] !log Yarn/HDFS master node failover from analytics1001 to analytics1002
[11:45:22] !log rebooted analytics1001
[11:58:36] !log Yarn/HDFS master node failover from analytics1002 to analytics1001 and reboot of analytics1002
[12:09:44] joal: all done, if you are ok I'd re-enable camus
[12:13:22] (brb lunch, I'll re-enable them as soon as I am back)
[12:24:06] a-team, I'll leave for a couple hours, and be here at standup, see you...
[12:24:15] see you mforns :)
[12:24:40] elukey: whenever you want, just let know I'll have a neye on it
[13:11:34] wow, the SLA is really working :)
[13:11:38] :)
[13:11:52] Just realized that I forgot to pause the job :S
[13:12:01] Doing now, sorry for the spam a-team
[13:15:37] !log camus and puppet re-enabled on analytics1027
[13:15:59] all right all the cluster rebooted
[13:16:23] joal --^
[13:16:26] Thx elukey
[13:16:30] noticed :)
[13:17:33] now I really want automatic failover for the hadoop namenode
[13:17:34] :D
[13:17:43] :)
[13:31:16] ahhh tested varnish-kafka as ottomata suggested in mediawiki vagrant and it works
[14:04:16] o/ joal
[14:04:41] Was wondering if my notes on the pageview stuff was what you were looking for or if you'd like me to take another look today
[14:05:09] Hi halfak, didn't noticed you had modified
[14:05:14] * joal reads
[14:05:46] I didn't realize that you had that whole analysis document for me to read through. it was very helpful. I made some edits to it and the writeup. Please review those too :)
[14:05:55] * halfak tries to apply BOLD
[14:19:34] halfak: Comments make a lot of sense :)
[14:20:01] \o/
[14:20:19] halfak: Will comment again, add info when needed, and will let you know for fianl check ?
[14:21:15] joal, sounds great. :)
[14:24:12] Thnks a lot halfak :)
[14:24:26] elukey, ottomata: puppet run on analytics1026 fails due to broken hfds mount, could you have a look?
[14:30:50] moritzm: cool, fixing. btw https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Administration#Fixing_HDFS_mount_at_.2Fmnt.2Fhdfs
[14:56:25] (PS1) Joal: Extract reuasable function in lib/aqsUtil [analytics/aqs] - https://gerrit.wikimedia.org/r/277781
[15:01:43] ottomata: sorry I just seen the ping for hdfs!
[15:05:33] np!
[15:18:15] ottomata: I added some extra checks for varnishkafka (for example exit when the shm handle becomes stale due to varnish restart) and tried it as you suggested with mediawiki vagrant and kafka
[15:18:43] Ema is porting the last python script and then we should be ready for "Real" testing
[15:19:40] niiice
[15:19:41] cool
[15:20:03] ottomata: how do we get ssh access to 1001 to do wikistats work cc mforns ?
[15:20:42] hi nuria yesterday you asked about the granularity of the reports? they are weekly right now, but adding monthly reports would be easy.
[15:22:15] mforns: k, I talked with milimetric about this yesterday and after a bit of back and forth we agree to deploy this to labs and talk to erik Z to see how he wants to deprecate his reports , we can always "insert" the report into wikistats
[15:23:19] (PS1) Joal: Add unique_devices endpoint to AQS [analytics/aqs] - https://gerrit.wikimedia.org/r/277784 (https://phabricator.wikimedia.org/T129518)
[15:23:39] nuria: access request!
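
On the broken /mnt/hdfs mount moritzm reports above: the linked wikitech page has the canonical steps, so this is only a hedged sketch of the usual stale-FUSE dance, on the assumption that puppet manages and will re-create the mount.

    # Force-unmount the stale fuse-hdfs mount and let puppet remount it.
    sudo umount -f /mnt/hdfs
    sudo puppet agent -tv
    ls /mnt/hdfs >/dev/null && echo "mount is back"   # illustrative check
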
[15:23:56] ottomata: ok, will file one for mforns and myself
[15:23:58] statgroup statistics-web-users
[15:24:00] group statistics-web-users
[15:24:14] nuria: what work do you need to do there
[15:24:15] just curious
[15:24:34] ottomata: we will be needing to deprecate out some of wikistats
[15:24:45] ottomata: but we need to talk to erik z about it before
[15:25:14] ottomata: does that make sense? or is there anywhere else that wikistats gets deployed to?
[15:26:01] no that's it, but i think its deployed via rsync from stat1002
[15:26:04] maybe....
[15:26:13] not sure abou tthat
[15:27:06] (CR) Mforns: Add the query folder as the last parameter of scripts (1 comment) [analytics/reportupdater] - https://gerrit.wikimedia.org/r/276758 (https://phabricator.wikimedia.org/T127326) (owner: Mforns)
[15:27:12] ottomata: k, will ask erik
[15:28:58] Analytics: Deprecate reportcard - https://phabricator.wikimedia.org/T130117#2126320 (Nuria)
[15:32:29] nuria: standup?
[15:32:46] elukey: SORRY!
[15:35:56] (CR) Mforns: Make reportupdater support removing columns (1 comment) [analytics/reportupdater] - https://gerrit.wikimedia.org/r/277215 (https://phabricator.wikimedia.org/T127326) (owner: Mforns)
[15:57:12] (CR) Mforns: "Nuria, in the end we decided for absolute numbers, because they make it possible to aggregate in time. And then the hierarchy visualizatio" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/276763 (https://phabricator.wikimedia.org/T127326) (owner: Mforns)
[16:01:25] a-team: do you mind if I play with the hdfs master node config in labs?
[16:01:38] I want to try the manual failover tomorrow
[16:01:41] elukey: good for me
[16:01:49] elukey: why not
[16:01:51] we're talking about making deployment to the cluster easier, elukey
[16:02:09] so I would have guessed you'd be interested in joining
[16:02:17] elukey: sure
[16:02:18] btw
[16:02:24] the a101, etc. instances in the analytics project
[16:02:28] are more for cluster dev
[16:02:30] if you want to test there
[16:02:34] but in deployment-prep is fine too
[16:02:45] but +1 on the hdfs master node config
[16:29:42] Analytics: get jenkins to automate releases - https://phabricator.wikimedia.org/T130122#2126478 (Milimetric)
[16:30:27] Analytics: get jenkins to update refinery with deploy of new jars - https://phabricator.wikimedia.org/T130123#2126492 (Milimetric)
[16:30:55] Analytics: get jenkins to update refinery with deploy of new jars - https://phabricator.wikimedia.org/T130123#2126492 (Milimetric)
[16:32:40] Analytics: Make deployment process to the cluster easier, more streamlined {hawk} - https://phabricator.wikimedia.org/T129253#2099799 (Milimetric) I broke this down into two subtasks as noted on the etherpad during our meeting: Issues with Release - need to move jars from archiva to refinery repository-> ca...
[17:00:38] Analytics, Pageviews-API: Document that wikimedia pageviews API is blocked by ad blockers - https://phabricator.wikimedia.org/T126947#2027714 (GWicke) My preference would be to find a new primary URL, and alias / redirect requests in the transition period. Possible alternatives that don't seem to be blo...
[17:14:48] Analytics, Pageviews-API: Document that wikimedia pageviews API is blocked by ad blockers - https://phabricator.wikimedia.org/T126947#2126694 (GWicke) Closely related: {T119094} Now could be a good time to consider the layout with per-project entry points.
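
Once that access request is filed and merged, membership in the group ottomata names above can be sanity-checked directly; a trivial, hypothetical check, not something run in this session.

    # Verify the group exists on the target host and that you are in it.
    ssh stat1001.eqiad.wmnet 'getent group statistics-web-users; id -nG'
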
[17:17:03] Analytics, Pageviews-API: Document that wikimedia pageviews API is blocked by ad blockers - https://phabricator.wikimedia.org/T126947#2126709 (GWicke)
[17:22:20] Analytics, Pageviews-API: Document that wikimedia pageviews API is blocked by ad blockers - https://phabricator.wikimedia.org/T126947#2126725 (MusikAnimal)
[17:31:57] nuria, do you want to work on deploying the browser dashboard today?
[17:32:09] mforns: yes, let's do that before anything
[17:32:15] ok
[17:32:23] mforns: did you tested all chnages are working, let me see if reports are created
[17:32:26] to the batcave?
[17:33:26] nuria, I'm looking at that
[17:33:57] reports are not being created yet, because we should remove the reportupdater history file, to force an execution, otherwise it will wait until next monday
[17:34:32] mforns: let's do that
[17:34:38] so we can test properly
[17:34:49] mforns: let's remove the history file that is
[17:34:56] nuria, doing
[17:35:54] nuria, done, in 55 minutes cron will run reportupdater and the backfilling will start
[17:36:12] or I can force a run
[17:36:28] mforns: let's force a run to make sure it is all square
[17:36:31] will do
[17:37:58] madhuvishy: ahem... I have asked this before but .. how do you go about creating a dns endpoint like browser-report.wmflabs.blah?
[17:38:24] nuria: on wikitech, manage web proxies - choose the right project and set it up
[17:41:11] madhuvishy: thnk you
[17:42:14] joal: this maven release for ua parser is annoying. I'm not sure what is going on with the jars. Can you look at it with me sometime? tomorrow is fine too
[17:44:35] nuria, there are some query scripts that are not executable, will create a patch
[17:44:41] mforns: k
[17:44:51] mforns: I shall CR
[17:44:55] sure
[17:47:38] (PS1) Mforns: Change mod of browser query scripts to executable [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/277818 (https://phabricator.wikimedia.org/T127326)
[17:47:44] nuria, ^
[17:49:18] (CR) Nuria: [C: 2 V: 2] "Looks good, just checked permits when fetching." [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/277818 (https://phabricator.wikimedia.org/T127326) (owner: Mforns)
[17:49:33] I lost the first battle against the hdfs namenode, tomorrow I'll win
[17:49:36] :)
[17:49:40] going offline folks!
[17:49:45] talk with you tomorrow!
[17:50:07] elukey, night!
[17:59:08] Analytics, Pageviews-API: Document that wikimedia pageviews API is blocked by ad blockers - https://phabricator.wikimedia.org/T126947#2126913 (Milimetric) Hm, good suggestions, @GWicke, let the bike-shedding begin :) /metrics/views/ That would let per-article, aggregate, and top fit nicely into tha...
[18:00:30] nuria, the reports are running (backfilling since 06/2015) there are 9 reports each will take ~45 mins to backfill completely
[18:02:22] mforns: ok, let me test locally
[18:02:42] tomorrow morning I'll check they are all OK, but the tests I did in stat1002 yesterday generated 2 months of correct data
[18:05:03] nuria, if you want the data I used to test the dashboard, it is in stat1002:/home/mforns/reportupdater-output
[18:06:27] mforns: we shoudl have data in an hour right?
[18:06:31] *should
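
The "remove the history file and force a run" step mforns applies above, as a hedged sketch: reportupdater's history-file path and command-line shape here are assumptions, and only the intent (drop the history, run once outside cron as the stats user) comes from the conversation.

    # Force an immediate reportupdater execution instead of waiting for cron.
    sudo -u stats rm /a/reportupdater/history/browser        # hypothetical history path
    sudo -u stats python /srv/reportupdater/update_reports.py \
        /srv/reportupdater/queries/browser /a/reportupdater/output

The executable-bit patch merged just after is the same idea as a one-line chmod +x over the browser query scripts.
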
[18:06:49] nuria, we should have the first report in less than 1 hour
[18:06:55] but there are still 8 more reports
[18:07:10] Analytics, Pageviews-API: Document that wikimedia pageviews API is blocked by ad blockers - https://phabricator.wikimedia.org/T126947#2126939 (MusikAnimal) Just going to state my opinion here again, that I worry about the potential of these proposed names getting blocked at some point. For instance `/tra...
[18:07:46] mforns: ok, let's test then, do you want to create proxy in the meantime (if you are logging of let me know) on top of our current host in labs to which we deploy dashiki?
[18:08:35] I have to get lunch now, but I can help if you get stuck
[18:08:48] just ping me and I'll watch this
[18:09:13] nuria, another thing that we need to do is configure rsync to copy the report files to 1001
[18:09:25] we could do this manually for now and have the test data there
[18:10:09] mforns: why they need to be copied, are they served from there?
[18:10:20] nuria, yes I think so
[18:10:33] milimetric: can you confirm/
[18:10:58] milimetric: are files from reportupdater served from 1001 or 1002? i thought public endpoint was not on 1001
[18:11:31] yes, reportupdater runs on stat1002 for these particular reports
[18:11:40] and there's an rsync-to parameter that's not set for this particular job
[18:11:58] Analytics, Pageviews-API: Document that wikimedia pageviews API is blocked by ad blockers - https://phabricator.wikimedia.org/T126947#2126950 (MusikAnimal)
[18:12:00] but it basically just syncs /a/reportupdater/output to stat1001
[18:12:11] *should sync
[18:12:27] sudo -u stats rsync ...
[18:12:44] nuria: datasets.wikimedia.org *is* stat1001
[18:13:05] aham, is rsync configured on a per directory basis then?
[18:13:32] nuria, yes, it is a puppet option of the reportupdater class
[18:13:49] but we can do a one-off sync to have the test files there
[18:14:04] mforns: right, let's do both. ok, do you want to submit a patch for the rsync too?
[18:14:12] nuria, sure
[18:14:14] mforns: otherwise i can do it
[18:14:37] I do it, file's already open
[18:22:51] nuria, https://gerrit.wikimedia.org/r/#/c/277826/1/manifests/role/statistics.pp
[18:22:57] also added ottomata
[18:23:06] sory bad copy paste
[18:23:21] https://gerrit.wikimedia.org/r/#/c/277826/
[18:23:36] (PS1) Catrope: Adjust start date for cross-wiki notifications beta feature graph [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/277827
[18:23:44] mforns: k, will let ottomata cr
[18:24:01] (CR) jenkins-bot: [V: -1] Adjust start date for cross-wiki notifications beta feature graph [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/277827 (owner: Catrope)
[18:24:47] (CR) Catrope: "recheck" [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/277827 (owner: Catrope)
[18:28:18] nuria, I also rsync'd manually the test files into datasets.wikimedia.org
[18:28:41] so now we can test with 2 months of data
[18:29:18] mforns: k , let me see on my local dashiki
[18:29:23] (CR) Catrope: [C: 2] Adjust start date for cross-wiki notifications beta feature graph [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/277827 (owner: Catrope)
[18:29:45] (Merged) jenkins-bot: Adjust start date for cross-wiki notifications beta feature graph [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/277827 (owner: Catrope)
[18:29:51] mforns: this should do it , right? "gulp --layout tabs --config SimpleRequestBreakdowns"
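
Expanding nuria's build line into a runnable local-test sketch: only the gulp invocation is from the log; the install step, the dist path, and the throwaway HTTP server are assumptions about the Dashiki workflow.

    # Build the 'tabs' layout against the SimpleRequestBreakdowns config
    # and eyeball the result locally; the dist/ location is a guess.
    npm install
    gulp --layout tabs --config SimpleRequestBreakdowns
    cd dist && python -m SimpleHTTPServer 8000   # then browse http://localhost:8000/
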
"gulp --layout tabs --config SimpleRequestBreakdowns" [18:29:56] merged :) [18:30:00] running pupet [18:30:30] nuria, yes it should work [18:31:03] nuria, now I'm in doubt if the metrics path will need to include the limn-public-data prefix...? [18:31:08] let's see [18:31:19] well, are they available via http? [18:31:24] yes [18:31:31] can you access files? [18:31:33] nuria, here: http://datasets.wikimedia.org/limn-public-data/browser/ [18:33:40] madhuvishy: tomorrow I'm in transit, will not be easy [18:33:48] either tonight or friday ? [18:34:14] Anytime joal :) [18:34:30] k let's do it :) [18:34:49] In 20 minutes? Commuting now [18:35:07] hm, will be eating more less [18:35:16] Later then [18:35:21] Sure ping me after if you'll still be around [18:35:27] k [18:37:09] mforns: can you batcave for a sec? [18:39:04] nuria, sorry my internet dropped [18:39:09] mforns: np [18:39:12] try now, you should see the dashboard now [18:39:49] ya, i see it, it looks good, [18:40:09] the labels in the sunburst are missing, but they should be there in short [18:40:19] i think there are small bugs to fix but we can work on those while we deploy to labs [18:40:53] mforns: I think the dashboard has everything we need [18:41:30] nuria, ok, the small bugs you mean the labels? [18:41:37] mforns: no, ux stuff [18:41:41] ok [18:41:54] mforns: because the tabular view is below teh fold most people are going to miss it [18:42:00] mforns: same for sunburts [18:42:24] nuria, aha what you'd suggest? [18:42:31] mforns: ya.. aham, not so easy [18:42:38] mforns: but 1 by 1 [18:43:00] mforns: for the 'search as you type' box..what can you type? [18:43:09] nuria, dates [18:43:25] it's the timeseries-table visualizer [18:43:43] ah, ok, I think we also need to make that a bit more obvious, cause it could be "browsers" [18:44:12] ye [18:44:15] milimetric: are you there to batcave? about how could we not have the graphs all below the fold? [18:45:09] in cave nuria [18:45:12] k [18:45:39] mforns: can yo come to cave? [18:55:23] nuria & madhuvishy, I just finished a pass on the unique devices blog post draft. I'd like to find out how BOLD you want me to be :) [18:57:46] Analytics-Kanban: Browser visulaization in dashiki, too many items below the fold - https://phabricator.wikimedia.org/T130144#2127136 (Nuria) [19:03:12] halfak|Aspen: It has been all nuria's work (with input from comms I think) - but I say - bring it full on :) [19:03:39] OK. I'm'a cite this is if people get mad at me :) [19:04:15] I give my permission! [19:06:37] joal: I'm around now [19:06:43] Hey madhu [19:06:50] So, what's up with the thing ? [19:07:01] joal: batcave if possible? [19:07:16] sure [19:09:22] Analytics-EventLogging, Analytics-Kanban, DBA, Patch-For-Review: Add autoincrement id to EventLogging MySQL tables. {oryx} - https://phabricator.wikimedia.org/T125135#2127191 (Nuria) Removing jcrespo, Sorry, in our team we keep tickets assigned to ourselves even if it is "paused" work we are hopin... [19:13:47] Analytics-Kanban: Deploy new browser dashboard to 'browser-reports.wmflabs.org' - https://phabricator.wikimedia.org/T130069#2124345 (Nuria) a:mforns>Nuria [19:30:54] hey nuria. Do you have 15 min to chat some time today? [19:49:16] lzia: sure, now? [19:49:22] sure, batcave, nuria? [19:49:25] k [19:50:19] lzia: on batcave [19:50:37] it's telling me "joinin" nuria. 
[19:54:37] streaming: https://www.youtube.com/watch?v=Xle0oOFCNnk
[19:54:53] that was for our Research Showcase :)
[19:55:08] starting in 10, discussion in #wikimedia-research
[20:02:09] nuria: I'm on our hangout - take your time if you're talking to lzia :)
[20:02:23] madhuvishy: just finished 10 secs ago
[20:08:50] milimetric: question for you: is there a plan to backfill the pageview data for the API for the time before October 2015?
[20:09:16] kaldari: it's backfilled until August right now
[20:09:26] kaldari: and we're going to backfill July soon
[20:09:40] but beyond that we have some data quality issues, so we may not get to June and May like we planned
[20:09:43] oh, awesome. I'll change the limits at https://tools.wmflabs.org/pageviews then
[20:09:44] :( May
[20:09:55] understandable
[20:09:58] k
[20:10:16] milimetric: what's the earliest date with data now?
[20:10:21] August 1, 2015?
[20:10:24] yes
[20:10:26] cool
[20:10:56] this would be a quick and easy way to check: https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Selfie/daily/2015070100/2015080100
[20:40:57] madhuvishy: I have created proxy: browser-reports.wmflabs.org
[20:41:04] should it be on port 80?
[20:41:18] nuria: yeah
[20:42:07] madhuvishy: ok, the non authorized will phase out once i deploy content I guess? https://browser-reports.wmflabs.org/
[20:42:24] nuria: yes i think so
[20:42:41] Analytics-Kanban: Deploy new browser dashboard to 'browser-reports.wmflabs.org' - https://phabricator.wikimedia.org/T130069#2127467 (Nuria) Created proxy: https://browser-reports.wmflabs.org/
[22:02:15] nuria: you can also create browser-reports-test and point it to dashiki-staging - for test deploys to staging
[22:33:01] madhuvishy: i did, i created both
[22:33:12] ah awesome
[22:33:31] nuria: you mentioned you found a parallel master for EL? which machine was it - do you remember?
[22:33:51] madhuvishy: let me look at the secret graph of dbs
[22:34:47] madhuvishy: https://dbtree.wikimedia.org/
[22:35:31] madhuvishy: but no, it is not there
[22:35:40] nuria: yeahh
[22:38:59] madhuvishy: so db1047 is teh backup master
[22:39:08] nuria: aha
[22:39:17] madhuvishy: 1046 is m4
[22:39:17] thats what i thought
[22:39:21] madhuvishy: makes sense?
[22:39:29] yes i think so
[22:39:49] there is still some stuff i don't understand as to how it does the replication
[22:40:06] but it mostly makes sense for why HaeB is seeing differences
[22:40:09] madhuvishy: ahem there there is stuff we ALL do not understand
[22:40:21] s1-analytics-slave is pointing to db1047
[22:41:03] madhuvishy: 1047 has probably many other things
[22:41:13] which is different from dbstore1002 which is aliased to analytics-store.eqiad - which is the slave that is the result of the replication from eventlogging sync script
[22:41:17] nuria: ya of course
[22:41:56] madhuvishy: is the slave not working?
[22:42:01] for our purposes though - s1-analytics-slave behaves differently from analytics-store - and i can sorta see why - in theory
[22:42:09] nuria: it is - but it is much slower
[22:42:32] madhuvishy: that makes sense as it is used by many other folks
[22:42:37] yeah
[22:42:50] and the data is off by 20% he said
[22:42:54] which also makes sense
[22:42:59] wait, so s1-analytics-slave is actually the backup master?
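
A quick way to untangle which physical box each alias in this exchange points at, since that is exactly the confusion here; the hostnames are from the log, the commands are a generic sketch assuming the aliases are DNS CNAMEs.

    # Resolve the aliases and ask each server who it really is.
    dig +short s1-analytics-slave.eqiad.wmnet
    dig +short analytics-store.eqiad.wmnet
    mysql -h s1-analytics-slave.eqiad.wmnet -e 'SELECT @@hostname;'
    mysql -h analytics-store.eqiad.wmnet -e 'SELECT @@hostname;'
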
[22:43:01] because of the replication script
[22:43:06] HaeB: it looks like it
[22:43:14] from what puppet i can read
[22:43:28] i'll check with jynus tomorrow anyway
[22:43:29] data "off by 20%"?
[22:43:46] as in "not there yet?"
[22:43:49] nuria: yeah - remember the reason why we added auto increment ids
[22:44:03] because the sync script was missing data with the parallel consumers
[22:44:24] madhuvishy: yes, but we switched to 1 consumer a while back
[22:44:27] and our batch insertion into mysql and the sync script being timestamp based was wrong
[22:44:30] as in many weeks
[22:44:36] yah but HaeB is looking at old data
[22:44:48] i can send some examples
[22:45:23] e.g. for timestamps around november/december, the same table in store had more rows that in s1
[22:45:31] HaeB: you can create a ticket but so you know the dba is swamped , we have had a ticket with this issue assigned to him for over a month now
[22:45:42] HaeB: yes, nov/dec makes sense
[22:45:48] store had more than s1?
[22:45:50] nuria: yes, madhu said that
[22:46:10] my main concern as analyst is to get valid data ;)
[22:46:21] HaeB: ^^
[22:46:31] and also because s1 is so much faster, i'd really like to use that
[22:46:33] yes
[22:46:45] that makes no sense
[22:47:25] wait, sorry, the other way around (for december)
[22:47:33] HaeB: ah
[22:47:35] ok
[22:47:38] that's fine then
[22:47:40] for SELECT LEFT(timestamp, 8) AS date, COUNT(*) AS events FROM log.NavigationTiming_14899847 GROUP BY date ORDER BY date;
[22:47:43] you should keep using s1
[22:47:53] it is correct data as far as i understand
[22:48:17] analytics-store will get fixed when the dba has some time
[22:48:36] as to it being slow - it's probably under a lot more load
[22:49:08] HaeB: right, the main cause of it being slow is other users running long running queries
[22:49:17] ... e.g. for december 3 (20151203), store sees 307440 events, s1 sees 358908 events
[22:49:34] HaeB: does it get better for jan?
[22:49:42] let me find when I switched off parallel consumers
[22:50:20] Feb 8
[22:50:27] https://gerrit.wikimedia.org/r/#/c/269185/
[22:50:39] until then - it can have this difference
[22:51:02] i expect there's still a difference, but smaller after
[22:52:42] (to your question: i'm checking a different schema... this one was switched off in dec and only generated a small amount of trailing events after that)
[22:53:14] HaeB: ah yes - don't worry about it then
[22:59:08] madhuvishy: this one started on february 2, and the numbers are exactly identical until feb 29. from march 1 on they differ a tiny bit, to the order of 0.1% or so
[22:59:09] SELECT LEFT(timestamp, 8) AS date, COUNT(*) AS events FROM log.MobileWebLanguageSwitcher_15302503 GROUP BY date ORDER BY date;
[22:59:42] HaeB: aah okay
[23:03:06] nuria: (slow) i see... please don't tell anyone else then, shhh :-D
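
HaeB's comparison, wrapped so the two replicas can be diffed mechanically; the query and hostnames are verbatim from the conversation, and the wrapper assumes your usual MySQL credentials live in a defaults file.

    # Run the same daily-count query against both replicas, then diff.
    Q='SELECT LEFT(timestamp, 8) AS date, COUNT(*) AS events
       FROM log.NavigationTiming_14899847 GROUP BY date ORDER BY date;'
    mysql -h s1-analytics-slave.eqiad.wmnet -e "$Q" > s1.tsv
    mysql -h analytics-store.eqiad.wmnet -e "$Q" > store.tsv
    diff s1.tsv store.tsv   # rows present in only one file are the discrepancy
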
[23:04:12] ..seriously, faster queries can matter a lot sometimes, so i hope you folks get the hardware budget you need
[23:08:51] madhuvishy: ugh, here's one appears to have stopped updating on s1 but *not* on store:
[23:08:53] mysql:research@analytics-store.eqiad.wmnet [(none)]> SELECT MAX(timestamp) FROM log.NavigationTiming_15033442;
[23:08:59] 20160316230346
[23:09:06] mysql:research@s1-analytics-slave.eqiad.wmnet [(none)]> SELECT MAX(timestamp) FROM log.NavigationTiming_15033442;
[23:09:13] 20160306063639
[23:09:21] strange
[23:09:35] HaeB: hmmm - that's out of my expertise
[23:09:51] (that schema appears to have switched to a new version around march 11)
[23:09:52] strange indeed
[23:10:06] yes, just apropos replication issues
[23:10:11] why would the store even have data then
[23:10:21] exactly
[23:10:27] may be clients sending old schema events
[23:10:38] oh yes, that seems to happen quite often
[23:10:47] cached pages or such
[23:10:51] yeah
[23:10:54] saw it happen in other mobile web schemas
[23:10:59] hmmm
[23:11:03] but then those should be on s1 too, no?
[23:11:08] so the backup hasn't caught up?
[23:11:09] don't know
[23:11:53] anyway, i don't want to dig too deeply into this right now either ;) I'll try to write up the discrepancies in one ticket on phab
[23:12:04] ...and for now assume that s1 has the more valid data
[23:12:19] thanks a lot for clearing that up
[23:14:23] HaeB: np. thanks for helping figure such problems out!
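
For the eventual Phabricator write-up, the staleness probe above generalizes to a small loop per table; the table name and hosts are verbatim from the log, and the loop again assumes credentials come from a defaults file.

    # Compare the newest event timestamp for one schema across both replicas.
    for h in s1-analytics-slave.eqiad.wmnet analytics-store.eqiad.wmnet; do
        echo "== $h"
        mysql -h "$h" -e 'SELECT MAX(timestamp) FROM log.NavigationTiming_15033442;'
    done
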