[00:49:54] Analytics: Visualize unique devices data in dashiki - https://phabricator.wikimedia.org/T122533#1968261 (Nuria) p:Triage>Normal
[07:50:21] reminder: reboots of stat1002/stat1003 in 10 mins
[07:54:43] Analytics-Tech-community-metrics, Developer-Relations: Mark BayesianFilter repository as inactive - https://phabricator.wikimedia.org/T118460#1968663 (Qgil) @aklapper, what about dealing with this small task during #DevRel-February-2016?
[08:06:36] stat1002/stat1003 are back up with updated kernels
[08:25:45] moritzm: thanks!
[09:08:16] a-team: I am working on the Hadoop cluster to install the new kernel and reboot the nodes (likely happening after lunch)
[09:08:34] I am not touching the Master node (and its slave) plus the Journal nodes
[09:08:57] I'll wait for ottomata (likely failover to slave etc..)
[09:40:38] elukey: sounds great :)
[09:43:29] just finished installing the new kernel on analytics*, I'll start rebooting from 13 GMT+1 onwards
[09:43:40] I'll ping you before starting
[09:43:54] perfect :)
[09:45:23] joal, mforns: after lunch can we chat about https://wikitech.wikimedia.org/wiki/Incident_documentation/20160125-EventLogging ?
[09:45:54] sure elukey
[09:46:22] I am puzzled by the fact that the kafka alarms were for the root partition of the nodes, not the one that stores the logs. Plus we didn't reach 100% utilization, but only got veeery close.
[09:46:37] So EL received connect timeouts and failed
[09:46:54] but I'd like to know why kafka dropped some connections
[09:47:08] (I am surely missing a piece because I am ignorant in the field)
[10:10:38] Analytics-Visualization: Deployment caching busting strategy for dashiki - https://phabricator.wikimedia.org/T72727#1968946 (matej_suchanek)
[10:10:44] Analytics-Visualization: Dashiki build needs to clean up ./dist directory before building - https://phabricator.wikimedia.org/T72845#1968954 (matej_suchanek)
[10:12:21] Analytics-Visualization, Testme: Deployment caching busting strategy for dashiki - https://phabricator.wikimedia.org/T72727#1968962 (matej_suchanek)
[10:12:23] Analytics-Visualization, Testme: Dashiki build needs to clean up ./dist directory before building - https://phabricator.wikimedia.org/T72845#1968963 (matej_suchanek)
[10:58:11] (PS1) Joal: Typo in mobile_apps uniques daily table creation [analytics/refinery] - https://gerrit.wikimedia.org/r/266695
[10:58:37] (CR) Joal: [C: 2 V: 2] "Self merging typo." [analytics/refinery] - https://gerrit.wikimedia.org/r/266695 (owner: Joal)
[12:11:59] back :)
[12:43:45] a-team: I am starting the reboots
[12:43:52] great elukey
[12:43:53] analytics 1028/29
[12:44:09] we'll basically do
[12:44:10] service hadoop-hdfs-datanode restart
[12:44:10] service hadoop-yarn-nodemanager restart
[12:44:13] then reboot
[12:44:27] is it fine? Or should we do something different to avoid failures etc..
[12:45:14] not restart, stop
[12:45:44] but please run them with "time" first so that we have an estimate whether a plain reboot would also work
[12:45:53] sorry, wrong copy/paste, thanks for the correction :)
[12:46:50] Sounds fine by me, you should stop yarn before hdfs
[12:46:56] elukey: --^
[12:47:30] got it
[13:05:48] elukey: Have you managed to shut down yarn and hdfs nicely, or is a hard reboot needed?
[13:07:14] they shut down nicely, except yarn, which takes more than 5 seconds and then a kill -9 kicks in. Not sure if it is salt or not, I am checking. But the two nodes are up and running now.
[13:08:53] elukey: maybe yarn is waiting for its running containers to finish ... Don't know
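For reference, the per-node sequence elukey and joal settle on above (stop the YARN NodeManager first, then the HDFS DataNode, then reboot) would look roughly like this. A minimal sketch only: the hostnames are examples, root SSH access is assumed, and elukey actually drove this through salt rather than plain ssh:

```python
# Hedged sketch of the drain-and-reboot sequence discussed at 12:44-12:46.
# Assumptions: example hostnames, root SSH access, sysvinit-style `service`.
import subprocess

NODES = ["analytics1028.eqiad.wmnet", "analytics1029.eqiad.wmnet"]  # examples

def drain_and_reboot(host):
    for cmd in ("service hadoop-yarn-nodemanager stop",  # YARN first, per joal
                "service hadoop-hdfs-datanode stop",     # then the DataNode
                "reboot"):
        # reboot usually drops the SSH session, so don't check its exit code
        subprocess.run(["ssh", host, cmd], check=(cmd != "reboot"))

for node in NODES:
    drain_and_reboot(node)
```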
[13:09:10] ah yes confirmed, I need to specify a bigger timeout in salt
[13:12:04] cool :)
[13:18:54] (PS10) Mforns: Add split-by-os argument to AppSessionMetrics job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/264297 (https://phabricator.wikimedia.org/T117615)
[13:47:41] joal: the 5 seconds timeout is a Yarn parameter, by default it sends a -9 after 5 seconds
[13:47:45] a bit weird
[13:47:52] okey :)
[13:48:05] That's no big deal, just wondering :)
[13:48:20] yeah I would have liked a bit more time
[13:48:31] maybe we can tune it as an action item
[13:48:36] I'll ask ottomata
[13:55:23] a-team, I'm taking a break, will be back :)
[15:35:56] Huh, my script died.
[15:38:12] Oh, stat1003 died.
[15:38:25] Anyone know what happened?
[15:38:54] Kernel upgrades according to SAL I guess
[15:56:54] ottomata: https://github.com/wikimedia/operations-puppet/blob/production/manifests/site.pp#L113 - can I restart it?
[15:57:04] do we have impala running or is it just a test node?
[15:57:46] elukey: 1017 is already done
[15:57:51] see uname
[15:58:05] so, I tried to reinstall 1017 as Jessie back during the all staff
[15:58:12] something went wrong, and I couldn't log in, but I never finished
[15:58:22] sorry moritzm, I meant 1026
[15:58:34] impala is running, but we also never finished productionizing it
[15:58:38] no one is using it
[15:58:57] we should look at finishing that up, probably just uninstalling it
[15:59:05] it doesn't really seem necessary
[16:00:21] all right so i guess I can reboot
[16:02:32] right ottomata?
[16:05:25] ja go ahead
[16:05:38] elukey: ^
[16:08:53] already restarted thanks :)
[16:24:36] ottomata: would you mind silencing all the analytics nodes from 51 to 57 in icinga?
[16:24:49] I am still not able to do it :(
[16:28:15] Analytics-Tech-community-metrics, Possible-Tech-Projects: Data Analytics toolset for MediaWikiAnalysis - https://phabricator.wikimedia.org/T116509#1970040 (Aklapper) @Anmolkalia: This issue has been assigned to you a while ago. Could you share a status update? Are you still working (or still plan to work...
[16:34:53] milimetric: Hey, quick question about Dashiki...we have language and project, can we have other dimensions? I'm asking because we have numbers for uploads, unique uploaders, and new uploaders per-wiki, but we've also wanted to split those numbers for video/audio/image/document, and I think if we did that by just adding new metrics, it would be cluttering the interface a bit
[16:43:32] I probably shouldn't add those scripts yet anyway, I think reportupdater is churning a bit
[16:43:42] But when it makes sense to add them...
[16:44:28] Analytics-Kanban: Quaterly review 2016/01/22 (slides due on 19th) - https://phabricator.wikimedia.org/T120844#1970078 (Nuria) Open>Resolved
[16:51:31] milimetric: James_F also wants to know if we can get rid of categories of metrics that we don't use, like...all of them but multimedia
[16:52:10] Or maybe, James_F, we should recategorize some
[16:52:22] Sure.
[16:52:36] * milimetric is catching up with a lot of interruptions, one moment MarkTraceur
[16:52:39] But we don't and won't have "Reader" metrics, right?
[16:52:41] Oh, no problem
[16:52:44] James_F: True
[16:53:06] I think that comes from the Dashiki:CategorizedMetrics page, and people have built it in a way that only sort of makes sense
[16:53:17] ok, so all those categories showing up is a bug, they shouldn't show if there's no metrics in them
[16:53:28] OK, cool
[16:53:32] I'll file that and fix it soon
[16:53:35] Thanks!
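On the salt timeout elukey confirms at 13:09: salt's client-side `-t/--timeout` flag is the knob for commands that outlive the default wait, such as the slow NodeManager stop (the 5-second kill -9 discussed at 13:47 is a separate, Hadoop-side stop timeout). A hedged sketch; the target glob and the 120-second value are illustrative, not what was actually run:

```python
# Illustrative only: raise salt's client-side timeout so a slow
# "service ... stop" is not reported as timed out.
import subprocess

subprocess.run([
    "salt", "-t", "120", "analytics10*",              # example target glob
    "cmd.run", "service hadoop-yarn-nodemanager stop",
], check=True)
```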
[16:53:46] so then what you could do is just add metrics and use the categories to make it less cluttered
[16:53:55] if that doesn't work, we could do a different layout
[16:54:03] Yeah, I figure I can put new uploaders into new users
[16:54:11] And uploads into content
[16:54:41] what I mean is, if you wanted to use the categories to your advantage you could make an "Uploads" category and under that have metrics for video, audio, image, documemnt
[16:54:43] *document
[16:55:01] Ooh, I like it.
[16:55:09] I'm also moving around the other stuff though :)
[16:55:25] yeah, as you wish, I mean, these layouts are pretty rigid on purpose
[16:55:40] that way we can kind of force dashboards to look and feel the same
[16:56:32] OK, James_F, we're using three different ones now, I'm going to mess with things as I go
[16:56:48] I think I was misunderstanding the categories until now.
[16:56:54] Analytics-Kanban: Categories without metrics should not show up {crow} - https://phabricator.wikimedia.org/T124926#1970138 (Milimetric) NEW a:Milimetric
[16:57:08] MarkTraceur: ^ that's the task to fix the empty categories
[16:57:13] Cool, cheers
[16:57:46] Ya.
[17:01:22] elukey: hola, standup?
[17:01:37] sure!
[17:12:50] Analytics-Kanban: Categories without metrics should not show up {crow} [3 pts] - https://phabricator.wikimedia.org/T124926#1970220 (Milimetric)
[17:13:11] (PS1) Milimetric: Show only relevant categories [analytics/dashiki] - https://gerrit.wikimedia.org/r/266768 (https://phabricator.wikimedia.org/T124926)
[17:13:27] (CR) Milimetric: [C: 2 V: 2] Show only relevant categories [analytics/dashiki] - https://gerrit.wikimedia.org/r/266768 (https://phabricator.wikimedia.org/T124926) (owner: Milimetric)
[17:17:18] Analytics-Kanban, Research-and-Data: Research Spike: How do redirects affect pageviews [8 pts] {hawk} - https://phabricator.wikimedia.org/T108867#1970258 (mforns)
[17:18:32] MarkTraceur: ok, done and deployed (might have to clear cache to get the new index.html)
[17:19:05] Looks great, thanks :)
[17:21:11] Analytics-Tech-community-metrics: Provide feedback about prototype "Git and Gerrit statistics" dashboard to Bitergia - https://phabricator.wikimedia.org/T124930#1970274 (Aklapper) NEW
[17:44:34] Analytics-Visualization, Testme: Deployment caching busting strategy for dashiki - https://phabricator.wikimedia.org/T72727#1970354 (Nuria) This was done a looong time ago, closing.
[17:44:57] Analytics-Visualization, Testme: Deployment caching busting strategy for dashiki - https://phabricator.wikimedia.org/T72727#1970355 (Nuria) Open>Resolved
[17:48:32] Analytics-Visualization, Testme: Dashiki build needs to clean up ./dist directory before building - https://phabricator.wikimedia.org/T72845#1970377 (Nuria) closing.
[17:49:12] Analytics-Visualization, Testme: Dashiki build needs to clean up ./dist directory before building - https://phabricator.wikimedia.org/T72845#1970379 (Nuria) Open>Resolved
[18:00:31] hey madhuvishy: can you point me to your code?
[18:00:39] ottomata: hardware meeting?
[18:00:42] joal: yes i'm looking for it :)
[18:00:46] thx :)
[18:02:02] milimetric: want to come to the hardware meeting (it is ok to say no)
[18:02:09] ah! sorry
[18:02:31] joal: I think it's best to get it from hdfs://analytics-hadoop/user/madhuvishy/refinery/oozie/last_access_uniques
[18:02:46] perfect madhuvishy, will do!
[18:02:47] this is where the coordinator is running from
[18:02:48] thx :)
[18:08:14] hello
[18:08:21] are the eventlogging slaves back up to date now?
[18:11:21] jynus: ^?
[18:11:41] it is getting there
[18:12:03] (PS1) MarkTraceur: Add per-media-type queries for uploads [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266783
[18:12:05] I am resyncing old events
[18:12:16] are new events making it?
[18:12:23] I'm looking for events made in the last 24h period
[18:12:38] (CR) MarkTraceur: [C: -1] "Don't merge until the deletion script is done catching up, and we have numbers for the five existing categories for all wikis." [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266783 (owner: MarkTraceur)
[18:13:18] milimetric: When you get a chance I'd appreciate it if you could check what reportupdater is doing now, I think it's running our queries for historical data for deletions and other wikis, but I can't be certain
[18:13:55] k
[18:14:22] the only tables behind are 4 right now, Echo* Edit* PageContentSaveComplete and MobileAppUploadAttempts
[18:14:46] but as I said, there could be older events still not synced (I am doing that now)
[18:15:43] YuviPanda, those should be ok according to mysql, but I do not know if there are kafka issues on one of those 4 tables, does that help?
[18:16:03] it does! thanks jynus :)
[18:18:08] Analytics-Tech-community-metrics, pywikibot-core, DevRel-January-2016: Statistics for SCM project 'core' mix pywikibot/core, mediawiki/core and oojs/core - https://phabricator.wikimedia.org/T123808#1970520 (Lcanasdiaz) I was wrong, it is a "bug" of our retrieval tool for Git (cvsanaly2) ``` mysql> S...
[18:19:15] the problem with refilling is that, as it happens at the same time as normal queries, it takes some time
[18:31:31] YuviPanda: I *think* new events should come in
[18:31:37] as long as jynus hasn't stopped the sync* job
[18:31:53] nope, puppet makes sure of that
[18:32:10] k
[18:35:21] (PS6) Bearloga: Functions for categorizing queries. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254461 (https://phabricator.wikimedia.org/T118218)
[18:37:45] ottomata: joal btw, update on notebooks: I fell into a yak-shaving hole, since the script docker.io uses to create debian base images is utter shit. am about 90% done with my own...
[18:37:59] this is needed since this is the first time we're using docker in a production compatible way
[18:38:02] should be done today, hopefully
[18:43:31] (CR) OliverKeyes: Functions for categorizing queries. (4 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254461 (https://phabricator.wikimedia.org/T118218) (owner: Bearloga)
[19:00:51] (CR) Bearloga: Functions for categorizing queries. (4 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254461 (https://phabricator.wikimedia.org/T118218) (owner: Bearloga)
[19:01:26] that was a good meeting! :)
[19:01:33] +1
[19:01:38] +1
[19:01:46] MarkTraceur: I looked at your files, it looks to me like they're all done
[19:01:47] YuviPanda: Cool!
[19:01:47] :]
[19:01:53] * joal is eager to see AQS response time with SSDs :)
[19:02:02] ok so, real quick
[19:02:03] YuviPanda: Thanks mate!
[19:02:10] action items for me are to make procurement requests for this stuff?
[19:02:11] nuria: ?
[19:02:16] or am I waiting for you to confirm?
[19:02:17] yessir
[19:02:35] ottomata: no, i think for this year's budget we can proceed
[19:02:43] cool
[19:02:47] milimetric: Weird, maybe there's a caching problem somewhere
[19:02:47] joal: can you own the SSD hw ticket? I'll make the hive/oozie one
[19:02:50] MarkTraceur: they're just waiting for this month to be finished, and when that happens they'll add the next month's results to each wiki you have configured
[19:02:51] ottomata: so, yes, go ahead
[19:02:57] YuviPanda: one of these days, I'll ask you to show me those scripts, do some docker learning :)
[19:02:59] MarkTraceur: why, what are you seeing / where?
[19:03:07] joal: good we decided to wait on the hive/oozie move, we should really just wait until we get a new node then :)
[19:03:15] joal: :D madhuvishy said you already knew a lot of docker
[19:03:17] sounds good :)
[19:03:25] (CR) OliverKeyes: Functions for categorizing queries. (4 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254461 (https://phabricator.wikimedia.org/T118218) (owner: Bearloga)
[19:03:31] joal: if you like, i can create the ticket and you can fill it in with info?
[19:03:34] i'll assign it to you
[19:03:36] s'ok?
[19:03:41] YuviPanda: not really true, I know some, but have been through productionisation
[19:03:58] YuviPanda: but have NOT been
[19:03:59] sorry
[19:04:03] milimetric: I don't see the deletion numbers yet
[19:04:18] ottomata: which topic ?
[19:04:25] ottomata: hive / oozie server ?
[19:04:41] :D
[19:04:43] ok
[19:04:45] milimetric: Also, I need to delete the crosswiki upload numbers, but don't have the permission to do so, I changed the reportupdater script for it
[19:04:48] MarkTraceur: oooh, good point
[19:04:55] um... that may not run until the next month
[19:05:00] because it has recorded that it's already run
[19:05:01] Oh, okay
[19:05:07] I'll delete its history file and make it run again
[19:05:14] Neat, thanks
[19:05:16] it won't recompute, it'll just run what it missed (deletions)
[19:05:28] so that'll run tonight
[19:05:52] ok, I'll delete the crosswiki uploads/commonswiki right?
[19:07:12] ottomata: I'm lost about the ticket stuff :(
[19:07:21] What should I fill in ?
[19:07:49] MarkTraceur: do you have sudo -u stats? Like this: "sudo -u stats rm /a/limn-public-data/metrics/multimedia-health/cross-wiki-uploads/commonswiki.tsv"
[19:07:49] milimetric: Yup
[19:07:56] Oh, sure
[19:08:03] (ok, I did it this time, np)
[19:08:20] but it won't delete the file that's rsynced until a new one replaces it
[19:08:24] * milimetric lunching now
[19:08:54] * joal is going for dinner
[19:09:21] joal ssds
[19:09:24] laters!
[19:10:01] Analytics, Analytics-Cluster, Analytics-Kanban, hardware-requests, operations: New Hive / Oozie server node in eqiad Analytics VLAN - https://phabricator.wikimedia.org/T124945#1970744 (Ottomata) NEW
[19:10:10] Analytics, Analytics-Cluster, Analytics-Kanban, hardware-requests, operations: New Hive / Oozie server node in eqiad Analytics VLAN - https://phabricator.wikimedia.org/T124945#1970755 (Ottomata)
[19:13:01] Analytics-Kanban, hardware-requests, operations: 8 x 3 SSDs per for AQS nodes. - https://phabricator.wikimedia.org/T124947#1970805 (Ottomata) NEW a:JAllemandou
[19:13:14] Analytics-Kanban, hardware-requests, operations: 8 x 3 SSDs for AQS nodes. - https://phabricator.wikimedia.org/T124947#1970814 (Ottomata)
[19:13:27] Analytics-Kanban, hardware-requests, operations: 8 x 3 SSDs for AQS nodes. - https://phabricator.wikimedia.org/T124947#1970805 (Ottomata)
[19:19:24] oh, something to think about: there's not much rack space for nodes in eqiad... we will see!
[19:21:05] Analytics-Cluster, Analytics-Kanban, hardware-requests, operations: Hadoop Node expansion for end of FY - https://phabricator.wikimedia.org/T124951#1970876 (Ottomata) NEW
[19:22:36] Analytics-Cluster, Analytics-Kanban, hardware-requests, operations: Hadoop Node expansion for end of FY - https://phabricator.wikimedia.org/T124951#1970895 (Ottomata)
[19:22:56] Analytics-Kanban, hardware-requests, operations: 8 x 3 SSDs for AQS nodes. - https://phabricator.wikimedia.org/T124947#1970805 (Ottomata)
[19:22:58] Analytics-Cluster, Analytics-Kanban, hardware-requests, operations: Hadoop Node expansion for end of FY - https://phabricator.wikimedia.org/T124951#1970876 (Ottomata)
[19:23:00] Analytics, Analytics-Cluster, Analytics-Kanban, hardware-requests, operations: New Hive / Oozie server node in eqiad Analytics VLAN - https://phabricator.wikimedia.org/T124945#1970744 (Ottomata)
[19:25:39] DO WE NEED AN ANIMAL!?
[19:25:49] I guess i should use the same one from last year's expansion?
[19:25:52] ORR
[19:25:57] is AQS SSDs different?
[19:35:20] Analytics-Cluster, operations: stat1002 - redis can't connect to mira.codfw.wmnet - https://phabricator.wikimedia.org/T124955#1970956 (Dzahn)
[19:36:58] (CR) Tjones: Functions for categorizing queries. (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254461 (https://phabricator.wikimedia.org/T118218) (owner: Bearloga)
[19:39:08] hello, there is an issue on stat1002
[19:39:19] it has a redis that tries to connect to the deployment server
[19:39:37] but the deployment server got switched and it fails to connect to that one
[19:39:45] specifically:
[19:39:52] redis.exceptions.ConnectionError: Error connecting to mira.codfw.wmnet:6379. timed out.
[19:40:11] i made a ticket and it might need people on the switches to adjust ACL.. just reporting for now
[19:40:20] since i don't really know what that breaks
[19:40:54] (CR) Nuria: "I see several things that could be improved in the code and i have listed those below but My main question (and you can update that on the" (5 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254461 (https://phabricator.wikimedia.org/T118218) (owner: Bearloga)
[19:45:26] milimetric: I guess I'm also waiting for new files for wikis with uploads that aren't en, de, or commons.
[19:45:36] But that might just be taking a while
[19:47:59] interesting! mutante i dunno what that is for!
[19:48:38] Analytics-Cluster, operations: stat1002 - redis can't connect to mira.codfw.wmnet - https://phabricator.wikimedia.org/T124955#1971034 (Dzahn) what it's doing is trying to deploy wikimedia/discovery/analytics and it can't deploy it because of the redis connection timeout. Error: Execution of '/usr/bin/sal...
[19:49:47] Analytics-Cluster, operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: stat1002 - redis can't connect to mira.codfw.wmnet - https://phabricator.wikimedia.org/T124955#1971050 (jcrespo)
[19:50:05] Analytics-Cluster, operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: stat1002 - redis can't connect to mira.codfw.wmnet - https://phabricator.wikimedia.org/T124955#1971054 (EBernhardson) https://gerrit.wikimedia.org/r/#/c/265795/ is the patch that added this, it adds a new user to analytics mac...
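For the stat1002 -> mira symptom mutante reports at 19:39, a useful first-pass check is a plain TCP connect to the redis port. Host and port come from the error message above; the 6-second timeout is arbitrary, and this only exercises the network path (the VLAN ACL suspected later), not redis itself or trebuchet:

```python
# Minimal reachability probe for the redis timeout seen on stat1002.
import socket

try:
    socket.create_connection(("mira.codfw.wmnet", 6379), timeout=6).close()
    print("tcp/6379 reachable")
except OSError as exc:
    # mirrors the redis.exceptions.ConnectionError quoted above
    print("connect failed:", exc)
```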
[19:51:30] Analytics-Cluster, operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: stat1002 - redis can't connect to mira.codfw.wmnet - https://phabricator.wikimedia.org/T124955#1971066 (EBernhardson) I would also note that this means analytics can't deploy new versions of refinery as long as mira is master...
[19:52:22] Analytics-Cluster, operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: stat1002 - redis can't connect to mira.codfw.wmnet - https://phabricator.wikimedia.org/T124955#1971075 (Dzahn) It might need #netops because ACLs on network hardware might have to be adjusted, since the analytics VLAN is separ...
[19:52:37] don't know if you all had plans to deploy new refinery versions, but FYI as long as mira is deployment master we can't push any trebuchet repos to analytics
[20:03:33] Analytics-Cluster, operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: stat1002 - redis can't connect to mira.codfw.wmnet - https://phabricator.wikimedia.org/T124955#1971106 (Dzahn) i checked ferm/iptables rules on tin and mira. they are the same and allow connections to 6379 (the redis port) fro...
[20:03:51] Hey ottomata
[20:03:54] Analytics-Cluster, netops, operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: stat1002 - redis can't connect to mira.codfw.wmnet - https://phabricator.wikimedia.org/T124955#1971110 (Dzahn)
[20:03:54] You here?
[20:04:24] ottomata: Just backlogged - I'll get the SSD tickets
[20:05:07] Analytics-Cluster, netops, operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: stat1002 - redis can't connect to mira.codfw.wmnet - https://phabricator.wikimedia.org/T124955#1971121 (Ottomata) Yeah, makes sense! stat1002 is in the Analytics VLAN, so a rule will need to be opened up in the VL...
[20:05:21] joal: cool, made the ticket, you just gotta drive it with cmjohnson
[20:05:23] or whoever
[20:05:30] ottomata: you said in the original desc that the machines have 12 SSDs --> I assume it's a typo?
[20:05:30] I'll help with any downtime when the time comes
[20:05:34] OOPs
[20:05:34] yes
[20:05:37] :)
[20:05:45] fixing
[20:05:51] I'll correct and add a comment about the SSD spec we discussed with gwicke
[20:05:55] Analytics-Kanban, hardware-requests, operations: 8 x 3 SSDs for AQS nodes. - https://phabricator.wikimedia.org/T124947#1971123 (Ottomata)
[20:05:56] k
[20:10:17] Analytics-Kanban, hardware-requests, operations: 8 x 3 SSDs for AQS nodes. - https://phabricator.wikimedia.org/T124947#1971137 (JAllemandou) When discussing about cassandra response time issues with @Gwicke, he told me the Services Team had used SSDs to mitigate that issue. They use Samsung 850 Pro 1Tb...
[20:10:22] ottomata: commented :)
[20:10:41] Sorry for not having understood what you meant with the ticket stuff :(
[20:10:58] Let me know if I can help with moving the hardware stuff
[20:11:02] ottomata: --^
[20:11:15] madhuvishy: heya, you here?
[20:12:59] joal: yes
[20:13:14] I have a question on the last access code :D
[20:13:18] sure!
[20:14:12] I don't understand the 3 cases with last_access and/or nocookies values in the x-analytics header being null or not :(
[20:14:31] or more precisely: I understand the repeated one
[20:15:09] joal: aha - it is pretty confusing - i drew a matrix to understand it
[20:15:39] joal:
[20:15:40] so
[20:15:42] Ahhh, maybe because: (nocookies is null === I accept cookies) (because nocookies can only take the true value, correct?
[20:15:43] Analytics-Cluster, netops, operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: stat1002 - redis can't connect to mira.codfw.wmnet - https://phabricator.wikimedia.org/T124955#1971162 (Dzahn) fixed by @Faidon , thanks! -- confirmed working now: Package[wikimedia/discovery/analytics]/ensure:...
[20:15:50] yes
[20:15:58] Riiiiight :)
[20:16:01] Analytics-Cluster, netops, operations, codfw-rollout, codfw-rollout-Jan-Mar-2016: stat1002 - redis can't connect to mira.codfw.wmnet - https://phabricator.wikimedia.org/T124955#1971166 (Dzahn) Open>Resolved a:Dzahn
[20:16:06] nocookie can be 1 or null
[20:16:11] s/1/true
[20:16:33] * joal took a while to get it
[20:16:44] so - all requests that come with an appropriate last access date or last access null - overcount
[20:17:20] all requests with appropriate last access date, or last access null but nocookie is null - uniques estimate
[20:17:30] last-access == null can either be (nocookie / first action in session), correct?
[20:17:34] all requests with appropriate last access date - repeats
[20:17:48] joal: correct.
[20:17:52] Hurray
[20:17:56] I think I have it
[20:18:07] I'll still have to think it twice every time :)
[20:18:11] yes
[20:18:15] Thanks madhuvishy !
[20:18:53] np :)
[20:28:18] Analytics-Kanban: Restore MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 - https://phabricator.wikimedia.org/T123595#1971222 (Jdlrobson) @nuria any chance we can bump the priority of this and associated tasks? The reading team is looking to lazy load images as part of our quarterly goal an...
[20:43:42] Analytics-Kanban, EventBus, Patch-For-Review: Refactor kafka puppet roles with hiera for more generic use than analytics [8 pts] - https://phabricator.wikimedia.org/T120957#1971293 (Nuria)
[20:43:44] Analytics, Analytics-Kanban, Patch-For-Review: Change analytics kafka cluster JMX metrics to be prefixed with cluster name and change alerts and dashboards [5 pts] - https://phabricator.wikimedia.org/T121643#1971292 (Nuria) Open>Resolved
[20:45:47] Analytics-Kanban: Restore MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 - https://phabricator.wikimedia.org/T123595#1971294 (Ottomata) > ...but actually it seems that the newer of the two tables (MobileWebSectionUsage_15038458) has already been fully restored? I believe so. There was a l...
[20:47:22] jdlrobson: i always forget, what's tilman's irc nick?
[20:48:20] Analytics-Kanban: Restore MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 - https://phabricator.wikimedia.org/T123595#1971300 (Ottomata) Oh, @Tbayer says as much. I'll look into it now...
[20:51:28] ottomata, haeb
[20:51:52] milimetric: are there docs anywhere on how to run the wikimetrics tests?
[20:52:06] ok thanks Ironholds, not here then.
[20:52:31] Analytics-Kanban: Restore MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 - https://phabricator.wikimedia.org/T123595#1971331 (Nuria) The table: MobileWebSectionUsage_15038458 is already there from what I can see, is that sufficient to do analysis? @ottomata: let me know if you think we c...
[20:52:32] np!
[20:57:20] Analytics-Tech-community-metrics, pywikibot-core, DevRel-January-2016, Upstream: Statistics for SCM project 'core' mix pywikibot/core, mediawiki/core and oojs/core - https://phabricator.wikimedia.org/T123808#1971343 (jayvdb) >>! In T123808#1970520, @Lcanasdiaz wrote: > I was wrong, it is a "bug" of...
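madhuvishy's matrix at 20:16-20:17, restated as booleans. This is a paraphrase of the chat, not the refinery source (the authoritative logic lives in the last_access_uniques oozie job she points joal to at 18:02); `last_access_ok` stands for an "appropriate" last access date, i.e. one present and earlier than the current day:

```python
def classify(last_access_ok, last_access_missing, nocookies_missing):
    """Restates the counting matrix from the discussion above.

    last_access_ok      -- request carries an 'appropriate' last access date
    last_access_missing -- no last access value at all (nocookies client,
                           or first request of a session, per 20:17:30)
    nocookies_missing   -- nocookies is null, i.e. the client accepts cookies
    """
    overcount = last_access_ok or last_access_missing
    uniques_estimate = last_access_ok or (last_access_missing and nocookies_missing)
    repeats = last_access_ok
    return overcount, uniques_estimate, repeats
```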
[21:01:52] Analytics-Kanban: Restore MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 - https://phabricator.wikimedia.org/T123595#1971359 (Ottomata) I think it has been restored... ``` mysql:research@s1-analytics-slave.eqiad.wmnet [log]> select min(timestamp), max(timestamp) from MobileWebSectionUsage...
[21:02:19] Analytics-Kanban: Restore MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 - https://phabricator.wikimedia.org/T123595#1971360 (Ottomata) IT'S A MIRACLE!!!
[21:07:55] madhuvishy: yeah, just execute "scripts/tests" or execute it and pass the path to a specific test / folder to run
[21:08:12] like "scripts/test tests/test_controllers"
[21:08:54] Analytics-Tech-community-metrics: Microtask: Create a very simple REST API for SortingHat - https://phabricator.wikimedia.org/T114838#1971390 (Saylikarnik) Hello, I am Sayli Karnik, an Outreachy aspirant for the upcoming Round 12 i. I am proficient in HTML, CSS, JavaScript, PHP, Android and REST APIs. Could y...
[21:09:18] but I guess there are no docs: https://wikitech.wikimedia.org/wiki/Analytics/Wikimetrics
[21:09:59] the readme should probably say
[21:10:05] Bye a-team, I'm off for tonight and won't be available tomorrow evening at standup
[21:10:11] I'll send e-scrum :)
[21:10:17] laters!
[21:10:41] MarkTraceur: those other files would've run next month because reportupdater doesn't notice config changes, it only knows whether or not it ran when it was supposed to
[21:11:07] Oh.
[21:11:43] milimetric: Is there a way to force it, or can we fix it so it notices?
[21:12:29] MarkTraceur: when i deleted the history file (./multimedia/.reportupdater.history) it means it'll get triggered at the next run
[21:12:40] but the next run might be tomorrow 'cause it missed today's by just a little bit I think
[21:13:00] k
[21:14:20] ottomata: it's HaeB i think
[21:14:33] ja thanks
[21:14:41] jdlrobson: i'm checking, but it looks like both tables are restored...
[21:16:09] ottomata, ^
[21:17:50] Ironholds: it was me who said it, yes!
[21:18:50] ottomata, yes, I was more pointing out HaeB just joined and you were looking for him ;p
[21:18:58] (I now realise some IRC clients do not include joins/parts)
[21:19:13] oh!
[21:19:16] thanks, didn't see that
[21:19:26] HaeB: ja, it looks like both tables are restored, am I wrong?
[21:19:30] it may be a miracle
[21:19:32] i didn't do it!
[21:20:03] ottomata: saw your comment at https://phabricator.wikimedia.org/T123595#1971360 ;)
[21:20:35] yes so i noticed one of the tables appearing last week already... but was a bit suspicious still, see my comment
[21:20:54] but if you think it's kosher, that works for me
[21:21:35] i'm running a spark query to count the number of records in hadoop for MobileWebSectionUsage_14321266
[21:21:41] if it matches about what is in mysql
[21:21:44] i'd say we're good
[21:22:12] let's add The Miracle of the Two Resurrected EventLogging Tables to https://en.wikipedia.org/wiki/Category:Miracles soon
[21:23:10] i can also rerun some of the queries i did earlier for MobileWebSectionUsage_14321266 and see if the results match
[21:24:14] k
[21:27:46] stop saying that - if you delete the tables from the slaves, they will get recreated by the resync script
[21:28:14] otherwise, the resync script would not be able to sync new tables created on the master
[21:28:52] if you want to delete tables for good, delete them from the master (first) then from the slaves
[21:29:17] Analytics-Kanban: Restore MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 - https://phabricator.wikimedia.org/T123595#1971464 (Tbayer) Hm, but looking at the size (repeat from https://phabricator.wikimedia.org/T123595#1950564 ), MobileWebSectionUsage_14321266 has not actually been restored y...
[21:29:34] ottomata: :(
[21:29:56] but please, when doing schema changes, please notify the DBA
[21:30:17] jynus: i did delete the tables from the master
[21:30:21] and it was when you were on vacation
[21:31:23] HaeB, i get a different result than you in your query
[21:31:37] maybe i'm connected to a different db..?
[21:31:38] hang on
[21:32:12] in a meeting, we'll get back to ya HaeB
[21:32:38] resync has not finished, anyway
[21:33:37] nuria: thanks, no rush
[22:00:18] huh ok HaeB i was looking at a different db
[22:00:25] s1-analytics-slave has the full table
[22:00:33] wait, or does it?
[22:01:07] | table_name                     | TOTAL SIZE (GB) |
[22:01:07] +--------------------------------+-----------------+
[22:01:07] | MobileWebSectionUsage_14321266 | 46.844009399414 |
[22:01:07] | MobileWebSectionUsage_15038458 | 47.451408386230 |
[22:03:34] jynus: s1-analytics-slave seems to have all data
[22:03:43] can I mysqldump MobileWebSectionUsage_14321266 from there into m4-master?
[22:04:43] sure, although it may take a day or so, with so many syncs going on now
[22:04:49] aye
[22:04:54] would it be better to wait?
[22:05:28] nah, run it now, nights tend to be calm
[22:05:38] k
[22:05:46] should I just mysqldump | mysql?
[22:05:48] just, please, make sure to log all those
[22:06:00] any preferred settings? :)
[22:06:03] on a screen, yes
[22:06:09] should I steal the ones from the el sync script?
[22:06:11] --single-transaction
[22:06:41] it's ok
[22:07:03] --skip-opt --single-transaction --quick ?
[22:07:11] and mysql --compress --skip-column-names
[22:07:12] ?
[22:09:59] do not think too much, most of those are by default or already configured on our custom config file
[22:10:15] haha
[22:10:15] ok
[22:10:16] so no
[22:10:16] --no-create-info --insert-ignore --extended-insert --compress --hex-blob
[22:10:22] i'll prep the command i'm gonna run and ask you..
[22:10:23] one sec
[22:10:24] just make sure you do not block replication
[22:10:43] I do not need to be asked for permission from another op
[22:10:49] haha
[22:10:51] milimetric: sorry i'm a bit confused with the tests - it assumes that the _testing databases exist - but it'll create all the tables right?
[22:10:51] I just need awareness
[22:10:58] but you can advise!
[22:11:03] sure
[22:11:20] madhuvishy: yes, that's right, you're not confused. If you create the dbs, it'll create any tables it needs
[22:11:36] milimetric: hmmm - that's not happening so far - wondering why
[22:11:42] also
[22:12:13] milimetric: https://github.com/wikimedia/analytics-wikimetrics/blob/master/tests/fixtures.py#L34
[22:12:37] jynus:
[22:12:37] mysqldump -h db1047.eqiad.wmnet --single-transaction --insert-ignore log MobileWebSectionUsage_14321266 | mysql -h m4-master.eqiad.wmnet log
[22:12:48] gonna run that on db1047 in a screen
[22:12:48] milimetric: why are these wiki and wiki2? do they get translated to wiki_testing and wiki2_testing somewhere?
[22:12:52] any modifications?
[22:13:00] from where are you running that?
[22:13:10] db1047, s'ok?
[22:13:13] anywhere you like :)
[22:13:26] I am asking because of the double -h
[22:13:44] ja, could leave the first -h off
[22:13:48] drop that option on one of the places so you run it from localhost
[22:13:51] k
[22:13:56] socket > tcp
[22:13:59] aye k
[22:14:17] and log log log
[22:14:22] k
[22:14:26] doing it then...
[22:14:43] you checked it doesn't already exist?
[22:15:16] ^ottomata?
[22:15:22] table does exist
[22:15:27] but is missing most data
[22:15:35] then it needs more options
[22:15:39] madhuvishy: that's... weird
[22:15:40] --no-create-info
[22:15:41] ?
[22:16:53] hmm, jynus you sure we want --single-transaction?
[22:17:18] yes
[22:17:21] trust the dba ottomata :D
[22:17:47] madhuvishy: I'm gonna try and focus on this layout stuff though, if you're super stuck ping me again
[22:17:48] if not, use the resync script
[22:17:55] right now I don't know more than you, I'd just dig around
[22:17:58] hmmm
[22:18:06] milimetric: okay - ya i have no idea what's happening
[22:18:20] yeah cause i don't think we want to lock that table, as it would cause EL mysql writer to block if it gets events for that table, ja?
[22:18:23] i'll poke again if i don't get anywhere
[22:18:32] resync...
[22:19:08] hmm jynus i could use the resync script to sync from s1 to m4 master
[22:19:18] just gotta switch around the variables, eh?
[22:19:22] yes, that is another alternative
[22:19:27] maybe that is safer?
[22:19:48] it is the same, it uses mysqldump equally, just in smaller chunks
[22:19:52] ja
[22:19:55] and won't lock for hours
[22:19:56] right?
[22:20:21] i just see LOCK TABLES `MobileWebSectionUsage_14321266` WRITE; at the beginning of the dump
[22:20:31] which sounds nasty if i'm writing to the master
[22:20:43] --single-transaction or --skip-lock-tables avoids the locks
[22:20:57] doesn't seem to
[22:21:07] the lock on the source of the backup
[22:21:19] OH
[22:21:24] hm no
[22:21:35] it's in the mysqldump output
[22:21:35] i mean
[22:22:12] jynus: https://gist.github.com/ottomata/fbf4086ce4919ca4398a
[22:23:34] --skip-add-locks skips the locks on importing
[22:23:47] checking
[22:23:54] (PS1) Alex Monk: [WIP] Database selection [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266925 (https://phabricator.wikimedia.org/T76466)
[22:24:01] ah that'll do it
[22:24:03] that safe to run?
[22:24:20] what's the final thing you want to do?
[22:24:21] (CR) jenkins-bot: [V: -1] [WIP] Database selection [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266925 (https://phabricator.wikimedia.org/T76466) (owner: Alex Monk)
[22:24:40] i want the MobileWebSectionUsage_14321266 table back on m4-master, and eventually on analytics-store
[22:24:49] the command line, otto
[22:24:52] oh haha
[22:25:00] got this atm
[22:25:09] mysqldump --single-transaction --insert-ignore --no-create-info --skip-add-locks log MobileWebSectionUsage_14321266 | mysql -h m4-master.eqiad.wmnet log
[22:25:18] (PS2) Alex Monk: [WIP] Database selection [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266925 (https://phabricator.wikimedia.org/T76466)
[22:26:18] that should do it, put a "| pv |" between the pipes
[22:26:26] oh ja, love that pv
[22:26:44] that will give you an eta
[22:26:50] awesome
[22:27:01] (not automatically)
[22:27:06] ja, but bytes sent
[22:27:59] if it fails, we will have to do shorter runs or import it from a local file
[22:28:27] which would not be surprising given how loaded the whole thing is right now
[22:28:47] ha, ok
[22:28:58] ok, going to start it
[22:29:07] log log log
[22:29:27] !log starting mysqldump of MobileWebSectionUsage_14321266 from db1047 into m4-master
[22:29:38] wrong channel :-)
[22:30:14] naw, i do things like this in both sometimes
[22:30:16] we got one too :)
[22:30:37] look at the XP points you gathered, though! You may be considering becoming a DBA?
[22:30:45] haha, i used to sorta be one
[22:30:47] de facto
[22:30:48] at couchsurfing
[22:30:50] .org
[22:30:58] postgres?
[22:31:01] naw, mysql
[22:32:00] it'll take a bit before I see the insert(s?) on m4-master?
[22:32:04] while it's dumping?
[22:32:05] ja?
[22:32:53] it is already doing those
[22:33:13] "Inserted about 1000 rows | INSERT IGNORE INTO `MobileWebSectionUsage_14321266` VALUES"
[22:34:10] hmm
[22:34:13] how fast is it going?
[22:34:15] i don't see those, but maybe i missed it?
[22:34:25] ~2MB / sec
[22:34:25] db1046?
[22:34:45] i guess i connected to m4-master.eqiad.wmnet
[22:34:49] so who knows?
[22:34:50] :D
[22:35:04] with mysql or ssh?
[22:35:13] mysql
[22:35:39] then you should see those, just not always
[22:35:50] oh ja?
[22:35:57] how does that work, btw?
[22:36:02] it inserts in groups of XK rows
[22:36:02] haproxy with two backends for a master?
[22:36:16] so, full story is
[22:36:45] real servers: dbproxy1004, db1046, db1047, dbstore1002
[22:36:57] first one is only haproxy, others mysql
[22:37:04] * ottomata prepares for enlightenment
[22:37:13] m4-master is a dns that points to dbproxy
[22:37:32] haproxy redirects (normally) to db1046
[22:37:46] if db1046 fails, it failovers automatically to db1047
[22:37:47] ah, check backup is just fallback?
[22:37:49] ah
[22:38:10] and that is why the sync script does the funky replication to both db1047 and dbstore1002
[22:38:24] that actually has nothing to do with the sync
[22:38:29] no?
[22:38:40] it is synced to db1047 via that script though, ja?
[22:38:50] if replication worked, we would have regular replication
[22:38:55] with the same setup
[22:38:59] aye
[22:39:18] and we'd have to do manual work to make db1046 a slave of db1047 when 1046 came back?
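The dump-and-reload one-liner ottomata settles on at 22:25, expressed as a Python pipeline for reference; hosts, database and table names are taken from the chat, and jynus's "| pv |" progress meter from 22:26 is omitted here:

```python
# Sketch of: mysqldump --single-transaction --insert-ignore --no-create-info
#            --skip-add-locks log MobileWebSectionUsage_14321266 \
#            | mysql -h m4-master.eqiad.wmnet log
import subprocess

dump = subprocess.Popen(
    ["mysqldump", "--single-transaction", "--insert-ignore",
     "--no-create-info", "--skip-add-locks",
     "log", "MobileWebSectionUsage_14321266"],
    stdout=subprocess.PIPE,
)
load = subprocess.Popen(
    ["mysql", "-h", "m4-master.eqiad.wmnet", "log"],
    stdin=dump.stdout,
)
dump.stdout.close()  # let the loader see EOF once the dump process exits
load.communicate()
```

Per the exchange above, --single-transaction keeps the dump consistent without the LOCK TABLES that worried ottomata at 22:20, and --skip-add-locks drops the lock statements from the generated INSERT stream on the importing side.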
[22:39:49] well, we already have to do manual work
[22:39:55] just that now it is possible
[22:39:58] aye
[22:40:07] cool, makes sense
[22:40:12] before, it would not even be possible with our load
[22:40:41] but it is not a resync only, replication under normal circumstances was getting behind more and more
[22:40:57] bulk inserts are faster
[22:41:00] Analytics-Kanban: Restore MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 - https://phabricator.wikimedia.org/T123595#1971777 (Ottomata) I learned about the miracle today. :) I'm syncing MobileWebSectionUsage_14321266 from the backup master on db1047 now. It is slowly inserting into analyt...
[22:41:08] aye
[22:41:13] there is a ticket to improve what we have now
[22:41:45] https://phabricator.wikimedia.org/T124307
[22:41:51] it is still not very good
[22:42:22] and I want all kinds of scripts done for what you just did
[22:42:37] sync source destination and forget
[22:48:34] aye
[22:48:41] ok, it is trucking along fine
[22:48:46] gonna step away from the compy mostly for a while
[22:48:52] many thanks jynus for your help
[22:49:02] (HaeB: say thank you!)
[23:38:06] (PS3) Alex Monk: [WIP] Database selection [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266925 (https://phabricator.wikimedia.org/T76466)
[23:38:40] (CR) jenkins-bot: [V: -1] [WIP] Database selection [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266925 (https://phabricator.wikimedia.org/T76466) (owner: Alex Monk)
[23:41:30] (CR) Yuvipanda: [WIP] Database selection (3 comments) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266925 (https://phabricator.wikimedia.org/T76466) (owner: Alex Monk)
[23:41:36] (PS4) Alex Monk: [WIP] Database selection [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266925 (https://phabricator.wikimedia.org/T76466)
[23:42:03] (CR) jenkins-bot: [V: -1] [WIP] Database selection [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266925 (https://phabricator.wikimedia.org/T76466) (owner: Alex Monk)
[23:43:35] (PS5) Alex Monk: [WIP] Database selection [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266925 (https://phabricator.wikimedia.org/T76466)
[23:44:03] (CR) jenkins-bot: [V: -1] [WIP] Database selection [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266925 (https://phabricator.wikimedia.org/T76466) (owner: Alex Monk)
[23:45:39] (PS6) Alex Monk: [WIP] Database selection [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266925 (https://phabricator.wikimedia.org/T76466)
[23:46:23] (CR) jenkins-bot: [V: -1] [WIP] Database selection [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266925 (https://phabricator.wikimedia.org/T76466) (owner: Alex Monk)
[23:46:35] madhuvishy: https://github.com/wikimedia/analytics-wikimetrics/blob/master/tests/__init__.py#L60
[23:47:05] nuria: hmmm
[23:47:25] the problem is that i create the test dbs - but the tests don't seem to create the tables
[23:47:29] i don't know why
[23:49:04] madhuvishy: that points to sqlalchemy plus mysql interactions
[23:49:12] nuria: yeah
[23:49:38] make sure when running tests your test config is right: https://github.com/wikimedia/analytics-wikimetrics/blob/master/wikimetrics/config/test_config.yaml
[23:50:55] nuria: yeah i did
[23:51:02] this is the config it seems to be picking up
[23:51:03] {'REVISION_TABLENAME': 'revision_userindex', 'ARCHIVE_TABLENAME': 'archive_userindex', 'WIKIMETRICS': {'HOST': 'db', 'PASSWORD': 'wikimetrics', 'USER': 'wikimetrics', 'DBNAME': 'wikimetrics'}, 'REPLICATION_LAG_THRESHOLD': 3,
'PROJECT_HOST_NAMES': ['wiki', 'dewiki', 'enwiki'], 'MEDIAWIKI_POOL_SIZE': 32, 'SQL_ECHO': False, 'WIKIMETRICS_POOL_SIZE': 20,
[23:51:04] 'WIKIMETRICS_ENGINE_URL': 'mysql://wikimetrics:wikimetrics@db/wikimetrics_testing', 'DEBUG': True, 'CENTRALAUTH_ENGINE_URL': 'mysql://wikimetrics:wikimetrics@db/centralauth_testing', 'REPLICATION_LAG_MW_PROJECTS': [], 'MEDIAWIKI_ENGINE_URL_TEMPLATE': 'mysql://wikimetrics:wikimetrics@db/{0}_testing'}
[23:51:07] gah
[23:51:13] tables get created here:
[23:51:21] https://www.irccloud.com/pastebin/am8DoJyu/
[23:51:42] madhuvishy: so make sure you are running in DEBUG mode
[23:51:46] it is
[23:51:56] (PS7) Alex Monk: [WIP] Database selection [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266925 (https://phabricator.wikimedia.org/T76466)
[23:52:17] madhuvishy: then next step is to have sqlalchemy tell you what is going on, you can turn debugging up so it will spit out everything
[23:52:24] (CR) jenkins-bot: [V: -1] [WIP] Database selection [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266925 (https://phabricator.wikimedia.org/T76466) (owner: Alex Monk)
[23:52:27] nuria: from where?
[23:53:04] madhuvishy: echo+true
[23:53:09] echo=True
[23:53:45] on echo=False – if True, the Engine will log all statements as well as a repr() of their parameter lists to the engine's logger, which defaults to sys.stdout. The echo attribute of Engine can be modified at any time to turn logging on and off. If set to the string "debug", result rows will be printed to the standard output as well. This flag ultimately controls a Python logger; see Configuring Logging for information on how to configure logging directly.
[23:55:35] nuria: thanks i'll check it out
[23:57:29] (PS8) Alex Monk: [WIP] Database selection [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266925 (https://phabricator.wikimedia.org/T76466)
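The knob nuria quotes at 23:53 is `create_engine`'s `echo` flag; a minimal sketch, with the engine URL copied from the test config pasted at 23:51 (so it assumes the same local "db" host):

```python
from sqlalchemy import create_engine

# echo=True logs every emitted SQL statement (echo="debug" also prints
# result rows), which is the debugging step nuria suggests above.
engine = create_engine(
    "mysql://wikimetrics:wikimetrics@db/wikimetrics_testing",
    echo=True,
)
```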