[00:00:38] SMalyshev: git clone https://gerrit.wikimedia.org/r/analytics/refinery/source [00:00:48] SMalyshev: and later git pull https://gerrit.wikimedia.org/r/analytics/refinery/source refs/changes/42/364542/4 [00:00:55] nuria_: ok it's in /home/smalyshev/refinery [00:01:24] so what should I do with it? [00:01:32] SMalyshev: let's build here [00:01:47] SMalyshev: and see if we get an error, ahem... here being 1002, sorry [00:02:11] ok, it seems to download the whole world, once it's done I'll tell [00:02:47] ok, done [00:04:04] doesn't seem to change anything though [00:04:16] still can't find the class [00:04:59] SMalyshev: wait, wait, i think we were both building on your dir at the same time [00:06:13] 10Analytics, 10Project-Admins: Create tag "Analytics-Data-Quality" on Phabricator - https://phabricator.wikimedia.org/T169560#3428582 (10Aklapper) 05Open>03Resolved Alright, thanks for clarifying! Closing task as resolved. [00:09:22] SMalyshev: testing your jar on my end [00:14:25] SMalyshev: are you using refinery-hive-0.0.49-SNAPSHOT.jar [00:14:43] SMalyshev: it is built with all deps [00:14:58] which one? the one at your dir? [00:15:03] no, yours [00:15:20] SMalyshev: nuria@stat1002:/home/smalyshev/refinery/refinery-hive/target$ jar -tf refinery-hive-0.0.49-SNAPSHOT.jar | grep eflec [00:16:56] yeah it has reflections but for some reason that didn't work... [00:17:14] SMalyshev: i get a different error using your jar but nothing to do with reflections [00:17:39] SMalyshev: wait a sec [00:17:54] nuria_: if I use the jar from your dir, there's no exception but no tags for wdqs [00:18:07] if I add the reflection-core jar from my dir, there's an exception [00:18:53] SMalyshev: testing like: [00:19:08] https://www.irccloud.com/pastebin/ybmkpqF9/ [00:19:53] nuria_: that one doesn't have the refinery-core jar? [00:20:18] SMalyshev: no, refinery-hive is a downstream from core [00:20:24] ahh, ok [00:20:33] SMalyshev: is that it?
[00:21:00] let me see [00:21:08] I'll rebuild the hive one [00:21:21] SMalyshev: your jar works fine [00:21:39] Failed to execute goal on project refinery-hive: Could not resolve dependencies for project org.wikimedia.analytics.refinery.hive:refinery-hive:jar:0.0.49-SNAPSHOT: Could not find artifact org.wikimedia.analytics.refinery.core:refinery-core:jar:0.0.49-SNAPSHOT in system-wide-wmf-mirrored-default (https://archiva.wikimedia.org/repository/mirrored/) -> [Help 1] [00:22:49] hmm no refinery-hive doesn't want to build on stat1002 :( [00:22:57] missing tons of deps [00:23:04] let me see if I can build locally [00:24:16] Failed to execute goal on project refinery-hive: Could not resolve dependencies for project org.wikimedia.analytics.refinery.hive:refinery-hive:jar:0.0.49-SNAPSHOT: Failed to collect dependencies for [junit:junit:jar:4.11 (test), pl.pragmatists:JUnitParams:jar:1.0.3 (test), org.wikimedia.analytics.refinery.core:refinery-core:jar:0.0.49-SNAPSHOT (compile), org.apache.lucene:lucene-analyzers-common:jar:5.5.3 (compile), org.apache.hadoop:hadoop-common:jar:2.6.0-cdh5.10.0 (provided), org.apache.hadoop:hadoop-client:jar:2.6.0-cdh5.10.0 (provided), org.apache.hive:hive-exec:jar:1.1.0-cdh5.10.0 (provided), com.googlecode.json-simple:json-simple:jar:1.1.1 (compile)]: Failed to read artifact descriptor for org.wikimedia.analytics.refinery.core:refinery-core:jar:0.0.49-SNAPSHOT: Could not find artifact org.wikimedia.analytics.refinery:refinery:pom:0.0.49-SNAPSHOT in system-wide-wmf-mirrored-default (https://archiva.wikimedia.org/repository/mirrored/) [00:24:20] SMalyshev: it builds [00:24:22] that's what happens on stat1002 [00:24:24] SMalyshev: from top [00:24:39] ah, ok, I'll try that [00:24:52] SMalyshev: so, doing >mvn package on top will build all jars [00:24:56] SMalyshev: ok [00:25:07] SMalyshev: your jar works fine dep wise [00:28:44] ok, it built... running it now [00:29:46] nuria_: ah, excellent seems to be working now!
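For anyone hitting the same wall: the "Could not find artifact ... 0.0.49-SNAPSHOT" failures happen because refinery-hive depends on a refinery-core snapshot that is never published to Archiva, so a build started inside refinery-hive/ alone cannot resolve it. Running Maven from the repository root lets the multi-module reactor build refinery-core itself and feed it to refinery-hive. A sketch of the whole sequence from this session (the change ref is the one quoted above):

```
$ git clone https://gerrit.wikimedia.org/r/analytics/refinery/source
$ cd source
$ git pull https://gerrit.wikimedia.org/r/analytics/refinery/source refs/changes/42/364542/4
$ mvn package        # from the repo root, so the reactor builds core + hive together
$ jar -tf refinery-hive/target/refinery-hive-0.0.49-SNAPSHOT.jar | grep -i reflect
```

If only the hive module is wanted, `mvn package -pl refinery-hive -am` should also work: `-am` ("also make") builds the upstream modules that the selected module depends on.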
[00:30:16] SMalyshev: i also recommend building on stat1002 because you will run with the java version you will use in prod and that matters [00:30:28] nuria_: ok, thanks, will do! [00:30:31] SMalyshev: k [00:30:48] now I guess I'll add some tests and the patch will be ready for action [00:30:59] nuria_: thanks for your help! [00:31:11] SMalyshev: np [00:58:46] 10Analytics: Add "desktop by browser" tab to browser reports - https://phabricator.wikimedia.org/T170286#3428723 (10Nuria) [00:58:56] 10Analytics-Kanban: Add "desktop by browser" tab to browser reports - https://phabricator.wikimedia.org/T170286#3425599 (10Nuria) [00:59:29] (03PS1) 10Nuria: Add "desktop by browser" tab to browser reports [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/364631 (https://phabricator.wikimedia.org/T170286) [01:02:31] (03PS2) 10Nuria: Add "desktop by browser" tab to browser reports [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/364631 (https://phabricator.wikimedia.org/T170286) [04:22:51] 10Quarry, 10Cloud-Services: Consider moving Quarry to be an installation of Redash - https://phabricator.wikimedia.org/T169452#3428933 (10Tbayer) Don't know whether it should be considered a dealbreaker, but FWIW: Apart from permalinks ensuring reproducibility of individual analyses, the history is also an accide...
[05:20:07] 10Analytics, 10Analytics-EventLogging, 10DBA: dbstore1002 crashed - https://phabricator.wikimedia.org/T170308#3429003 (10Marostegui) I can confirm that only x1 broke [05:57:46] 10Analytics, 10Analytics-EventLogging, 10DBA: dbstore1002 crashed - https://phabricator.wikimedia.org/T170308#3429036 (10Marostegui) 05Open>03Resolved a:03Marostegui I have fixed x1 and replication has caught up again ``` Seconds_Behind_Master: 0 ``` [07:16:28] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 3 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3429131 (10phuedx) >>! In T170018#3425659, @Nuria wrote: > If the duplicate issue affects the majority of your da... [07:31:32] 10Analytics, 10EventBus, 10Operations, 10hardware-requests, and 2 others: New SCB nodes - https://phabricator.wikimedia.org/T166342#3429151 (10faidon) [08:45:50] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 3 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3429243 (10phuedx) Soon after yesterday's European Mid-day SWAT deployment (11th July 2017, 1-2 PM UTC), the numb... [09:44:58] elukey: around? [09:47:09] addshore: yep! 
o/ [09:48:44] pm :D [09:49:16] 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Elukey: Create a user for the eventlogging_cleaner script on the analytics slaves - https://phabricator.wikimedia.org/T170118#3429464 (10elukey) a:03elukey [09:54:40] (03PS1) 10Addshore: Use INI_SCANNER_RAW in parse_ini_file [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/364685 (https://phabricator.wikimedia.org/T170282) [09:55:33] (03PS1) 10Addshore: Use INI_SCANNER_RAW in parse_ini_file [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/364686 (https://phabricator.wikimedia.org/T170282) [09:55:37] (03CR) 10Addshore: [C: 032] Use INI_SCANNER_RAW in parse_ini_file [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/364686 (https://phabricator.wikimedia.org/T170282) (owner: 10Addshore) [09:55:41] (03CR) 10Addshore: [C: 032] Use INI_SCANNER_RAW in parse_ini_file [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/364685 (https://phabricator.wikimedia.org/T170282) (owner: 10Addshore) [09:55:46] (03Merged) 10jenkins-bot: Use INI_SCANNER_RAW in parse_ini_file [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/364686 (https://phabricator.wikimedia.org/T170282) (owner: 10Addshore) [09:55:50] (03Merged) 10jenkins-bot: Use INI_SCANNER_RAW in parse_ini_file [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/364685 (https://phabricator.wikimedia.org/T170282) (owner: 10Addshore) [09:57:11] moritzm available for a quick PM regarding the above patch? [10:02:13] addshore: sure [10:59:23] 10Analytics, 10CirrusSearch, 10Discovery, 10Discovery-Search, and 2 others: Define metrics for search result quality for the entity selector widget on wikidata. - https://phabricator.wikimedia.org/T170400#3429845 (10daniel) [11:03:13] 10Analytics, 10CirrusSearch, 10Discovery, 10Discovery-Search, and 3 others: Define metrics for search result quality for the entity selector widget on wikidata. 
- https://phabricator.wikimedia.org/T170400#3429902 (10daniel) [11:20:07] 10Analytics-Tech-community-metrics, 10Regression: Git repo blacklist config not applied on wikimedia.biterg.io - https://phabricator.wikimedia.org/T146135#3429976 (10Aklapper) p:05Normal>03Lowest a:05Lcanasdiaz>03None I'm unassigning this task. As per my previous comment I'm not convinced this is the b... [12:02:36] * elukey lunch! [13:11:50] fixing pageview-hourly-wf-2017-7-12-11 :) [13:14:54] fdans: you around? [13:15:08] yup! [13:15:14] what's up elukey? [13:15:34] interested in fixing the alert for pageview-hourly-wf-2017-7-12-11? [13:15:37] otherwise I'll do it [13:16:02] fdans: --^ [13:16:41] elukey let's do it together!!! batcave? [13:17:21] fdans: do you mind if we do it via IRC? Still in the middle of some things to check :( [13:17:38] so the starting point is https://wikitech.wikimedia.org/wiki/Analytics/Team/Oncall#Find_and_fix_pageview_whitelist_exceptions [13:17:42] not sure if you have read it [13:18:44] I haven't - reading [13:22:02] opening hive now :) [13:23:41] also https://tools.wmflabs.org/sal/production is good to double check what you'll find :) [13:27:25] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#2857791 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by otto on neodymium.eqiad.wmnet for hosts: ``` ['stat1006.eqiad.wmnet'] ``` The log can be found i...
[13:28:30] fdans: if you have doubts feel free to ask, I didn't mean to throw a task at you without helping :) [13:28:59] elukey http://s2.quickmeme.com/img/54/542c38f804ca62fd98385151e1b647fb8fe02a81acf76a6a07712df4563d6359.jpg [13:29:15] aahahha ack [13:29:56] (03PS1) 10Fdans: Adds din.wikipedia to whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/364720 [13:30:03] elukey ^ [13:30:43] (sorry, used present simple in the commit message, I know we don't like that) [13:31:07] fdans: "add" is better [13:31:44] (03CR) 10Elukey: Adds din.wikipedia to whitelist (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/364720 (owner: 10Fdans) [13:31:55] extra tab :) [13:32:16] I disagree, but I respect the in-house conventions, I used it accidentally sorry :) [13:32:24] (re: verb form) [13:32:52] I don't have a strict opinion man, just telling you what ops etc.. use :) [13:34:51] (03PS2) 10Fdans: Add din.wikipedia to whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/364720 [13:35:19] (03CR) 10Elukey: [C: 031] Add din.wikipedia to whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/364720 (owner: 10Fdans) [13:38:51] fdans: extra bonus points - have you ever deployed? [13:39:21] not this I don't think elukey [13:40:04] is that the second part of the instructions? [13:41:21] fdans: I meant deploy the refinery, but we can skip that today.. it is fine to just merge the code review and fix the file in hdfs [13:41:34] (03CR) 10Elukey: [V: 032 C: 032] Add din.wikipedia to whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/364720 (owner: 10Fdans) [13:42:17] ok elukey http://static.fjcdn.com/pictures/I+got+this+the+tags+are+insane_d1b03b_3724044.jpg [13:45:05] elukey done! 
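The merged change itself is a one-line addition to refinery's pageview whitelist, a tab-separated file that also has to be updated on HDFS when a full refinery deploy is skipped (which is what "fix the file in hdfs" above refers to). The path and column layout below are from memory, so verify them against the repo before copying:

```
# static_data/pageview/whitelist/whitelist.tsv  (tab-separated: entity type, value)
project	din.wikipedia
```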
[13:46:48] fdans: !log it so anybody can check in https://tools.wmflabs.org/sal/analytics [13:48:08] !log updated pageview whitelist with din.wikipedia [13:48:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:48:51] gooooood [13:51:39] urandom: hello! I am wondering if we could chat (whenever you have time) about JBOD vs RAID in cassandra [13:52:05] we have a similar decision to take for kafka so it might be good to know your experience [13:52:41] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#3430530 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['stat1006.eqiad.wmnet'] ``` Of which those **FAILED**: ``` set(['stat1006.eqiad.wmnet']) ``` [13:56:20] http://docs.confluent.io/current/kafka/deployment.html [13:56:28] Our recommendation is to use RAID 10 if the additional cost is acceptable. Otherwise, configure your Kafka server with multiple log directories, each directory mounted on a separate drive. [14:23:09] elukey: sure, we also have the standup thingy in a bit, don't know if you were going to make that, but i don't have much [14:24:04] urandom: ah nice, we can definitely chat in there too! [14:24:30] * urandom is reading the kafka link [14:31:15] a-team: why are we backing up eventlogging data log files from stat1002 -> bacula backup sets? [14:31:17] anyone know? [14:31:27] seems redundant and possibly violates the privacy stuff we are doing [14:31:40] i'd like to stop doing that with the stat box migration [14:34:13] ottomata: mmmm I am not aware, do we have a bacula config in puppet for it? [14:35:21] we do [14:42:02] weird [14:42:22] what logs are we backing up though? [14:42:41] (don't know what data log file means for EL) [14:42:55] does it hold all the reqs handled? [14:43:21] Hey folks.
Google reminded me that I had some photos from July 12th, 2015 -- the Analytics/Research offsite in Mexico City. See https://goo.gl/photos/QVw1bkjSjcWXCFmR8 [14:43:56] I've got a photo where ottomata looks like he's about to be taken out by a bridge while standing on the top of a bus [14:47:11] :) [14:47:19] elukey: those are things like client side and eventlogging-valid-mixed [14:47:25] basically the same stuff that's in kafka [14:47:51] ahh, it's not violating privacy, they are deleted after 90 days [14:47:53] but still [14:48:01] they are saved to hdfs/mysql, 7 days in kafka [14:48:04] eventlog1001 [14:48:09] and also rsynced to stat1002/3 [14:48:13] AND we bacula backup them [14:48:23] oh, i dunno how long bacula is keeping them [14:48:25] halfak: fearless ottomata is fearless :D [14:48:26] that could violate something [14:48:54] :D [14:49:34] ottomata: I think they may be useful in case the mysql consumer breaks badly (or the master) and we need to replay from logs? Not really sure, but I agree that it seems redundant [14:53:22] from bacula though? [14:53:28] we have 90 days of these logs on disk in 3 different places [14:55:33] 10Quarry, 10Cloud-Services: Consider moving Quarry to be an installation of Redash - https://phabricator.wikimedia.org/T169452#3430771 (10Halfak) > history is also an accidental but very valuable source of knowledge To clarify, the decision to include a query history was not accidental at all. That was very... [14:56:23] ottomata: ahh we also have them in hdfs? [14:56:36] or you mean your recent work?
[14:56:53] in any case, the bacula config seems to belong to a distant past :) [14:57:12] if we have logs in hdfs/mysql let's nuke it [14:57:18] elukey: they've been going into hdfs for a while [14:57:19] but not refined [14:57:27] aye [15:00:35] ping fdans ottomata [15:10:37] 10Analytics-Tech-community-metrics, 10Developer-Relations (Jul-Sep 2017): Investigate detached duplicated accounts in DB with same username, same source, but different uuids - https://phabricator.wikimedia.org/T170093#3430870 (10Aklapper) [15:10:39] 10Analytics-Tech-community-metrics, 10Developer-Relations (Jul-Sep 2017): Check detached accounts in DB with same username for "mediawiki" and "phab" sources but different uuid's (and merge if connected) - https://phabricator.wikimedia.org/T170091#3430871 (10Aklapper) [15:30:32] 10Analytics, 10Analytics-Wikistats: Deploy Wikistats and analytics.wikimedia.org via SCAP - https://phabricator.wikimedia.org/T170429#3430944 (10Milimetric) [15:30:39] 10Analytics, 10Analytics-Wikistats: Deploy Wikistats and analytics.wikimedia.org via SCAP - https://phabricator.wikimedia.org/T170429#3430955 (10Milimetric) p:05Triage>03Low [16:04:45] 10Analytics-Kanban: Add time_to_user_next_edit and time_to_page_next_edit in Mediawiki Denormalized History - https://phabricator.wikimedia.org/T161896#3431118 (10Milimetric) [16:13:16] 10Analytics, 10EventBus, 10ORES, 10Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3431152 (10Fjalapeno) @mobrovac cool… sounds good to me 👍 [16:38:35] ok elukey you can deploy anytime you are free [16:38:38] let us know [16:44:37] (in a meeting now but I'll do it after)
In T152712#3424883, @MoritzMuehlenhoff wrote: > At this point there's no cdh release for stretch yet and the hadoop-mapreduce package... [17:02:10] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#3431423 (10Ottomata) :D Thanks! [17:03:03] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#3431424 (10Ottomata) What's the name of the package? [17:03:15] elukey: you're wanted at the caveois! :D [17:04:35] 10Analytics, 10Analytics-EventLogging, 10Page-Previews, 10Reading-Web-Backlog, and 3 others: Duplicate events sent in Firefox after back button press - https://phabricator.wikimedia.org/T170018#3431429 (10phuedx) a:05Jdlrobson>03phuedx I can be responsible for signing this off. [17:27:17] 10Analytics, 10EventBus, 10Security-Reviews, 10Security-Team, 10Services: Security review of EventBus extension - https://phabricator.wikimedia.org/T120212#3431597 (10GWicke) [17:27:20] 10Analytics, 10EventBus, 10RESTBase, 10Services, 10RESTBase-release-1.0: RESTBase should honor wiki-wide deletion/suppression of users - https://phabricator.wikimedia.org/T120409#3431594 (10GWicke) 05Open>03Resolved a:03GWicke Since the last Parsoid HTML version increment, old HTML is now gone from... [17:31:47] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#3431628 (10MoritzMuehlenhoff) >>! In T152712#3431424, @Ottomata wrote: > What's the name of the package? libssl1.0.0 [17:45:40] going offline people! 
[17:45:43] * elukey afk [18:12:52] 10Analytics: Code Review Needed: WMDE Summer Banner Campaign Analytics - stat1002 - https://phabricator.wikimedia.org/T170452#3431877 (10GoranSMilovanovic) [18:17:15] I'm writing up some documentation for how to run mjolnir (the machine learning ranker) data collection and training in the analytics network. In my section about resource usage, is it fair to say resource limits should be set at < half the cluster resources for short (sub-30-minute) tasks, and less than 1/3 for longer (multi-hour) tasks, or should i be more conservative? [18:17:23] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#3431919 (10Ottomata) Ok, I've just got `statistics::packages` to work on stat1006. Documenting what I've done here: - openjdk-7 is no longer available, using openjdk... [18:19:32] generally i've been aiming at ~400 cores (we use ~768M memory per core, so not as relevant) for 20-to-30-minute tasks, which seems to work alright.
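ebernhardson's numbers above can be turned into concrete submit flags. The arithmetic below assumes a 4-core executor shape purely for illustration (executor size is a tuning choice, not anything stated in this log):

```shell
# Back-of-the-envelope Spark sizing: ~400 cores at ~768 MB per core,
# packed into hypothetical 4-core executors.
TARGET_CORES=400
MEM_PER_CORE_MB=768
CORES_PER_EXECUTOR=4

NUM_EXECUTORS=$((TARGET_CORES / CORES_PER_EXECUTOR))        # 400 / 4 = 100
EXECUTOR_MEM_MB=$((CORES_PER_EXECUTOR * MEM_PER_CORE_MB))   # 4 * 768 = 3072

# Print the submit line these numbers imply (not executed here).
echo "spark-submit --num-executors ${NUM_EXECUTORS}" \
     "--executor-cores ${CORES_PER_EXECUTOR}" \
     "--executor-memory ${EXECUTOR_MEM_MB}M ..."
```

As elukey points out in the discussion, the YARN queue throttles the job anyway, so these flags act as an upper bound, not a guarantee.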
I haven't run too many of the longer tasks but final production model training will use "all the data" which is about 30-40x more work than the 20-minute task and is in the 5-to-8-hour computation regime [18:52:16] ebernhardson: the consumption of resources will be throttled by the queue in which the job will run too [18:53:36] ebernhardson: and that should be our responsibility (that no job can bring pageview refining to a halt, for example) so yours (as upper bound estimates) seem ok cc ottomata for confirmation [18:54:39] +1 :) [18:55:35] sounds good, thanks! [18:55:48] 10Analytics-Kanban: Define, Document (and test) Desktop and Mobile browser support for wikistats 2.0 - https://phabricator.wikimedia.org/T170457#3432030 (10Nuria) [18:56:17] 10Analytics-Kanban: Set up continuos integration for wikistats 2.0 UI - https://phabricator.wikimedia.org/T170458#3432047 (10Nuria) [18:56:49] 10Analytics-Kanban: Cleanup Routing code - https://phabricator.wikimedia.org/T170459#3432061 (10Nuria) [18:58:32] 10Analytics-Kanban: Wikistats 2.0 UI second deployment/iteration - https://phabricator.wikimedia.org/T170460#3432076 (10Nuria) [18:59:11] 10Analytics-Kanban: Addition of Unique Devices metric - https://phabricator.wikimedia.org/T170461#3432090 (10Nuria) [19:00:15] 10Analytics-Kanban: Addition of (mock) Active Editors metric - https://phabricator.wikimedia.org/T170463#3432119 (10Nuria) [19:03:05] 10Quarry: Quarry query in unknown state - https://phabricator.wikimedia.org/T170464#3432141 (10awight) [19:03:45] 10Analytics-Kanban, 10Analytics-Wikistats: Addition of (mock) Active Editors metric - https://phabricator.wikimedia.org/T170463#3432155 (10Nuria) [19:04:00] 10Analytics-Kanban, 10Analytics-Wikistats: Wikistats 2.0 UI second deployment/iteration - https://phabricator.wikimedia.org/T170460#3432159 (10Nuria) [19:04:30] 10Analytics-Kanban, 10Analytics-Wikistats: Define, Document (and test) Desktop and Mobile browser support for wikistats 2.0 -
https://phabricator.wikimedia.org/T170457#3432161 (10Nuria) [19:04:59] 10Analytics-Kanban, 10Analytics-Wikistats: Set up continuos integration for wikistats 2.0 UI - https://phabricator.wikimedia.org/T170458#3432163 (10Nuria) [19:05:12] 10Analytics-Kanban, 10Analytics-Wikistats: Wikistats 2.0. - https://phabricator.wikimedia.org/T130256#3432164 (10Nuria) [19:06:06] 10Analytics-Kanban, 10Analytics-Wikistats: Implement pageview metric in Wikistats UI - https://phabricator.wikimedia.org/T163817#3432167 (10Nuria) [19:06:43] milimetric: can we close this one?: https://phabricator.wikimedia.org/T167674 [19:16:50] 10Analytics-Kanban, 10Patch-For-Review: Provide cumulative edit count in Data Lake edit data - https://phabricator.wikimedia.org/T161147#3432189 (10Nuria) [19:16:52] 10Analytics-Kanban, 10Patch-For-Review: Troubleshoot issues with sqoop of data not working for big tables - https://phabricator.wikimedia.org/T169782#3432188 (10Nuria) 05Open>03Resolved [19:18:18] 10Analytics-Kanban, 10Patch-For-Review: Provide cumulative edit count in Data Lake edit data - https://phabricator.wikimedia.org/T161147#3122928 (10Nuria) @Neil_P._Quinn_WMF We have added the cumulative edit count, would you be so kind to do some vetting of data (we have done some ourselves but additional veri... 
[19:18:44] 10Analytics-Data-Quality, 10Analytics-Kanban: Non existing article is one of the most viewed according to the data returned by the /metrics/pageviews/top/ API - https://phabricator.wikimedia.org/T149178#3432205 (10Nuria) 05Open>03Resolved [19:18:46] 10Analytics, 10Research-and-Data: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207#3432206 (10Nuria) [19:19:48] milimetric: will leave edit count task open until neilpquinn can verify [19:20:16] 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Elukey: Create a user for the eventlogging_cleaner script on the analytics slaves - https://phabricator.wikimedia.org/T170118#3432244 (10Nuria) 05Open>03Resolved [19:20:24] cool nuria_, works for me [19:20:33] ping neilpquinn yt? [19:21:50] 10Analytics-Kanban: Use native timestamp types in Data Lake edit data - https://phabricator.wikimedia.org/T161150#3432260 (10Nuria) 05Open>03Resolved [19:22:16] 10Analytics-Kanban, 10Analytics-Wikistats: Cleanup Routing code - https://phabricator.wikimedia.org/T170459#3432263 (10Nuria) [19:26:44] 10Analytics-Kanban, 10Analytics-Wikistats: Addition of Unique Devices metric - https://phabricator.wikimedia.org/T170461#3432294 (10Nuria) [19:27:22] milimetric, fdans, mforns filed next tasks for wikistats 2.0 , all with 5 points per default, they are in the kanban, we should tackle the ones that "hang" from initial deployment first: https://phabricator.wikimedia.org/T160370 [19:43:36] 10Analytics, 10Beta-Cluster-Infrastructure, 10scap2, 10Patch-For-Review, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#3432367 (10GWicke) Is there still anything actionable left on this task, or is it time to declare victory? [19:46:08] 10Analytics, 10RESTBase, 10Services (later), 10User-mobrovac: Expose pageview data in each project's REST API - https://phabricator.wikimedia.org/T119094#3432374 (10GWicke) [19:46:17] milimetric: yt? 
[19:49:21] 10Analytics, 10Beta-Cluster-Infrastructure, 10scap2, 10Patch-For-Review, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#3432384 (10Nuria) 05Open>03Resolved [19:49:54] 10Analytics, 10Beta-Cluster-Infrastructure, 10scap2, 10Patch-For-Review, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1743135 (10Nuria) Victory it is. [19:50:36] nuria_: i'm trying to figure out what the aggregator projectview stuff is used for [19:50:44] ottomata: ajam [19:50:50] it gets pushed to a repo [19:50:55] is this what reportcard uses? [19:50:57] or dashiki? [19:51:15] i used to know this, can't remember [19:51:32] ottomata: dashiki uses aqs for pageviews and unique devices [19:51:55] uses aggregator repo? [19:51:59] oh [19:52:03] uses aqs sorry [19:52:05] ottomata: report updater pulls data from elsewhere (hive, mysql) and generates files, but not from a depot [19:52:19] https://github.com/wikimedia/analytics-aggregator [19:52:19] ottomata: maybe that was used by wikimetrics [19:52:29] it is still generating aggregated projectview count files [19:52:34] and committing them to a data repo [19:52:41] https://github.com/wikimedia/analytics-aggregator-data [19:53:43] ottomata: i think that is what dashiki might have used earlier on, yes [19:53:51] so, is anything using this data anymore? [19:53:53] ottomata: before we changed it to the pageview api [19:54:08] stat1002:/a/aggregator/projectview/data [19:54:31] ottomata: that might be rsynced and thus have outside users [19:55:14] ottomata: is that directory rsync-ed?
[19:55:31] not that i can find [19:55:38] but, it does push to that repo [19:55:42] so someone might clone it [19:55:45] maybe it used to be cloned in labs [19:55:46] for dashiki [19:56:09] ottomata: ya i bet it was [19:56:24] i'm considering not migrating this job or data to stat1005 [19:56:33] ottomata: but dashiki (our instance) has not been in labs in forever [19:56:49] ottomata: agreed [19:56:51] 10Analytics-Kanban, 10EventBus, 10Wikimedia-Stream, 10Services (watching), 10User-mobrovac: EventStreams - https://phabricator.wikimedia.org/T130651#3432433 (10Nirmos) Tags aren't available either? [19:57:15] ottomata: to be clear there are dashiki instances on labs but none that use this data [19:59:01] ottomata: in a meeting in a bit, +1 on my end [19:59:23] bearloga: yt? [19:59:54] ottomata: about to be in a meeting but yes [20:00:31] i'm in the process of moving automated jobs from stat1002/3 -> stat1005/6 [20:00:40] i think i might need your help for the statistics::discovery ones [20:00:45] stat1005 is Debian Stretch [20:00:53] a big upgrade from stat1002 ubuntu trusty [20:01:00] relevant here: php7 vs php5 [20:05:54] 10Analytics, 10Analytics-Cluster: Move statistics::discovery jobs from stat1002 -> stat1005 - https://phabricator.wikimedia.org/T170471#3432494 (10Ottomata) [20:07:21] 10Analytics, 10Analytics-Cluster: Move statistics::wmde jobs from stat1002 -> stat1005 - https://phabricator.wikimedia.org/T170472#3432519 (10Ottomata) [20:07:35] 10Analytics, 10Analytics-Cluster: Move statistics::discovery jobs from stat1002 -> stat1005 - https://phabricator.wikimedia.org/T170471#3432536 (10Ottomata) [20:07:41] addshore: you too ^ :) [20:07:49] ooooh [20:08:00] when is stat1002 being killed? :P [20:08:13] there isn't a date, but imo asap :) [20:08:17] okay!
[20:08:17] this quarter at least [20:08:23] we haven't sent an announcement yet [20:08:27] wanted to get stat1005 and stat1006 up [20:08:36] and the automated puppet stuff migrated first [20:08:43] then I can go into cat herding mode [20:08:49] 10Analytics, 10Analytics-Cluster, 10WMDE-Analytics-Engineering, 10User-Addshore: Move statistics::wmde jobs from stat1002 -> stat1005 - https://phabricator.wikimedia.org/T170472#3432539 (10Addshore) [20:09:06] oooh, php7 instead of php5 [20:09:14] yup :) [20:09:31] addshore: how about this: i add a conditional that checks out the software on stat1005, but doesn't install or run the crons [20:09:41] then you can try and run the stuff manually and see if it works, and if not, we can resolve issues? [20:09:53] hahaa, can we move this now? [20:10:17] stat1005 already exists? :) [20:10:40] ya [20:10:46] it's up and running now [20:10:48] you should be able to log in [20:10:59] Then let's just flip the switch #moveFastAndBreakStuff [20:11:09] haha, awesome, ok. so, i'll do this then [20:11:17] :D [20:11:26] i'll apply this on stat1005, remove the crons from stat1002, rsync over data from stat1002 [20:11:28] then leave it to you? [20:11:29] :) [20:11:31] I doubt anything will break, but I'll check this evening [20:11:34] yup! [20:11:38] ok awesome thanks [20:12:12] 10Analytics, 10Analytics-Cluster, 10WMDE-Analytics-Engineering, 10User-Addshore: Move statistics::wmde jobs from stat1002 -> stat1005 - https://phabricator.wikimedia.org/T170472#3432577 (10Addshore) Ooooh, PHP7 instead of PHP5. Please just go ahead and flip the switch. I'm around to fix anything that break... [20:13:27] and I can indeed ssh there :) [20:14:11] ooh wait, ottomata 1 thing! [20:14:35] let me check, I think the script that needed something in the network opening up no longer runs there... but I'll check [20:15:25] ok, addshore are all data files in data and log? [20:15:28] all output files? [20:15:31] i can just rsync those, right?
[20:16:07] ottomata: Yup [20:16:14] great [20:16:24] 10Analytics, 10Analytics-Cluster, 10Security: Access rights for HDFS on stat100* for Sqoop tasks - https://phabricator.wikimedia.org/T170052#3432621 (10GoranSMilovanovic) Again on stat1002: sqoop import --connect jdbc:mysql://analytics-store.eqiad.wmnet/enwiki --password the src directors has the 2 git repos and the config provided by puppet [20:16:42] and I just checked and the script that needs the firewall opening no longer runs there :) so nothing to worry about [20:17:00] cool [20:17:22] Also, I got a shiny new window in my van 2 days ago ;) [20:17:39] haha nice [20:18:47] allllright! your crons are created, and your data is rsyncing from stat1002. i've removed the crons from running on stat1002, but the code and old data is still there. new data should come in on stat1005 now. [20:19:11] addshore: if you verify it all works, etc. feel free to close that task. i'm going to assign it to you now [20:19:20] ack! [20:19:26] 10Analytics, 10Analytics-Cluster, 10WMDE-Analytics-Engineering, 10User-Addshore: Move statistics::wmde jobs from stat1002 -> stat1005 - https://phabricator.wikimedia.org/T170472#3432649 (10Ottomata) a:05Ottomata>03Addshore [20:19:45] hopefully bearloga's discovery ones will be this easy too :) [20:22:31] oooh, woo, they are in /srv now ;) [20:24:26] ottomata: can you please email me and cc deb tankersley and chelsyx? the php-based discovery-stats stuff might actually need to be retired rather than ported. [20:25:32] bearloga: i made a task [20:25:35] CCEd you [20:25:45] https://phabricator.wikimedia.org/T170471 [20:25:54] and that would be great! [20:25:56] if we can retire it [20:26:13] bearloga: i am probably going to need a lot of R related vetting from you of stat1005/6 [20:26:16] new versions!
:) [20:26:35] Hello analytics people [20:26:40] the r-cran-rmysql package was a little funky (not available in stretch), so I did some weird stuff to get it [20:26:42] we'll see if it works [20:26:44] hiii RoanKattouw! [20:26:52] Could someone help me figure out why many of my EventLogging events are not making it into MySQL? [20:27:04] RoanKattouw: i can try [20:27:10] what schema? [20:27:12] I know there are logs for validation errors, but the logstash dashboard linked from the docs doesn't exist any more, and I don't have stat1002 access, only 1003 [20:27:16] ChangesListHighlights [20:27:44] what do you need on stat1002 that isn't on stat1003? [20:28:08] ottomata: looks like something might be up with the file ownership? [20:28:13] idk what's on what exactly but I think I saw some docs listing troubleshooting steps that require 1002 access [20:28:15] ohhhh probably [20:28:16] yeah fixin [20:28:21] stat1002 -rw-rw-r-- 1 analytics-wmde analytics-wmde 2827020 Jul 12 20:15 minutely.log [20:28:28] ottomata: would it be possible for us R people to, say, provide a wish-list of R packages that we would like to see available on production and the new machines? Just asking... [20:28:30] stat1005 -rw-rw-r-- 1 993 1005 2827020 Jul 12 20:15 minutely.log [20:28:52] addshore: check now [20:28:53] In any case -- I caused some events for that schema yesterday afternoon and again today, but select max(timestamp) from ChangesListHighlights_16484288; remains stuck at 20170711221602 [20:29:03] haha, GoranSM you can try! make a phab task [20:29:10] ottomata: user looks good, group still looks odd [20:29:12] if they are magically in debian stretch, maybe you will be lucky! [20:29:16] ottomata: O:-) [20:29:19] I just submitted three more events for testing [20:29:23] ay ya, sorry addshore [20:29:23] now [20:29:31] looks good! ty!
[20:29:33] ottomata: i'll be happy to test any r stuff on stat1005/6 :) [20:29:42] bearloga: Would you join me in listing the R packages that our great friend ottomata would let us have on the new machines? [20:31:08] ottomata: same java version? [20:32:20] nope java 8 [20:32:24] openjdk 8 [20:32:38] RoanKattouw: i'm looking [20:32:44] GoranSM: hiya! I don't really see a need for ottomata to install specific R packages into the machine's shared library since we can just install packages into a library in our homedirs. RMySQL is a special exception [20:32:49] oooh, okay [20:33:22] GoranSM: which R packages do you have in mind? [20:34:38] bearloga: ooooh... that's great. I have *many* packages on my mind, just to list a few that I use too often but haven't seen installed on stat1003, for example: {smacof} for multidimensional scaling, then wrangling things like {dplyr} and {tidyr} - in spite of the fact that I will tend to migrate all code to {data.table} soon... [20:35:22] bearloga: How was your luck with {sparkR} and {sparklyr} on production? I remember you've mentioned some experiments in an e-mail, maybe a few weeks ago..? [20:36:26] RoanKattouw: so far, i see more recent events in every place they should be [20:37:04] the last insert into mysql [20:37:06] as you say [20:37:07] ottomata: Including MySQL / stat1003? [20:37:08] was 2017-07-11 22:16:05,077 [20:37:11] hah [20:37:11] not mysql [20:37:17] So everywhere else, the data is there?
[20:37:20] so, i would say they are waiting to be batched [20:37:21] but [20:37:24] that is too long [20:37:52] 10Quarry: Quarry query in unknown state - https://phabricator.wikimedia.org/T170464#3432726 (10awight) p:05Normal>03Low [20:38:00] 10Quarry: Quarry query in unknown state - https://phabricator.wikimedia.org/T170464#3432141 (10awight) [20:38:05] it's 3000 events or 5 minutes, whichever comes first [20:38:09] def been more than 5 mins [20:38:15] GoranSM: yeah, it's better to install those into a personal lib in your homedir and that way you can update them whenever as opposed to getting them installed globally (will not be updated, like ever). the reason rmysql is installed site-wide is because it's installed as a debian package and has a dependency on mysql library. but everything else like dplyr you can just install yourself [20:38:32] What places are everywhere else and how do I get to them? Also this schema has arrays as values for one of the fields, which are not so easy to query in MySQL, are they handled better by these other things/places? [20:38:55] Also something is working then because there are also <3000 events in that table overall [20:39:05] GoranSM: mixed results with sparklyr btw. I can get it running without access to hadoop, but then what's even the point? i'll ping you if i get it running successfully [20:39:09] So it's not like it's batching it at 3k aggressively [20:39:29] bearloga: Will I need to "puppetize" those packages that I install in my local R lib?
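The flush policy described above ("3000 events or 5 minutes, whichever comes first") is the standard size-or-age batching pattern. A minimal Python sketch of the idea — illustrative only, not the actual eventlogging handlers.py code; all names here are made up:

```python
import time

BATCH_SIZE = 3000        # flush when this many events are queued...
BATCH_TIMEOUT = 300.0    # ...or when the current batch is 5 minutes old

class Batcher:
    """Buffer events and flush on size or age, whichever comes first."""

    def __init__(self, insert_fn, size=BATCH_SIZE, timeout=BATCH_TIMEOUT):
        self.insert_fn = insert_fn
        self.size = size
        self.timeout = timeout
        self.events = []
        self.started = None  # timestamp of first event in the current batch

    def add(self, event, now=None):
        now = time.time() if now is None else now
        if not self.events:
            self.started = now
        self.events.append(event)
        if len(self.events) >= self.size or now - self.started >= self.timeout:
            self.flush()

    def flush(self):
        if self.events:
            self.insert_fn(self.events)   # e.g. one batched MySQL INSERT
            self.events = []
            self.started = None
```

With these thresholds, a trickle of events (like the few-dozen-per-day ChangesListHighlights volume discussed here) is always flushed by the 5-minute timer, never by the size limit — which is why "def been more than 5 mins" is the right thing to be suspicious about.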
[20:39:30] * ebernhardson checks if stretch (stat1005) -> jessie (hadoop workers) has the same problems with pyspark jobs & virtualenv that ubuntu (stat1002) -> jessie (hadoop workers) did [20:39:33] bearloga: [20:39:34] ya [20:39:43] and, i can see the eventlogging mysql insert logs on eventlog1001 [20:39:50] 2017-07-11 22:16:05,077 [6854] (MainThread) Inserted 3 ChangesListHighlights_16484288 events in 0.008975 seconds [20:39:53] is the most recent one [20:40:05] RoanKattouw: which docs are you referring to? about stat1002? [20:40:09] GoranSM: nope, you just login, boot up R, and install.packages :) [20:40:34] bearloga: too bad about {sparklyr}. No point running single node Spark. Have you seen this (it's Matloff's): https://github.com/matloff/partools [20:41:00] I'm probably mistaken, I read various subpages of https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging [20:41:05] bearloga: oh thanks good lord (in reference to Puppet) [20:41:15] RoanKattouw: i found this: https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Administration#Troubleshoot_events_coming_in_in_real_time [20:41:21] which isn't helpful, since those archive data logs are only copied once a day [20:41:24] they go to stat1003 too [20:41:26] will change docs there [20:41:42] Right I looked at that [20:41:50] And realized I don't have shell access to eventlog1001 [20:42:04] It looks like accessing Kafka has to be done from stat1002 [20:42:49] In any case -- if I wanted to use a tool other than MySQL to access this data, what would it be and where would I run it? [20:43:03] addshore: so glad to hear you have all those windows in place [20:43:07] ottomata: is stat1005/6 going to have a specific restriction that everything installed has to be done through the puppet config or will users continue to be able to install r/python packages into libraries/environments in their homedirs the way it is on 1002?
[20:43:15] RoanKattouw: other random things i look at, varnish logs directly to eventlogging-client-side, before anything touches it. I've found it to be a reasonable way to find out if the events are at least coming in from outside. [20:43:23] the kafka topic that is [20:43:56] ottomata: i think that was the essence of GoranSM's question earlier [20:44:08] RoanKattouw: can you test your schema in beta? [20:44:11] RoanKattouw: the data is in HDFS, but it isn't that easy to access, i've had to put on hold a task that will make that a lot easier via hive [20:44:14] but yaaa not done yet [20:44:17] let's see if they are there... [20:44:18] at least [20:44:26] oh you don't have access to stat1002 [20:44:27] RoanKattouw: ! [20:44:28] get it! [20:44:29] nuria_: ottomata has confirmed that my events are there but not in MySQL [20:44:31] RoanKattouw: you can probably iterate a lot faster than testing with prod data [20:44:35] (03PS1) 10Addshore: Fix facebook.php [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/364852 [20:44:37] Yeah I now realize I need it [20:44:42] RoanKattouw: right [20:44:55] ya nuria_ not sure what is going on yet [20:44:55] RoanKattouw: that is why i was saying, let's test on betalabs [20:44:58] (03PS1) 10Addshore: Fix facebook.php [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/364853 [20:45:01] but the events are in the eventlogging-valid-mixed topic [20:45:03] (03CR) 10Addshore: [C: 032] Fix facebook.php [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/364853 (owner: 10Addshore) [20:45:04] (03CR) 10Addshore: [C: 032] Fix facebook.php [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/364852 (owner: 10Addshore) [20:45:07] RoanKattouw: we can do it together if you want [20:45:10] (03Merged) 10jenkins-bot: Fix facebook.php [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/364853 (owner: 10Addshore) [20:45:12] Test what exactly?
[20:45:13] (03Merged) 10jenkins-bot: Fix facebook.php [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/364852 (owner: 10Addshore) [20:45:16] which eventlogging consumes from to insert into mysql [20:45:17] That my events validate? [20:45:26] they're valid, they're in the valid topic :) [20:45:28] We already know that they do, and that they were stored, they just weren't replicated to MySQL [20:45:31] they just haven't been inserted in mysql [20:45:42] RoanKattouw: right, but that can be for many reasons [20:45:51] nuria_: they are also not on master [20:45:54] RoanKattouw: like nested events that python cannot serialize into json [20:45:57] and the last insert in the upstart logs was too long ago [20:46:05] RoanKattouw: that is an error you will see in beta logs as well [20:46:09] RoanKattouw: let me look [20:46:25] nuria_: Sure, I'd be happy to learn how to test in beta, I think that will be generally useful [20:47:00] RoanKattouw: https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/TestingOnBetaCluster [20:47:02] I don't think that it'll help for this particular issue, but it can't hurt to try and explicitly exclude that possibility (or be very surprised), and it's also a good thing for me to know how to do regardless [20:47:14] ooh ottomata I need 2 of my secrets changed! (unrelated to the move to stat1005!) [20:47:16] RoanKattouw: let me look in logs there [20:47:37] RoanKattouw: beta sometimes requires reawakening let me see [20:47:48] I'll try to provoke some events in the meantime [20:47:50] oo nuria_ as an aside, i think i see some mediawiki_page_create_1 errors [20:48:13] ottomata: let's file ticket no?
[20:49:00] RoanKattouw: i think disk on beta might be full [20:49:24] The root fs is full but the /srv fs is not [20:49:40] But lots of things will probably break with a full / fs [20:49:58] Like tab completion in bash :O [20:50:21] RoanKattouw: ya, need to fix that, i think ottomata and myself filled it in with our page create experiments [20:50:33] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#3432777 (10Addshore) [20:50:36] 10Analytics, 10Analytics-Cluster, 10WMDE-Analytics-Engineering, 10User-Addshore: Move statistics::wmde jobs from stat1002 -> stat1005 - https://phabricator.wikimedia.org/T170472#3432775 (10Addshore) 05Open>03Resolved So, I checked a few scripts and they all seem fine (after we fixed some permissions) A... [20:50:53] hmmm, nuria_ i think the mysql consumer has been flapping on eventlog1001, due to a schema thing with new page_create stuff [20:50:54] investigating... [20:51:12] ottomata: k [20:51:12] like, flapping every 5 mins or so, every time it tries to insert an event into page_create [20:51:35] which, will probably mean lots of non-inserted mysql data, since insert batches will have failed [20:51:57] GoranSM: regarding https://phabricator.wikimedia.org/T170052#3432621 are you using the new credentials? [20:52:23] nuria_: The events I provoked appear in all-events.log on beta [20:52:41] From 4 mins ago I mean [20:52:46] Exactly the password found by cat analytics-research-client.cnf [20:52:59] addshore: Exactly the password found by cat analytics-research-client.cnf [20:53:31] RoanKattouw: too bad disk is full cause errorlog cannot be created, let's retake this when i free space, likely issue is related to the page create issues ottomata just found [20:53:42] OK [20:54:15] Yeah some events are reaching MySQL but evidently not all of them, so I'm also looking at how else I can get/query events from a place that's known to have all of them (e.g.
Kafka) [20:54:28] nuria_: here's what I think is going on [20:54:32] RoanKattouw: this would affect not just you [20:54:39] Because while this is still being investigated I don't trust the MySQL tables to give me real data [20:54:54] we recently started inserting data into the eventlogging mysql db from eventbus [20:54:59] those schemas are managed differently [20:55:13] we need to start actually versioning them, so they work with mysql [20:55:30] i told petr this today, but there was a change to the schema https://github.com/wikimedia/mediawiki-event-schemas/commit/86b6cff26ab1ed154fe738bad8a08e36a0998a82 [20:55:31] i think [20:55:38] that somehow caused this to break [20:55:46] the mysql table didn't have rev_content_changed [20:55:53] ottomata: right, otherwise records cannot be inserted [20:56:01] ottomata: but that should affect 1 schema not all [20:56:06] https://gerrit.wikimedia.org/r/#/c/362321/ [20:56:08] right [20:56:09] but [20:56:11] the consumer process dies [20:56:15] when it hits this error [20:56:21] and [20:56:30] we don't wait for a mysql insert before we ack to kafka [20:56:34] we just ack to kafka every N seconds [20:56:45] so, stuff that had been consumed and then batched, but not yet inserted [20:56:48] would be dropped when the process dies [20:57:02] i'm going to make the eventbus events be a different eventlogging process [20:57:03] right now [20:57:06] to avoid future issues [20:57:09] mforns: can you check that this is the correct link? https://wikitech.wikimedia.org/w/index.php?title=Analytics/Systems/EventLogging/Data_retention_and_auto-purging&diff=1764385&oldid=1763921 [20:57:10] but there are probably a few more fixes needed
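ottomata's theory above — offsets are acked to Kafka on a timer, independent of whether the batched MySQL insert has happened, so a crash mid-batch loses events that Kafka considers delivered — can be made concrete with a toy model. This is purely an illustrative sketch, not the real eventlogging consumer:

```python
def run_consumer(events, batch_size, die_on):
    """Toy consumer: acks offsets as it reads, batches inserts, and
    crashes on a poison event. Returns (inserted, committed_offset)."""
    inserted = []
    batch = []
    committed = 0
    for offset, event in enumerate(events):
        committed = offset + 1   # offset acked to Kafka here (in the real
                                 # consumer this happens on a timer, but
                                 # crucially *before* the insert succeeds)
        if event == die_on:
            # insert fails (e.g. unknown column) and the process dies;
            # the pending batch is lost but its offsets are already acked,
            # so a restarted consumer never re-reads those events
            return inserted, committed
        batch.append(event)
        if len(batch) >= batch_size:
            inserted.extend(batch)
            batch = []
    inserted.extend(batch)
    return inserted, committed

# event 'd' is consumed and acked but never inserted: it is silently lost
done, offset = run_consumer(list("abcdXef"), batch_size=3, die_on="X")
```

Committing offsets only after a successful insert (or re-reading from the last insert on restart) is the usual fix for this pattern.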
[20:57:24] ottomata: the finally catches batches not inserted though [20:57:48] ottomata: see code in handlers.py [20:58:20] i dunno, it's dying [20:58:24] i could be wrong about that [20:58:25] GoranSM: I am not sure then :/ [20:58:28] maybe the batches finish [20:58:32] or, maybe it is very very minimal nuria [20:58:34] ottomata: did you check /var/log/upstart [20:58:44] and the batches finish, but the events consumed from kafka are not queued in the batch [20:58:53] yes, nuria_ [20:59:00] ottomata: ajam [20:59:04] i didn't check for dropped messages [20:59:06] that was just a theory [20:59:14] but i want to stop the thing from dying every 5 minutes asap [20:59:23] addshore: I have no idea what's going on there. I can do HiveQL from (1) R, (2) beeline, I can do mysql any way I like, all from stat1002; sqoop fails with silly access denied messages. [21:01:45] HaeB, yes, looks correct, I have added the line number to the link, thanks! [21:02:09] thanks! thought about that too, but i reckoned that it will change often as the file is updated.. [21:02:55] yea, good point [21:03:26] mh [21:04:48] aww, for some reason pyspark just hangs on stat1005 with `--master yarn` :( hive seems to be ok though, so it's not general hadoop access [21:05:03] It looks like we only started losing events for this starting some time yesterday [21:05:46] I counted events per day in Kafka for the last week and compared to MySQL, and the counts match up for all days except today and yesterday [21:11:01] 10Analytics, 10Analytics-Cluster, 10Security: Access rights for HDFS on stat100* for Sqoop tasks - https://phabricator.wikimedia.org/T170052#3432861 (10Addshore) After giving this a quick go it has something to do with passing the password for mysql in through --password. If you use -P and enter the password...
[21:11:44] OK, up until 2017-07-11 19:53:17 UTC it appears that Kafka and MySQL contain the same events, after that MySQL started dropping some but not all of them [21:12:04] GoranSM: I have the solution, I'll PM you [21:12:35] addshore: please do [21:12:52] Or rather, it dropped all of them except for those 3 events that ottomata found were inserted around 22:16 UTC, one of which has a 22:16 timestamp and two with 22:04:xx timestamps [21:13:31] 15 missing events on 2017-07-11 (yesterday) and 24 missing events from 2017-07-12 (today) [21:15:04] * RoanKattouw files a task [21:17:02] RoanKattouw: yeah [21:17:04] i think this is related [21:17:05] https://gerrit.wikimedia.org/r/#/c/362322/ [21:18:06] 10Analytics, 10Analytics-Cluster, 10Security, 10User-Addshore: Access rights for HDFS on stat100* for Sqoop tasks - https://phabricator.wikimedia.org/T170052#3432902 (10Addshore) [21:18:38] PROBLEM - Check status of defined EventLogging jobs on eventlog1001 is CRITICAL: CRITICAL: Stopped EventLogging jobs: consumer/mysql-eventbus [21:21:48] RECOVERY - Check status of defined EventLogging jobs on eventlog1001 is OK: OK: All defined EventLogging jobs are runnning. [21:26:05] 10Analytics, 10Analytics-Cluster, 10Security, 10User-Addshore: Access rights for HDFS on stat100* for Sqoop tasks - https://phabricator.wikimedia.org/T170052#3432988 (10Addshore) The SQLException: Access denied issue has been resolved outside of this ticket. @GoranSMilovanovic I assume this needs to stay... 
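The per-day Kafka-vs-MySQL comparison RoanKattouw ran here is a handy generic recipe for spotting silent consumer loss. A Python sketch — the per-day totals below are hypothetical, chosen only so the gaps match the 15 and 24 missing events reported above:

```python
def find_missing(kafka_counts, mysql_counts):
    """Return {day: number of events present in Kafka but absent from MySQL}."""
    missing = {}
    for day, n_kafka in sorted(kafka_counts.items()):
        n_mysql = mysql_counts.get(day, 0)
        if n_kafka > n_mysql:
            missing[day] = n_kafka - n_mysql
    return missing

# Hypothetical per-day counts for a low-volume schema:
kafka = {"2017-07-10": 40, "2017-07-11": 38, "2017-07-12": 27}
mysql = {"2017-07-10": 40, "2017-07-11": 23, "2017-07-12": 3}

gaps = find_missing(kafka, mysql)   # -> {"2017-07-11": 15, "2017-07-12": 24}
```

Comparing at daily granularity first, then narrowing to timestamps within the bad days (as done above with the 19:53:17 UTC cutoff), localizes when the consumer started dropping data.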
[21:27:29] 10Analytics, 10Analytics-EventLogging: ChangesListHighlights events missing from MySQL starting 2017-07-11 - https://phabricator.wikimedia.org/T170486#3432990 (10Catrope) [21:29:31] ottomata: Task ---^^ [21:29:50] 10Analytics, 10Analytics-EventLogging: ChangesListHighlights events missing from MySQL starting 2017-07-11 - https://phabricator.wikimedia.org/T170486#3433006 (10Catrope) @Ottomata Believes https://gerrit.wikimedia.org/r/364894 might fix this [21:35:25] 10Analytics, 10Analytics-Cluster, 10Security, 10User-Addshore: Access rights for HDFS on stat100* for Sqoop tasks - https://phabricator.wikimedia.org/T170052#3433054 (10GoranSMilovanovic) @Addshore Correct. Here's some additional info on what is happening, it might help: Running: sqoop import --connect jdb... [21:36:09] 10Analytics, 10Analytics-Cluster, 10Security, 10User-Addshore: Access rights for HDFS on stat100* for Sqoop tasks - https://phabricator.wikimedia.org/T170052#3433055 (10Addshore) Relating to the main issue, my searching led me to https://stackoverflow.com/questions/38643944/sqoop-import-as-parquetfile-fro... [21:36:35] ottomata: got 2 mins to change 2 of my passwords in the secrets repo / wherever they are kept? [21:38:42] 10Analytics, 10Analytics-EventLogging: ChangesListHighlights events missing from MySQL starting 2017-07-11 - https://phabricator.wikimedia.org/T170486#3433074 (10Ottomata) Could be wrong about that^ [21:38:47] addshore: sure [21:38:53] I'll pm you :) [21:44:47] RoanKattouw: I haven't seen that event anywhere [21:44:51] you sure you made more? [21:46:29] I can make a few right now [21:46:46] ottomata: OK I just made 3 [21:47:50] 10Analytics, 10Analytics-Cluster, 10Patch-For-Review: Move statistics::discovery jobs from stat1002 -> stat1005 - https://phabricator.wikimedia.org/T170471#3433169 (10mpopov) I need to repurpose https://github.com/wikimedia/puppet/blob/production/modules/statistics/manifests/discovery.pp to be the thing that...
[21:50:32] 10Analytics-Kanban: Mediawiki History Druid indexing failed - https://phabricator.wikimedia.org/T170493#3433198 (10Milimetric) [21:51:59] RoanKattouw: i gotta run, still haven't seen it, sooo dunno yet, but hopefully we can figure it out tomorrow [21:52:01] latterrrsss [21:52:25] FYI a-team, this is Gary from SVDS: https://phabricator.wikimedia.org/T170305 [21:52:56] * gdusbabek is also Gary from SVDS. [21:53:44] hi gdusbabek, didn't see you there :) [21:53:48] welcome to our humble IRC home [21:53:56] :) thx! [21:58:32] RoanKattouw: looking at this again [22:01:52] 10Analytics, 10Analytics-Cluster: Firewalls appear to be preventing spark executors from talking to spark driver on stat1005 - https://phabricator.wikimedia.org/T170496#3433298 (10EBernhardson) [22:01:59] HaeB: what were the things you needed joseph's help with? one was unique devices (which i can help you with) and the other ? [22:02:53] 10Analytics, 10Analytics-Cluster: Firewalls appear to be preventing spark executors from talking to spark driver on stat1005 - https://phabricator.wikimedia.org/T170496#3433315 (10EBernhardson) [22:03:45] 10Analytics, 10Analytics-EventLogging: ChangesListHighlights events missing from MySQL starting 2017-07-11 - https://phabricator.wikimedia.org/T170486#3432990 (10Nuria) The error on mysql consumer: raise errorclass, errorvalue sqlalchemy.exc.OperationalError: (OperationalError) (1054, "Unknown column 'r... [22:04:38] nuria_: the last-access based retention metric. in particular.
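The 1054 "Unknown column" error Nuria quotes in the task is what the insert hits when the event schema gains a field (here rev_content_changed) that the auto-created table predates. The failure mode, and the ALTER TABLE that unblocks inserts, can be reproduced with stdlib sqlite3 — table and column names mirror the discussion, but this is only a sketch of the general problem, not the eventlogging consumer's actual code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table as auto-created before the schema change added rev_content_changed:
conn.execute("CREATE TABLE page_create (page_id INTEGER, page_title TEXT)")

event = {"page_id": 1, "page_title": "Foo", "rev_content_changed": True}
cols = ", ".join(event)
params = ", ".join(":" + k for k in event)
sql = f"INSERT INTO page_create ({cols}) VALUES ({params})"

try:
    conn.execute(sql, event)          # fails: table lacks rev_content_changed
except sqlite3.OperationalError:
    # MySQL raises error 1054 here; adding the column lets inserts proceed
    conn.execute("ALTER TABLE page_create ADD COLUMN rev_content_changed INTEGER")
    conn.execute(sql, event)          # now succeeds

rows = conn.execute("SELECT * FROM page_create").fetchall()
```

The deeper point made in the log stands regardless of the schema fix: an insert error should fail one schema's inserts, not kill the consumer process for every schema it handles.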
zareen was working with joal on a somewhat complicated hive query to calculate percentiles from the extract table, and that problem didn't get fully solved back then [22:04:55] i've been planning to give it some focused attention myself later this month as we pick up this project again [22:05:23] and that would involve following up with joal [22:05:53] again, it's just a heads-up -perhaps i'll figure it out myself [22:07:11] HaeB: otherwise that seems like an easy 2 step: 1) get data 2) calculate percentiles [22:07:19] HaeB: hive does not need to do both [22:08:46] when is joal going to return? [22:09:33] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, and 6 others: Record an event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3433350 (10Nuria) 05Resolved>03Open [22:09:54] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Contributors-Analysis, and 6 others: Record an event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3343163 (10Nuria) Reopening as a recent change broke insertion on the schema, see https://phabric... [22:11:37] HaeB: in about 7 weeks [22:14:10] Pchelolo: yt? [22:14:19] Pchelolo: I have a question... [22:14:23] yup nuria_ [22:14:50] Pchelolo: I am trying to understand how are eventbus schemas managed, both page-create and revision-create share schema: https://github.com/wikimedia/mediawiki-event-schemas/commit/86b6cff26ab1ed154fe738bad8a08e36a0998a82 [22:15:14] yup, because all the properties are the same [22:15:24] we've decided to reuse the schema for both topics [22:15:24] nuria_: ok. i wasn't aware of the incoming good news before reading the actual announcement last week ;) i might have picked his brain regarding some UD things before that. 
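nuria_'s two-step suggestion (1: get the data out of Hive, 2: calculate percentiles elsewhere) works whenever the extract fits in memory; Hive's percentile_approx UDF is the in-cluster alternative. A sketch of step 2 in Python, with made-up numbers and a hypothetical "days between visits" reading of the last-access data:

```python
def percentile(values, p):
    """Nearest-rank percentile (0 < p <= 100) of a list of numbers."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    # nearest-rank: smallest value with at least p% of the data at or below it
    rank = max(1, -(-len(ordered) * p // 100))   # ceil via negated floor div
    return ordered[int(rank) - 1]

# e.g. per-device gaps (in days) extracted from a Hive table -- made-up sample:
gaps = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
p50 = percentile(gaps, 50)   # -> 5
p90 = percentile(gaps, 90)   # -> 34
```

For data too large to extract, `SELECT percentile_approx(gap_days, array(0.5, 0.9)) FROM ...` keeps the computation in Hive at the cost of an approximation.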
but i understand you're familiar with the matter too [22:15:27] Pchelolo: but this new field is not in page-create event (cannot be) [22:15:43] HaeB: yes [22:16:04] nuria_: it's set to 'true' all the time for the page-create [22:16:06] HaeB: we both had to do quite a bit of research this quarter on that computation [22:16:20] Pchelolo: i see, so those schemas are never versioned? [22:16:47] they were not versioned until we decided to start versioning from now on [22:16:56] all the next changes will be versioned [22:16:57] i understand we also got to understand the existing UD measurement better, and fix bugs [22:17:14] we didn't need versioning until you've started importing [22:17:34] from now on we need - so we will version. Example: https://gerrit.wikimedia.org/r/#/c/364600/ [22:18:51] 10Analytics, 10Analytics-EventLogging: ChangesListHighlights events missing from MySQL starting 2017-07-11 - https://phabricator.wikimedia.org/T170486#3433384 (10Nuria) More error log that looks kind of strange : 9092,kafka1020.eqiad.wmnet:9092,kafka1022.eqiad.wmnet:9092?topics=eventlogging-valid-mixed,eqiad.me... [22:19:46] Pchelolo: I see, ok, on our end we can assume that they are versioned going forward then? [22:19:57] right nuria_ [22:20:23] Right now I'm gonna only version revision-create schema to not create tons of non-needed files [22:21:11] HaeB: there was 1 bug yes, although its effect on data was minuscule, if present at all, https://phabricator.wikimedia.org/T165661 [22:22:26] 10Analytics, 10Analytics-EventLogging: ChangesListHighlights events missing from MySQL starting 2017-07-11 - https://phabricator.wikimedia.org/T170486#3433412 (10Nuria) From conversation on IRC looks like these schemas are versioned going forward: (cc @Pchelolo ) Pchelolo: I see, ok, on our end we c...
[22:28:48] nuria_: joal also found https://phabricator.wikimedia.org/T143928#3301667 , which has a larger effect ("Therefore the offset we count in per-domain is currently under-counting (by ~10%)") [22:32:35] anyway, i will post on phabricator and we can take it from there, looping in other people as needed (hopefully not needing to wait for his return though) [22:32:35] thanks for following up here!