[00:15:45] Analytics, WMF-Legal, Privacy: Inform EU readers that we use cookies - https://phabricator.wikimedia.org/T115958#1737105 (Dispenser) NEW
[00:50:13] Analytics-EventLogging, Editing-Department, Improving access, Reading Web Planning, discovery-system: Schema changes - https://phabricator.wikimedia.org/T114164#1737211 (csteipp) > Yes, we need event_ip as plaintext IP, the same for event_userAgent. We will take these two information, plus page...
[01:40:46] ottomata, i keep seeing this in hive, and the query finishes right away - WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
[01:40:58] don't think that's normal, right? :)
[02:33:11] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1737270 (ellery) Is this task being tracked on two tickets? Anyway, you can make the change as far as I'm conce...
[04:11:33] yurik: what was the query? did you get results?
[04:13:22] madhuvishy, i just re-ran it - now i do get results and no warnings. Before, there was warning but no result (it didn't even start the querying). I suspect this issue is due to the hour being incomplete somehow, and taking over 3 hours to fix it. select * from wmf.webrequest where year=2015 and month=10 and day=19 and hour=23 and uri_path like '%/graph/png/%' limit 100;
[04:13:41] yurik: yup, it usually shows up when there's missing data
[04:13:56] or the load/refine jobs for the hour were just running
[04:15:49] yurik: you can look here for the refine webrequest jobs running - greens are done, yellows are pending data - https://hue.wikimedia.org/oozie/list_oozie_coordinator/0000005-150917143739917-oozie-oozi-C/
[04:18:11] server error 500 )
[04:38:06] yurik: gah
[04:38:17] that happens sometimes, sometimes it works too
[04:38:20] its flaky
[04:38:33] need to look into why that's happening
[06:06:26] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0]
[06:08:14] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[06:18:19] Analytics-Backlog: Add scala 2.10.4 to stat1002 and analytics1027 - https://phabricator.wikimedia.org/T115970#1737361 (JAllemandou) NEW
[06:25:27] (CR) Joal: [C: 2 V: 1] "Good for me!" [analytics/aggregator] - https://gerrit.wikimedia.org/r/247323 (https://phabricator.wikimedia.org/T114379) (owner: Milimetric)
[06:25:47] (Merged) jenkins-bot: Parametrize the input filenames format [analytics/aggregator] - https://gerrit.wikimedia.org/r/247323 (https://phabricator.wikimedia.org/T114379) (owner: Milimetric)
[06:31:29] (CR) Joal: [C: 2] "Looks good, I assume you have tested the oozie stuff :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379) (owner: Milimetric)
[08:22:43] Analytics-Tech-community-metrics, DevRel-October-2015, Patch-For-Review: Tech metrics missing IRC channels - https://phabricator.wikimedia.org/T56230#1737513 (Qgil) Do we need to remove deprecated channels? Isn't their information useful to count contributions of related users in those channels?
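A minimal sketch of how the empty-result situation discussed above (04:13) could be checked before querying: confirm that the hour has actually been refined into wmf.webrequest. This assumes the usual (webrequest_source, year, month, day, hour) partition layout and that this Hive build accepts a partial partition spec in SHOW PARTITIONS; it is an illustration, not the team's documented procedure.

    # Hedged sketch: list the refined partitions for the hour before querying it.
    hive -e "SHOW PARTITIONS wmf.webrequest PARTITION(year=2015, month=10, day=19, hour=23);"

    # If partitions show up for the webrequest sources you need, the query from 04:13 should return rows:
    hive -e "
      SELECT *
      FROM wmf.webrequest
      WHERE year=2015 AND month=10 AND day=19 AND hour=23
        AND uri_path LIKE '%/graph/png/%'
      LIMIT 100;
    "

If nothing is listed, the refine job for that hour is likely still pending, which matches the yellow/green state visible in the Hue coordinator linked at 04:15.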
[08:31:02] Analytics-Tech-community-metrics, DevRel-October-2015, Patch-For-Review: Tech metrics missing IRC channels - https://phabricator.wikimedia.org/T56230#1737525 (Dicortazar) This depends on your needs. I mean, as we're not automatically retrieving IRC channels information, we can keep as many channels as...
[09:06:25] Analytics-Tech-community-metrics, DevRel-October-2015, Patch-For-Review: Tech metrics missing IRC channels - https://phabricator.wikimedia.org/T56230#1737592 (Aklapper) >>! In T56230#1737513, @Qgil wrote: > Do we need to remove deprecated channels? No, so I've canceled https://github.com/Bitergia/medi...
[09:18:49] (PS17) Joal: Add cassandra load job for pageview API [analytics/refinery] - https://gerrit.wikimedia.org/r/236224
[09:30:23] Analytics-Backlog: Write a script to automatically run dependent jobs in refinery - https://phabricator.wikimedia.org/T115985#1737628 (JAllemandou) NEW
[10:12:01] !log restart cassandra on aqs1002
[10:12:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[10:16:56] Analytics-Tech-community-metrics, DevRel-October-2015: Backlogs of open changesets by affiliation - https://phabricator.wikimedia.org/T113719#1737741 (Qgil) Perfect!
[10:31:25] (PS1) Christopher Johnson (WMDE): adds shell script for daily sparql data retrieval [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/247535
[10:38:22] hi a-team :]
[10:38:27] Hi mforns :)
[10:38:30] howdy?
[10:38:41] good thx, you?
[10:38:49] good !
[10:43:59] (PS2) Christopher Johnson (WMDE): adds shell script for daily sparql data retrieval [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/247535
[10:50:54] Analytics-Kanban: Update oozie diagram [3 pts] {hawk} - https://phabricator.wikimedia.org/T115993#1737785 (JAllemandou) NEW a:JAllemandou
[11:17:41] * joal gets lunch
[11:18:28] Analytics-Tech-community-metrics, DevRel-October-2015: Backlogs of open changesets by affiliation - https://phabricator.wikimedia.org/T113719#1737852 (Dicortazar) @Qgil, is there any difference in the list of "Oldest open Gerrit changesets without code review" in http://korma.wmflabs.org/browser/scr-backl...
[11:18:37] bon appétit joal
[12:31:17] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1737989 (Anmolkalia) @jgbarah, I hope this is what you were looking for. https://github.com/anmolkalia/MediaWikiAnalysis/tree/new
[12:39:04] Analytics-Tech-community-metrics, DevRel-October-2015: Backlogs of open changesets by affiliation - https://phabricator.wikimedia.org/T113719#1738009 (Qgil) That still wouldn't help finding i.e. the changeset from Digia, or the oldest from WMDE, unless these happened to be among the 100 oldest...
[12:57:44] Hey halfak ! For once, I'm on time :)
[13:00:13] morning y'all
[13:00:18] hey milimetric
[13:00:25] good morning to you :)
[13:03:05] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [30.0]
[13:05:16] sorry
[13:08:13] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[13:21:43] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [30.0]
[13:23:26] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[14:01:21] Analytics-Tech-community-metrics, DevRel-October-2015: Correct affiliation for code review contributors of the past 30 days - https://phabricator.wikimedia.org/T112527#1738165 (Aklapper) >>! In T112527#1736241, @Aklapper wrote: > ...and no import-related command listed by `sortinghat --help` either. Note...
[14:05:38] Analytics-Backlog, MediaWiki-API, Reading-Infrastructure-Team, Research-and-Data, Patch-For-Review: Publish detailed Action API request information to Hadoop - https://phabricator.wikimedia.org/T108618#1738189 (Dzahn)
[14:06:25] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [30.0]
[14:08:13] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[14:27:08] holaaaa
[14:28:47] hlllloooooo
[14:31:40] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0]
[14:31:57] Analytics, Continuous-Integration-Config, WMDE-Analytics-Engineering, Wikidata: Add basic jenkins linting to analytics-limn-wikidata-data - https://phabricator.wikimedia.org/T116007#1738262 (Addshore)
[14:33:09] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[14:42:54] (CR) Milimetric: "Yeah, see oozie testing above, I linked to the results of the tests in Hue, and you would probably be wise to double check my work here as" [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379) (owner: Milimetric)
[14:45:16] (CR) Ottomata: [C: 2 V: 2] "Was the same change made in the relevant Mediawiki avsc?" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/246990 (https://phabricator.wikimedia.org/T115715) (owner: EBernhardson)
[14:46:03] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1738330 (Milimetric) This got merged (Jenkins automatically does that when you C +2, so you need to C -1 if you want to prevent it...
[14:46:24] (CR) EBernhardson: "yes, that matching schema is here: https://gerrit.wikimedia.org/r/#/c/240615/15/wmf-config/avro/CirrusSearchRequestSet.avsc" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/246990 (https://phabricator.wikimedia.org/T115715) (owner: EBernhardson)
[14:46:32] thanks ebernhardson
[14:46:46] thanks, we'll test that this week
[14:47:21] and you created the topic with partitions iiuc? so we should just be able to deploy
[14:49:06] yes, hm, I think we'll need to make a refinery release and deploy in order to get it to pick up that change
[14:49:10] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [30.0]
[14:49:19] but, yes, the topic is there and camus is already trying to import from it
[14:50:51] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[14:51:39] (PS1) Christopher Johnson (WMDE): refactors into /src removes datatable dom elements adds links to page metrics from seeAlso adds link to this page icon, moves to right side of page adds dashboardReference annotation to metrics.owl [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/247573
[14:51:55] milimetric: yt?
[14:52:01] yep
[14:52:07] so, 'wsc format'
[14:52:17] i think we shoudl think of a different name, this is going to live forever, right?
[14:52:32] this is the pageview data but in pagecount format, right?
[14:52:44] (PS2) Christopher Johnson (WMDE): refactors into /src removes datatable dom elements adds links to page metrics from seeAlso adds link to this page icon, moves to right side of page adds dashboardReference annotation to metrics.owl [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/247573
[14:52:51] pageview data in whatever that format is called, right
[14:52:58] "hard to parse format"
[14:53:01] so, when we are refering to definitions
[14:53:05] pageview is new, pagecount is old
[14:53:26] yes, these new patches keep that consistent (and fix the inconsistency I added earlier)
[14:53:34] i guess the format name shouldn't be tied to the definition ame?
[14:53:39] since we can have multiple formats?
[14:53:55] it's definitely not "pagecount format", yes
[14:54:11] can we just call it legacy format?
[14:54:12] historical format?
[14:54:13] is that to general?
[14:54:18] too*
[14:54:34] legacy is ok
[14:54:39] i was writing historical at the same time
[14:54:43] dumps format
[14:54:44] ?
[14:55:05] dumps not sure - there may be other stuff on dumps in the future
[14:55:11] k
[14:55:16] legacy is good with me then
[14:55:22] and if that's true, then maybe it should be named after what data's in it?
[14:55:24] that's what we use for the old udp2log formats
[14:55:26] in case we release geo-data in there too?
[14:55:33] ?
[14:55:42] like, pageview-legacy
[14:55:43] ?
[14:55:58] like, now it's (project, [page title], count, [size])
[14:56:05] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] refactors into /src removes datatable dom elements adds links to page metrics from seeAlso adds link to this page icon, moves to right side [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/247573 (owner: Christopher Johnson (WMDE))
[14:56:57] nah, it's too hard to mix in anything meaningful in the format name
[14:57:00] legacy format is ok
[14:57:03] ebernhardson: one thing, did you tested with avro tools that the data validates to your schema
[14:57:06] i'll change wsc to "legacy"
[14:57:25] ebernhardson: cause remember the bindings in php and java were different
[14:57:52] nuria: actually no i didn't i'll do that today
[14:58:20] * ebernhardson was thinking the only differences were in the json output, but worth double checking
[14:58:29] ebernhardson: ok, please add that step to your docs cause otherwise it is likely we get data it does not validate with java bindings
[14:58:41] milimetric: cool
[14:59:16] hm, milimetric also
[14:59:28] maybe just call the directory vars
[14:59:43] pageview_legacy_archive_directory
[14:59:44] not sure
[14:59:57] this isn't specifically just archiving
[14:59:58] right?
[15:00:02] hm. the other jobs do that I guess.
[15:00:55] nuria: also, unrelatedly, is page_title in wmf.pageview_hourly fully normalized as mediawiki would internally ?
[15:01:02] which directories?
[15:01:16] (e.g. can i compare the output of a mediawiki title string to wmf.pageview_hourly.page_title)
[15:01:39] ebernhardson: we're trying to get it as close to that as possible, but it's not perfect right now
[15:01:56] milimetric:
[15:02:01] # Archive directory for pageview_hourly_webstatcollector_format
[15:02:01] javascript:;
[15:02:01] pageview_archive_directory = ${archive_directory}/pageview/webstatcollector/hourly
[15:02:07] a lot of the normalization on our side has to do with making index.php?title=blah and /wiki/blah and api.php?page=blah all be the same
[15:02:28] milimetric: fair enough, mediawiki never makes things easy :)
[15:02:30] ebernhardson: fully normalized? no i do not think so, not yet but cc joal in case we are working on that
[15:02:41] I answered above nuria ^
[15:02:42] it is mostly normalized
[15:02:44] as much as possible!
[15:02:53] it would be much much better to use page_id and normalize that way
[15:02:55] not as much as possible yet, there are some casing issues still
[15:03:00] buuuut we don'th have that for everything so meh
[15:03:33] page_id would be nice, but we have to deal with page renames and keeping a list of all pages up to date
[15:03:50] ahh, ok for a moment i thought i just wasn't seeing page_id, that would have been super easy :)
[15:04:05] ebernhardson: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/PageviewDefinition.java#L484
[15:04:11] ebernhardson: page_id isn't reliably available, only from certain places
[15:04:25] ebernhardson: page_id is avail in X-analytics for many things
[15:04:29] not for API requests
[15:04:39] see: https://phabricator.wikimedia.org/T92875
[15:05:55] i think that will still be fine for our purposes, thanks! mostly its about being able to take an average of page views over a certain amount of time (TBD) and send those to elasticsearch
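For reference, a small sketch of the "legacy" (webstatscollector-style) lines being discussed at 14:55 and how they could be pulled apart in a shell; the sample line and the values in it are illustrative, not taken from the real archive under ${archive_directory}/pageview/webstatcollector/hourly.

    # Hedged sketch: one hypothetical line in the legacy format, i.e. "project page_title count bytes"
    line='en Main_Page 42 123456'
    read -r project title count bytes <<< "$line"
    echo "project=$project title=$title count=$count bytes=$bytes"

Per the discussion above, the coordinator property naming ("pageview_archive_directory" vs something like "pageview_legacy_archive_directory") is the open question; the line format itself stays the four space-separated fields shown here.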
[15:06:13] elasticsearch is indexed by page_ids already (litteraly it would be a put to, for example, /enwiki_content/page/12345)
[15:07:32] hmm, but thats in webrequest and not pageview_hourly :( will have to look closer at things. I'm sure we can figure out a way to map it to our data one way or another
[15:08:02] ebernhardson: we can add that to pageview_hourly (not short term but soon-ish)
[15:08:20] because our aim is to eventually get that in there anyway
[15:08:58] * ebernhardson looks for a ticket to put a +1 on :)
[15:11:36] Analytics-Tech-community-metrics, DevRel-October-2015, Patch-For-Review: Tech metrics missing IRC channels - https://phabricator.wikimedia.org/T56230#1738468 (Aklapper) >>! In T56230#1735259, @Aklapper wrote: >>>! In T56230#1734947, @Dicortazar wrote: >> is there a way to automatically retrieve the lis...
[15:14:31] Analytics-Tech-community-metrics, DevRel-October-2015, Patch-For-Review: Tech metrics missing IRC channels - https://phabricator.wikimedia.org/T56230#1738470 (Addshore) Also note my last comment at https://github.com/benapetr/wikimedia-bot/pull/49#issuecomment-149596883 It would be trivial for this to...
[15:15:26] (PS7) Milimetric: Archive hourly pageviews in legacy format [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379)
[15:18:23] ottomata: ^ done but we probably should talk before you merge.
[15:18:43] I'm going to have lunch because meeting hell is about to start :)
[15:19:44] (PS2) Mforns: [WIP] Add oozie job to compute browser usage reports [analytics/refinery] - https://gerrit.wikimedia.org/r/246851 (https://phabricator.wikimedia.org/T88504)
[15:20:29] Analytics-Tech-community-metrics: Handling multiple affiliations (at once; like work vs spare time) in tech community metrics - https://phabricator.wikimedia.org/T95238#1738520 (Aklapper)
[15:21:02] (CR) Mforns: [C: -1] "Still needs testing." [analytics/refinery] - https://gerrit.wikimedia.org/r/246851 (https://phabricator.wikimedia.org/T88504) (owner: Mforns)
[15:22:30] ebernhardson: was in a meeting
[15:22:42] Analytics-Tech-community-metrics: Handling multiple affiliations (at once; like work vs spare time) in tech community metrics - https://phabricator.wikimedia.org/T95238#1738527 (Aklapper) Ignore my last (off-topic) comment that should have gone to T112527 instead.
[15:22:54] ebernhardson: as ottomata said, mostly normalised is the page_title
[15:23:17] As for case differences, they reflect different pages with redirects
[15:23:46] joal: ok, millimetric said short term (this Q?) you guys are planning on getting page_id into the pageviews_hourly table. I think we might refocus our current effort twords calculating page rank and wait for that to be ready (page_id is, by far, the best direct mapping into elasticsearch for us)
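A rough sketch of the per-page update ebernhardson describes at 15:06: the index/type/id follow the /enwiki_content/page/12345 example given in the chat, while the host, the field name and the value are placeholders, not the real CirrusSearch setup.

    # Hedged sketch: push a precomputed pageview average onto an existing page document
    # via the Elasticsearch update API (field name "pageview_avg" is hypothetical).
    curl -XPOST 'http://localhost:9200/enwiki_content/page/12345/_update' \
      -d '{"doc": {"pageview_avg": 17.5}}'

The point of the discussion is that page_id is the natural join key here, which is why the page_id-in-pageview_hourly question matters so much for this use case.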
[15:24:20] ebernhardson: makes sense, but I am not aware of page_id delays
[15:24:26] i didn't find a ticket for it though
[15:24:47] ottomata also said that we have page_id for many, but not all
[15:24:55] i think that would probably be close enough
[15:25:06] ebernhardson: we can go best effort, providing the id when we have it
[15:25:12] +1 to that idea
[15:25:14] yea i think thats more than good enough for us
[15:25:16] i hadn't heard anyone talk about it though
[15:25:44] ebernhardson: for now yes, but less and less given the directions we are taking to try to serve small portions of data through the api - right ?
[15:26:07] ottomata: neither do I
[15:26:15] the api most likely wont work for our use case, we ned to calculate the data for all pages on all wikis on a weekly basis
[15:26:28] (ideally we could do closer to realtime, but for now we are ok with once a week batches :)
[15:26:49] but that would be, guessing based on the docs in elasticsearch, about 100M pages
[15:26:50] ottomata, ebernhardson I'll create a task to insert page_id when it exists on pageview_hourly
[15:26:56] thanks joal
[15:27:00] thanks!
[15:27:17] np ebernhardson :)
[15:28:06] Analytics-Backlog, operations: erbium (logging) - useradd: group '30001' does not exist - https://phabricator.wikimedia.org/T115943#1738546 (Ottomata) Hm, no idea. Can we just do gid => 30001 for file_mover group in role::logging::systemusers in role/logging.pp?
[15:29:19] Analytics-Backlog: Add page_id to pageview_hourly when present in webrequest x_analytics header - https://phabricator.wikimedia.org/T116023#1738549 (JAllemandou) NEW
[15:29:31] ebernhardson: --^ + watcher
[15:30:04] sweet
[15:30:32] Analytics-Backlog, operations: erbium (logging) - useradd: group '30001' does not exist - https://phabricator.wikimedia.org/T115943#1738560 (Ottomata) Hm, actually, let's make the gid for the file_mover user use the name rather than the gid. file_mover group exists as gid 997 on erbium.
[15:32:30] milimetric: btw, you shoudl probably add the ability to dump dataset stats for this new pageview-legacy thing
[15:32:57] we get reports about the other pagecount datasets
[15:32:59] /srv/deployment/analytics/refinery/bin/refinery-dump-status-webrequest-partitions --hdfs-mount /mnt/hdfs --datasets pagecounts_all_sites,pagecounts_raw --quiet
[15:33:33] is that in puppet somewhere?
[15:33:36] ottomata: good catch, we also have it for projectview if IIRC
[15:33:59] milimetric: the script is in refinery/bin
[15:34:06] the cron job that runs it is in puppet
[15:34:07] right, but the running of the script
[15:34:16] role/analytics/refinery.pp
[15:34:18] somewhere in there
[15:34:20] oh would the script have to change too?
[15:34:22] yes
[15:34:25] k, thx
[15:34:31] you'll have to make it know how to dump the status of the new dataset
[15:34:33] i'll do that after lunch
[15:34:39] 'dump' is the wrong word here but whatever! :)
[15:34:42] heh
[15:36:30] RECOVERY - puppet last run on erbium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:56:20] ebernhardson: let us know when you check validation of messages with avro tools ok?
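One way the avro-tools check nuria asks for at 15:56 could look: feed a sample JSON record through the schema and see whether it encodes cleanly with the Java bindings. The jar filename, schema path and sample record here are placeholders, not the real files from the CirrusSearch change.

    # Hedged sketch: if the sample record does not match the schema, fromjson exits with a schema error.
    java -jar avro-tools-1.7.7.jar fromjson \
      --schema-file CirrusSearchRequestSet.avsc \
      sample_record.json > sample_record.avro \
      && echo "record validates against the schema"

As discussed above, this is worth adding to the producer docs so that PHP-encoded data is known to deserialize with the Java bindings before it lands in Camus.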
[15:57:44] nuria: yup, getting to that soon (been deploying our multi-datacenter code just now)
[15:57:54] ebernhardson: k
[15:59:41] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [30.0]
[16:03:29] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[16:04:17] Analytics-Kanban, Mobile-Apps, Patch-For-Review: Investigate and fix inconsistent data in mobile_apps_uniques_daily {hawk} [5 pts] - https://phabricator.wikimedia.org/T114406#1738701 (kevinator) Open>Resolved
[16:09:26] (PS1) OliverKeyes: [WIP] functions for identifying search engines as referers. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/247601 (https://phabricator.wikimedia.org/T115919)
[16:09:52] can I ask a favour from one of the UDF-centric Analytics people, equating roughly to one alcoholic or non-alcoholic beverage?
[16:13:36] nevermind, fixed it!
[16:13:41] y'all missed out on a cream soda
[16:14:12] Ironholds: ha ha we are in standup
[16:14:24] fair :D
[16:20:51] Ironholds: hola
[16:21:08] Ironholds: is there any problem with moving your awesome guide: https://office.wikimedia.org/wiki/Discovery_data_access_guidelines
[16:21:11] to wikitech?
[16:24:49] nuria, legal has requested it stay private for the time being, unfortunately :(
[16:24:50] but thank you!
[16:25:07] the plan is to have a generalised and transparent guide, but they're waiting on hiring someone to write it, I think
[16:26:22] (PS2) OliverKeyes: [WIP] functions for identifying search engines as referers. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/247601 (https://phabricator.wikimedia.org/T115919)
[16:28:21] nuria: do you have a moment to review https://gerrit.wikimedia.org/r/#/c/247512/ ? Aaron OK'd the query.
[16:28:40] ori: yes, on standup will look in a minute
[16:29:00] thanks
[16:30:05] Analytics-Backlog, Analytics-EventLogging, Analytics-Kanban: More solid Eventlogging alarms for raw/validated - https://phabricator.wikimedia.org/T116035#1738860 (Nuria) NEW
[16:32:17] (PS3) OliverKeyes: [WIP] functions for identifying search engines as referers. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/247601 (https://phabricator.wikimedia.org/T115919)
[16:33:40] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0]
[16:34:37] Analytics-Backlog, operations, Patch-For-Review: erbium (logging) - useradd: group '30001' does not exist - https://phabricator.wikimedia.org/T115943#1738888 (chasemp)
[16:37:10] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[16:40:01] PROBLEM - puppet last run on erbium is CRITICAL: CRITICAL: Puppet has 1 failures
[16:42:36] (PS4) OliverKeyes: [WIP] functions for identifying search engines as referers. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/247601 (https://phabricator.wikimedia.org/T115919)
[16:43:57] nuria: thanks, amended
[16:45:58] Analytics-Tech-community-metrics, DevRel-October-2015: "Tickets" (defunct Bugzilla) vs "Maniphest" sections on korma are confusing - https://phabricator.wikimedia.org/T106037#1738927 (Aklapper) p:Normal>Low a:Aklapper
[16:47:09] Analytics-Tech-community-metrics, DevRel-October-2015: Correct affiliation for code review contributors of the past 30 days - https://phabricator.wikimedia.org/T112527#1738934 (Aklapper) p:Normal>High
[16:48:53] nuria, mforns : doule checked today Edit table, no holes
[16:49:08] joal, ok
[16:50:36] :]
[16:50:59] joal: ok, will try to look today if maybe comparing a rolling average of both metrics in graphite works better
[16:51:19] mforns: double checked as well yesterday and last weekend: no holes
[16:51:29] so yeah, graphite stuff :(
[16:51:38] thanks nuria
[16:52:02] joal, thank you for looking at this :], soory for not having told you that backfilling was needed, I forgot
[16:52:23] tomorrow we can have a look
[16:52:52] mforns: no problem at all, I have heard of the issue, knew that there was problem with backfilling, but didn't realize it was my turn to take aver it :)
[16:53:09] We'll handle that tomorroe :)
[16:53:19] ok :]
[16:53:52] milimetric: have you tried AQS recently ?
[16:54:58] no
[16:55:15] cause last time I tested I got an error :(
[16:55:30] hm
[16:56:22] milimetric: seems to be cassandra not answering fast enough
[16:56:36] milimetric: works on per-project, not on per-article
[16:56:50] curl http://localhost:7231/analytics.wikimedia.org/v1/pageviews/per-project/en.wikipedia/all-access/user/daily/2015100100/2015100101 -- Ok
[16:57:05] hm, makes some sense based on how much data's in there
[16:57:32] milimetric: does indeed curl http://localhost:7231/analytics.wikimedia.org/v1/pageviews/per-article/en.wikipedia/Barack_Obama/all-access/user/daily/2015100100/2015100101 --KO
[16:57:33] but they were saying we can change the timeout default?
[16:57:35] it's 2 seconds now?
[16:57:45] {"type":"https://restbase.org/errors/not_found","title":"Not found."
[16:57:51] weird :(
[16:58:09] milimetric: I think it's on client side
[16:58:19] I could change the timeout for my cqlsh client for instance
[17:00:19] joal: if it's a timeout, then you should get log messages to that effect
[17:00:52] gwicke: ok, then it's something else
[17:01:02] Analytics-Tech-community-metrics, DevRel-October-2015, Patch-For-Review: Tech metrics missing IRC channels - https://phabricator.wikimedia.org/T56230#1738984 (Dicortazar) Ok, channels are updated. Having a JSON file with the list of channels and log place would be awesome :). @Aklapper, from my side...
[17:01:11] madhuvishy: meeting?
[17:01:17] coming
[17:09:40] Analytics-Tech-community-metrics, DevRel-October-2015, Patch-For-Review: Tech metrics missing IRC channels - https://phabricator.wikimedia.org/T56230#1739024 (Aklapper) Open>Resolved Thanks! Closing.
[17:43:20] (PS5) OliverKeyes: Functions for identifying search engines as referers. [analytics/refinery/source] - https://gerrit.wikimedia.org/r/247601 (https://phabricator.wikimedia.org/T115919)
[17:59:02] Analytics-Backlog: Add page_id to pageview_hourly when present in webrequest x_analytics header - https://phabricator.wikimedia.org/T116023#1739356 (EBernhardson)
[18:05:31] a-team, I'm off for today :)
[18:05:41] have a good night joal
[18:05:45] Thx :)
[18:05:48] ok, joal, see you tomorrow! night
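A small sketch of how the per-project vs per-article check above could be repeated while telling a slow backend apart from a plain 404: print the status code and total time for both calls. This assumes the same local RESTBase endpoint used in the chat; the -w format string is standard curl, nothing AQS-specific.

    # Hedged sketch: rerun the two AQS calls from 16:56/16:57 and print status + timing.
    base='http://localhost:7231/analytics.wikimedia.org/v1/pageviews'
    for path in \
      'per-project/en.wikipedia/all-access/user/daily/2015100100/2015100101' \
      'per-article/en.wikipedia/Barack_Obama/all-access/user/daily/2015100100/2015100101'
    do
      curl -s -o /dev/null -w "%{http_code} %{time_total}s  $path\n" "$base/$path"
    done

A timeout shows up as a long time_total (or a curl error), while the "not_found" response above comes back quickly with a 404-style status, which matches gwicke's point that a timeout would also leave log messages.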
[18:05:59] see you tomorrow mforns for some backfill :)
[18:06:10] yes :]
[18:06:42] ottomata: when you stop being busy today you wanna try and deploy the new pageview dataset
[18:06:47] oooh, just missed him
[18:06:49] 4 seconds :)
[18:07:11] i'm off to see the starwars trailer
[18:07:25] milimetric: he he
[18:07:49] sweet! we should all go see it in Jan.
[18:07:59] (for the 5th time for me, probably, but still)
[18:08:42] milimetric: yeah!
[18:08:44] :D
[18:11:03] madhuvishy: I'll ask otto next time I see him!
[18:11:24] addshore: okay :) you can just file a ticket too, and i'll poke him
[18:11:39] addshore: Analytics-Backlog
[18:11:42] infact, yeh, I'll do a ticket now ;)
[18:11:46] otherwise I will liekly forget...
[18:13:08] Analytics-Backlog: Sync Addshore's LDAP to hue - https://phabricator.wikimedia.org/T116059#1739411 (Addshore) NEW a:Ottomata
[18:13:16] :)
[18:13:41] Analytics-Backlog: Sync Addshore's LDAP to hue - https://phabricator.wikimedia.org/T116059#1739422 (Addshore)
[18:14:02] addshore: cool
[18:16:44] milimetric: found my mistake on pageview per article --> Works fine :)
[18:17:13] joal: you said you were gonna leave!
[18:17:18] get outta here
[18:17:19] almost ;)
[18:17:24] Now i Leave :D
[18:17:27] BYE
[18:17:28] :)
[18:19:19] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [30.0]
[18:21:00] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[18:26:10] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [30.0]
[18:30:06] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0]
[18:30:46] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[18:31:49] Analytics-Backlog, Quarry: Build an internal Quarry instance to share data & sample queries between researchers (and other analytics users?) - https://phabricator.wikimedia.org/T75142#1739514 (madhuvishy)
[18:42:29] (PS9) Nuria: Add pageview quality check to pageview_hourly [analytics/refinery] - https://gerrit.wikimedia.org/r/240099 (https://phabricator.wikimedia.org/T109739) (owner: Joal)
[18:42:55] madhuvishy: yt?
[18:43:01] nuria: yup
[18:43:41] madhuvishy, ottomata : have you seen this error in oozie before: "FAILED: Error in acquiring locks: Locks on the underlying objects cannot be acquired. retry after some time"
[18:44:06] nuria: hmmm may be, what are you running?
[18:44:24] madhuvishy: testing oozie the pageview jobs+whitelist checking
[18:45:11] madhuvishy: but you would think it will fail at that right away .. but no, it takes it couple hours
[18:45:17] nuria: hmmm
[18:45:46] nuria: no
[18:46:24] nuria: do you have the coordinator id?
[18:46:56] madhuvishy: yes, 0044190-150922143436497-oozie-oozi-C
[18:47:03] madhuvishy: let me look in hue
[18:49:01] nuria: I see loads of heartbeats
[18:49:03] and then failure
[18:49:23] madhuvishy: where are you looking?
[18:50:26] nuria: in hue logs for the failed action
[18:50:31] madhuvishy: cause the funny thing is that this looks to have succeeded: https://hue.wikimedia.org/jobbrowser/jobs/application_1441303822549_116040
[18:51:03] nuria: which workflow launched this?
[18:51:46] madhuvishy: the workflow file?
[18:51:51] the id
[18:52:10] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1739669 (awight) Confirmed that the campaign is intact. All the pipeline does is store URLs in a file, the ban...
[18:52:25] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [30.0]
[18:52:55] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1739673 (awight) Also, for the record we are now talking about beaconImpressions files, not bannerImpressions....
[18:53:35] madhuvishy: looking, oozie not responding in 1002, wait a sec
[18:55:15] nuria: i dont think that job succeeded, it says so but logs show the same lock error
[18:55:54] madhuvishy: right, right, i can see it failed on
[18:55:59] https://www.irccloud.com/pastebin/PHGy1Ixm/
[18:57:23] madhuvishy: it will just be nice if .. you would get something else beside "lock error ", on what table, resource ...
[18:57:41] nuria: https://issues.apache.org/jira/browse/HIVE-7445
[18:58:07] madhuvishy: jajaja ay ayaya
[18:58:56] nuria: it seems to have been fixed long back
[18:58:59] madhuvishy: you can read teh desperation on whoever filed it.
[18:59:10] madhuvishy: right, on 0.14
[18:59:30] we seem to have hive 1.1.0
[18:59:51] madhuvishy: mmm, no wait
[19:01:04] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0]
[19:08:04] madhuvishy: I remember something about hive 14
[19:08:06] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[19:08:48] nuria: hmmm
[19:10:14] madhuvishy: right, CDH 5.2.0 includes Hive version 0.13.1.
[19:10:27] we have cdh5.4
[19:10:34] ah yes
[19:10:44] https://www.irccloud.com/pastebin/tzJF9Ttp/
[19:12:59] madhuvishy: true true so this should be fixed if but it is not
[19:13:12] yeah
[19:13:48] nuria: did you have multiple coordinators running at the same time may be?
[19:13:56] is concurrency set to 2?
[19:14:15] madhuvishy: let me see, i did not have several that is for sure
[19:15:27] madhuvishy: ya, it is the pageview hourly with sets it to 2
[19:17:37] madhuvishy: what was the command you were using to pass hive site when testing oozie jobs?
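A sketch of how the "Error in acquiring locks" above could be narrowed down from a shell on the cluster: list the locks Hive currently holds on the table and confirm the installed Hive version. This assumes Hive's concurrency/lock support is enabled so that SHOW LOCKS returns something useful; it is a debugging aid, not the diagnosis the team settled on.

    # Hedged sketch for the lock error discussed above: what is holding locks on pageview_hourly,
    # and which Hive version is actually installed?
    hive -e "USE wmf; SHOW LOCKS pageview_hourly;"
    hive --version

If two coordinator actions (concurrency=2, as noted at 19:13) try to write the same table at once, this is where the competing lock holder would show up.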
[19:17:58] madhuvishy: i pass it like this: -Dhive_site_xml=/tmp/nuria/hive-site.xml
[19:18:14] madhuvishy: but i remember you were using something different
[19:18:41] nuria: yeah
[19:18:44] let me find it
[19:19:18] -Drefinery_directory=hdfs://analytics-hadoop$(hdfs dfs -ls -d /wmf/refinery/2015* | tail -n 1 | awk '{print $NF}')
[19:19:32] i use this to pick up latest refinery directory
[19:19:41] and hive site is configured to pick it up from here
[19:20:23] hive_site_xml = ${refinery_directory}/oozie/util/hive/hive-site.xml
[19:21:01] madhuvishy: that has a vibe from qchris gotta say
[19:21:06] or otto
[19:21:12] i dint write it :P
[19:21:26] and I change the oozie directory property to be the one I put on hdfs for testing
[19:23:35] madhuvishy: right, i run it like:
[19:23:37] oozie job -run -Duser=nuria -Darchive_directory=hdfs://analytics-hadoop/tmp/nuria -Doozie_directory=/tmp/oozie-nuria/oozie -config ./oozie/pageview/hourly/coordinator.properties -Dstart_time=2015-09-04T00:00Z -Dstop_time=2015-09-04T01:00Z -Drefinery_directory=hdfs://analytics-hadoop$(hdfs dfs -ls -d /wmf/refinery/2015* | tail -n 1 | awk '{print $NF}')
[19:24:03] madhuvishy: let me know if you see something amiss
[19:24:07] missing, that is
[19:24:42] nuria: looks good i think
[19:24:56] madhuvishy: ok, let me give it one more try
[19:26:35] madhuvishy: thank you for your help !
[19:28:48] Analytics-Wikistats: Provide ip->geo lookup for X-Forwarded-For header field - https://phabricator.wikimedia.org/T48271#1739907 (Yurik) Is this about properly geo-tagging the original IP even if it passes through a proxy like OperaMini? In that case I suspect this is going to be done as part of the Varnish t...
[19:30:05] nuria: np! i see it failed differently now
[19:30:19] madhuvishy: no, wait, i killed that one
[19:30:29] madhuvishy: wanted to see how hue displayed taht
[19:31:01] ah okay
[19:49:34] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0]
[19:52:57] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[19:54:11] Analytics-Backlog: Sync Addshore's LDAP to hue - https://phabricator.wikimedia.org/T116059#1739959 (Ottomata) Open>Resolved Done!
[20:05:52] ottomata: wanna deploy AQS?
[20:06:04] a change we needed to support the public endpoint just landed
[20:07:04] ok
[20:07:08] oof now to remember how..
[20:09:31] AHHHGHH
[20:09:37] did I just deploy and restart somethign on restbase1001?
[20:09:49] milimetric: didn't you get perms to do this?
[20:10:06] (CR) Nuria: "I think we can merge this code but , since much of what our map reduce job is doing is argument checking. Could we benefit from hardcoding" (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/232448 (https://phabricator.wikimedia.org/T108174) (owner: Joal)
[20:10:09] ottomata: deploying on restbase isn't good
[20:10:14] let's talk in -services?
[20:10:21] k
[20:10:23] I don't think I have permissions to deploy, just to poke around and restart
[20:10:35] I might though, I forget the timing
[20:41:34] ottomata: where should the burrow config stuff go? should i put it in the kafka puppet submodule?
[20:41:44] naw, probalby its own module
[20:42:10] ottomata: okay, just on op-puppet/modules?
[20:49:35] ja OR make it an awesome module that anybody can use
[20:49:43] puppet-burrow
[20:49:43] :)
[20:49:49] but you can dev it in ops puppet/modules
[20:49:56] gotta run, laterrrs
[20:56:58] Analytics-Backlog: Create deb packages for Golang and GPM - https://phabricator.wikimedia.org/T116084#1740128 (madhuvishy) NEW a:Ottomata
[20:57:40] Analytics-Backlog: Create deb packages for Golang and GPM - https://phabricator.wikimedia.org/T116084#1740141 (madhuvishy)
[20:57:41] Analytics-Cluster, Analytics-Kanban: Use Burrow for Kafka Consumer offset lag monitoring - https://phabricator.wikimedia.org/T115669#1740140 (madhuvishy)
[21:26:44] Analytics-Backlog: Sync Addshore's LDAP to hue - https://phabricator.wikimedia.org/T116059#1740189 (Addshore) many thanks! :)
[21:36:56] you can get metircs from kafka in grafana? :O
[21:39:52] madhuvishy: ^^ any idea? ;)
[21:41:03] addshore: what are you looking for?
[21:41:24] we use kafka for eventlogging and webrequest, so not sure what dashboards you want :)
[21:54:39] Analytics, MediaWiki-API: api.log does not indicate errors and exceptions - https://phabricator.wikimedia.org/T113672#1740260 (Spage) >>! In T113672#1704407, @bd808 wrote: > It looks like we will have to do some refactoring of ApiMain to expose the error code in a way that we can get it into the logging...
[21:55:59] Analytics-Backlog, MediaWiki-API, Reading-Infrastructure-Team, Research-and-Data, Patch-For-Review: Publish detailed Action API request information to Hadoop - https://phabricator.wikimedia.org/T108618#1740271 (Spage) "There was an error" should be part of the data collected, even if we can't yet...
[22:00:48] well I just saw https://grafana-admin.wikimedia.org/dashboard/db/wikimedia-blog madhuvishy which uses kafka.kafka1020_eqiad_wmnet_9999.kafka.server.BrokerTopicMetrics.BytesOutPerSec.eventlogging_WikimediaBlogVisit.OneMinuteRate
[22:01:36] any ideas if there is an easy way to see what is available to grafana through kafka like this? oh, or does this still come from graphite?
[22:02:47] oh yes, they are all in graphite too..
[22:03:00] addshore: yeah i think everything would be in graphite.
[22:05:55] addshore: https://github.com/jmxtrans/jmxtrans is what we use to send data
[22:13:13] addshore: what you are looking at is eventlogging
[22:14:17] addshore: which reports events for all schemas via stastsd, generated via kafka::server::jmxtrans
[22:15:21] ah sorry, my irc did not show me madhuvishy reply
[22:28:27] (PS10) Nuria: Add pageview quality check to pageview_hourly [analytics/refinery] - https://gerrit.wikimedia.org/r/240099 (https://phabricator.wikimedia.org/T109739) (owner: Joal)
[22:30:32] nuria: would anything break if the cookie was renamed 'Last-Access' (instead of WMF-Last-Access)?
[22:31:31] ori: changing varnish code, right?
[22:31:36] yep
[22:31:39] Analytics-Kanban: {kudu} Wikimetrics for IPL - https://phabricator.wikimedia.org/T114423#1740408 (JAnstee_WMF)
[22:31:55] ori: so varnish enters Last-Access on the x-Analytics field, correct?
[22:32:30] ori: then, no, nothing will break, queries will need to be updated to look for both values but that is no biggie
[22:32:54] ori: as those jobs are not "productionized" yet
[22:39:31] ori: there was a big discussion on why this was named WMF-Last-Access
[22:39:44] what's the reason to change it?
[22:39:48] let me find thread
[22:41:09] ori: https://lists.wikimedia.org/pipermail/analytics/2015-April/003870.html
[22:50:25] Analytics-Backlog, Analytics-Kanban: Projections of cost and scaling for pageview API. - https://phabricator.wikimedia.org/T116097#1740485 (Nuria) NEW
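A sketch of the "what is available" question addshore asks at 22:01, assuming access to the Graphite instance that backs these Grafana dashboards: the metric path is the one pasted above, the endpoints are the standard Graphite HTTP API (find and render), and the host/auth details of the WMF instance may differ.

    # Hedged sketch: browse the kafka metric tree graphite knows about, then pull one series as JSON.
    curl -s 'https://graphite.wikimedia.org/metrics/find?query=kafka.*'
    curl -s 'https://graphite.wikimedia.org/render?format=json&from=-1h&target=kafka.kafka1020_eqiad_wmnet_9999.kafka.server.BrokerTopicMetrics.BytesOutPerSec.eventlogging_WikimediaBlogVisit.OneMinuteRate'

As noted in the chat, these series originate from the brokers' JMX counters shipped via jmxtrans (kafka::server::jmxtrans), so Grafana is really reading Graphite rather than Kafka directly.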
[22:50:37] Analytics-Backlog, Analytics-Kanban: Projections of cost and scaling for pageview API. - https://phabricator.wikimedia.org/T116097#1740492 (Nuria)
[22:57:08] hi madhuvishy. heads up that I will be assigning (in phab sense) some tasks to you for article rec per Yuvi's assessment. Feel free to remove your name or ping if there is a problem. :-)
[22:57:20] leila: sure
[23:02:19] ori: ahem.. on that thread brandon is for the 'WMF' is he not?
[23:02:54] nuria: yes
[23:27:00] (CR) Nuria: "This is ready to merge." [analytics/refinery] - https://gerrit.wikimedia.org/r/240099 (https://phabricator.wikimedia.org/T109739) (owner: Joal)