[00:25:20] Analytics-Backlog, MediaWiki-API, Reading-Infrastructure-Team, Research-and-Data, Patch-For-Review: Publish detailed Action API request information to Hadoop - https://phabricator.wikimedia.org/T108618#1525053 (bd808)
[00:25:41] Analytics-Engineering, Analytics-EventLogging, Need-volunteer, Patch-For-Review: EventLogging calling deprecated SyntaxHighlight_GeSHi::buildHeadItem - https://phabricator.wikimedia.org/T71328#1727042 (Legoktm) Open>Resolved
[00:28:54] legoktm: thanks ^
[00:29:11] yw :)
[00:35:03] Analytics-Cluster, Database: Replicate Echo tables to analytics-store - https://phabricator.wikimedia.org/T115275#1727080 (Neil_P._Quinn_WMF)
[10:53:23] (CR) Joal: [C: 1] "Minor comments inline." (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379) (owner: Milimetric)
[11:15:25] Analytics-Engineering, Community-Tech: [AOI] Add page view statistics to page information pages (action=info) - https://phabricator.wikimedia.org/T110147#1727738 (NiharikaKohli) @kaldari, I think this should be in the Blocked column instead of Assigned to other teams.
[12:10:06] !log Stopped daily and monthly mobile unique coordinators
[12:10:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[12:12:18] !log Rerunning daily mobile unique jobs for days 2015-08-[03,04,11,12,12,14,17], 2015-09-16
[12:12:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[12:12:48] !log Restarting daily and monthly mobile unique coordinators with new patch
[12:12:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[12:17:23] !log Refinery deploy needed before restart --> Deploying
[12:17:26] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[13:07:49] Analytics-Kanban, Mobile-Apps, Patch-For-Review: Investigate and fix inconsistent data in mobile_apps_uniques_daily {hawk} [5 pts] - https://phabricator.wikimedia.org/T114406#1728009 (JAllemandou) Fix deployed, data corrected (highest value kept) for days: - 2015-08-03 - 2015-08-04 - 2015-08-11 - 2015-0...
[13:43:37] (PS8) Joal: Add CassandraXSVLoader to refinery-job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/232448 (https://phabricator.wikimedia.org/T108174)
[13:59:25] joal: hello, yt?
[13:59:31] Hi nuria
[13:59:34] I am !
[14:00:09] joal: is there a workaround for this error i was getting while testing oozie: "SemanticException [Error 10071]: Inserting into a external table is not allowed pageview_hourly"
[14:00:19] cave?
[14:00:27] sure, 2 mins
[14:01:24] joal: omw
[14:01:28] k
[14:05:08] hi a-team!
[14:05:15] Analytics, Services, operations: Automatic monitoring not working for AQS - https://phabricator.wikimedia.org/T115588#1728063 (mobrovac) NEW
[14:05:17] hi!
[14:05:18] Hi a-team :)
[14:05:39] howdy
[14:05:57] Good milimetric :)
[14:06:11] Analytics-Kanban, netops, operations, Patch-For-Review: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug} - https://phabricator.wikimedia.org/T107056#1728076 (mobrovac)
[14:06:12] Analytics, Services: restbase is not listening on port 7231 on aqs* - https://phabricator.wikimedia.org/T114742#1728072 (mobrovac) Open>Resolved >>! In T114742#1725185, @Dzahn wrote: > We still have 3 CRITs in Icinga for "Restbase endpoints health" on aqs and there was a comment next to them linking...
[14:18:35] kevinator: not usual to have you this early :)
[14:18:46] kevinator: enjoying home ?
[14:19:03] yes, but I feel like my clock is all messed up.
[14:19:12] :)
[14:19:25] I'm not used to working very much before standup
[14:23:21] Analytics-Backlog: Traffic Breakdown Report - Browser Major Minor Version {lama} - https://phabricator.wikimedia.org/T115590#1728090 (Milimetric) NEW
[14:24:34] Analytics-Backlog: Traffic Breakdown Report - Client OS Major Minor Version - https://phabricator.wikimedia.org/T115591#1728097 (Milimetric) NEW
[14:25:21] Analytics-Backlog: Traffic Breakdown Report - Google Requests - https://phabricator.wikimedia.org/T115592#1728103 (Milimetric) NEW
[14:26:15] Analytics-Backlog: Traffic Breakdown Report - Mime Type {lama} - https://phabricator.wikimedia.org/T115594#1728118 (Milimetric) NEW
[14:27:52] Analytics-Backlog: Traffic Breakdown Report - Target Wiki {lama} - https://phabricator.wikimedia.org/T115595#1728124 (Milimetric) NEW
[14:30:13] Analytics-Backlog: Traffic Breakdown Report - Crawlers {lama} - https://phabricator.wikimedia.org/T115596#1728132 (Milimetric) NEW
[14:32:20] Analytics-Backlog: Traffic Breakdown Report - User Agent Overview {lama} - https://phabricator.wikimedia.org/T115599#1728152 (Milimetric) NEW
[14:34:14] Analytics-Backlog: Traffic Breakdown Report - User Agents Trend {lama} - https://phabricator.wikimedia.org/T115601#1728165 (Milimetric) NEW
[14:34:27] Analytics-Backlog: Traffic Breakdown Report - Browser Trend {lama} - https://phabricator.wikimedia.org/T115602#1728171 (Milimetric) NEW
[14:38:38] Analytics-Backlog: Traffic Breakdown Report - Visiting Country {lama} - https://phabricator.wikimedia.org/T115605#1728191 (Milimetric) NEW
[14:39:15] Analytics-Backlog: Traffic Breakdown Report - Visiting Country per Wiki {lama} - https://phabricator.wikimedia.org/T115607#1728204 (Milimetric) NEW
[14:41:47] Analytics-Backlog: Traffic Breakdown Report - Visiting Country per Wikipedia Language {lama} - https://phabricator.wikimedia.org/T115608#1728210 (Milimetric) NEW
[14:42:34] Analytics-Backlog: Traffic Breakdown Report - Visiting Country per Wiki Trend {lama} - https://phabricator.wikimedia.org/T115609#1728216 (Milimetric) NEW
[14:43:38] Analytics-Backlog: Traffic Breakdown Report - Browsers from Visiting Country {lama} - https://phabricator.wikimedia.org/T115610#1728223 (Milimetric) NEW
[14:45:34] Analytics-Backlog: Traffic Breakdown Report - Device by Site from Visiting Country {lama} - https://phabricator.wikimedia.org/T115612#1728241 (Milimetric) NEW
[14:45:54] Analytics-Backlog: Traffic Breakdown Report - Client OS from Visiting Country {lama} - https://phabricator.wikimedia.org/T115613#1728248 (Milimetric) NEW
[14:46:15] Analytics-Backlog: Traffic Breakdown Report - Google Requests {lama} - https://phabricator.wikimedia.org/T115592#1728254 (Milimetric)
[14:46:25] Analytics-Backlog: Traffic Breakdown Report - Client OS Major Minor Version {lama} - https://phabricator.wikimedia.org/T115591#1728256 (Milimetric)
[14:48:49] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1728261 (ezachte) While waiting for new input for Monthly Pageview Reports (which is coming along, thanks @Milimetric !), I looked...
[14:49:35] Analytics-Kanban, Analytics-Wikistats: {lama} Wikistats 2.0 - https://phabricator.wikimedia.org/T107175#1728262 (Milimetric) FYI - we are starting to prioritize this work for this quarter. We've made tasks for each of the reports listed on the wiki page [1]. These tasks can be found in the #Analytics-Ba...
[15:18:22] joal: do you have 1 sec?
[15:18:27] I do nuria
[15:18:38] Analytics, Services, operations: Automatic monitoring not working for AQS - https://phabricator.wikimedia.org/T115588#1728301 (mobrovac) a:mobrovac
[15:19:02] joal: I am still getting the hive error despite having changed all these in coordinator properties:
[15:19:14] https://www.irccloud.com/pastebin/Z2lH1SOi/
[15:20:24] I must be missing a place where things have to be changed
[15:20:44] joal: i also have -Doozie_directory=/tmp/oozie-nuria/oozie
[15:21:12] where i put my oozie directory in hdfs like:
[15:21:14] hdfs dfs -rmr /tmp/oozie-nuria ; hdfs dfs -mkdir /tmp/oozie-nuria; hdfs dfs -put oozie/ /tmp/oozie-nuria
[15:21:14] nuria: you need to reset hive_site (for sure)
[15:21:35] joal: right, my hive-site is -Dhive_site_xml=/tmp/nuria/hive-site.xml
[15:21:53] nuria: really ?
[15:22:20] nuria: I guess you have it taken from a refinery deploy ?
[15:22:36] joal: yes, i launch the job like:
[15:22:52] oozie job -run -Duser=nuria -Dhive_site_xml=/tmp/nuria/hive-site.xml -Darchive_directory=hdfs://analytics-hadoop/tmp/nuria -Doozie_directory=/tmp/oozie-nuria/oozie -config ./oozie/pageview/hourly/coordinator.properties -Dstart_time=2015-09-05T00:00Z -Dstop_time=2015-09-05T01:00Z
[15:23:17] joal: which is the way i have tested in the past, how else can you pass hive-site?
[15:23:54] Giving it a real deployed one (from refinery/current for instance)
[15:24:04] But if it's an ok version (then that's not it)
[15:24:24] When you have created your tables, you have changed the locations, right ?
[15:24:26] joal: maybe my hive-site is old
[15:24:48] nuria: shouldn't, hive-site has not changed for ages
[15:25:09] joal: location of tables is: LOCATION '/tmp/nuria/whilelist';
[15:25:19] mmm.. typooo
[15:26:45] joal: ok, if all sounds good i will look again, i must have missed one place
[15:26:55] nuria: sounds weird
[15:27:16] There are 3 tables --> 3 different paths ?
[15:27:30] nuria: --^
[15:28:58] joal: 1 path per table you mean?
[15:29:16] there is pageview, whitelist, and unexpected_values
[15:29:22] Must have a path for each
[15:31:14] joal: thank you, changing that
[15:31:20] joal: morning :)
[15:31:38] Have you figured out the class not found error?
[15:31:39] nuria: if you use the same path for each table, hive won't work :)
[15:31:52] joal: ajajam
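A minimal HiveQL sketch of joal's point, with hypothetical table names and placeholder column lists (the real refinery schemas have many more columns): each external test table gets its own HDFS directory, because several external tables pointed at one LOCATION all read and write the same files.

```sql
-- Sketch only: placeholder schemas, not the real refinery DDL.
-- One directory per external table; a shared directory would make
-- all three tables see (and clobber) each other's data.
CREATE EXTERNAL TABLE tmp_pageview_hourly (line string)
LOCATION '/tmp/nuria/pageview_hourly';

CREATE EXTERNAL TABLE tmp_pageview_whitelist (line string)
LOCATION '/tmp/nuria/whitelist';

CREATE EXTERNAL TABLE tmp_pageview_unexpected_values (line string)
LOCATION '/tmp/nuria/unexpected_values';
```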
[15:32:13] hey madhuvishy : class not found solved, now fighting (for at least 2 hours) with classpath ordering
[15:32:46] Joal, aah did you have to split out the inner class to a separate one to solve it?
[15:32:48] madhuvishy: for the class not found, I extracted the reducer class from the CassandraXSVLoader one
[15:32:53] yup
[15:33:32] joal, okay. I'm getting to office, let's catch up when I'm there/after standup!
[15:33:39] sure madhuvishy :)
[15:33:42] safe trip
[15:47:05] nuria, yt?
[15:47:19] yesssir
[15:47:22] hi :]
[15:47:27] milimetric: for fun: https://github.com/mrdoob/three.js
[15:47:34] mforns: hola
[15:47:34] nuria, regarding https://phabricator.wikimedia.org/T88504
[15:48:00] aham
[15:48:06] you say we need 2 oozie jobs, one for mobile and another for desktop?
[15:48:29] ideally yes, one reads from all sources
[15:48:37] other reads from mobile source only
[15:48:41] both produce a file
[15:48:57] I see
[15:50:10] dan just filed similar work items to get these from wikistats but I think delivering the browser report sooner will help a lot of developers
[15:50:19] cc milimetric
[15:50:28] aha
[15:50:42] thx
[15:51:00] I've new thoughts after going through those tasks
[15:51:04] we should talk at standup
[15:51:10] milimetric: yess, wanna share?
[16:02:23] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Setup pipeline for search logs to travel through kafka and camus into hadoop {hawk} [55 pts] - https://phabricator.wikimedia.org/T113521#1728388 (kevinator) Open>Resolved
[16:02:55] Analytics-Kanban: Enforce policy for each schema: Sanitize {tick} [8 pts] - https://phabricator.wikimedia.org/T104877#1728392 (kevinator)
[16:02:56] Analytics-Kanban, Database: Delete obsolete schemas {tick} [5 pts] - https://phabricator.wikimedia.org/T108857#1728391 (kevinator) Open>Resolved
[16:03:52] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [13 pts] {hawk} - https://phabricator.wikimedia.org/T108925#1728393 (kevinator) Open>Resolved
[16:04:31] Analytics-Kanban, Patch-For-Review: Create Hadoop Job to load data into cassandra [34 pts] {slug} - https://phabricator.wikimedia.org/T108174#1728398 (kevinator) Open>Resolved
[16:05:03] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [13 pts] {hawk} - https://phabricator.wikimedia.org/T108925#1728402 (Ironholds) For what it's worth, as the author of both the R definition and the Java definition I absolutely agree with this approach - with the exceptio...
[16:11:21] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [13 pts] {hawk} - https://phabricator.wikimedia.org/T108925#1728417 (Nuria) > If the wiki page and code disagree, the code is wrong. You are right, this should be the approach for a definition with such a high visibility....
[16:11:22] (CR) Milimetric: Archive hourly pageviews by article in wsc format (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/246149 (https://phabricator.wikimedia.org/T114379) (owner: Milimetric)
[16:19:36] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [13 pts] {hawk} - https://phabricator.wikimedia.org/T108925#1728458 (Ironholds) English really needs a word for "yay, we agree!" but in the absence of that I'll just say this :P
[17:12:09] Analytics-Backlog: ==== Immediate Above ==== - https://phabricator.wikimedia.org/T115634#1728670 (Milimetric) NEW
[17:14:26] Analytics-Backlog, Analytics-Cluster: Implement better Webrequest load monitoring {hawk} - https://phabricator.wikimedia.org/T109192#1728684 (Milimetric)
[17:29:07] Analytics-Backlog: Move camus files from refinery to puppet - https://phabricator.wikimedia.org/T113990#1728723 (Nuria) This is now done.
[17:29:16] Analytics-Backlog: Move camus files from refinery to puppet [3] - https://phabricator.wikimedia.org/T113990#1728724 (Nuria)
[17:30:00] Analytics-Backlog, Analytics-Kanban: Move camus files from refinery to puppet [3] - https://phabricator.wikimedia.org/T113990#1681855 (Nuria)
[17:33:07] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1728741 (Jgreen) >>! In T97676#1726675, @ellery wrote: > 1:10 is already much better. Pgheres has a campaign fi...
[17:35:32] mforns: let's talk about browser report in a bit
[17:39:10] Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1728747 (awight) @ellery: just noting that there's still an open question for you, which is blocking us. I don...
[17:41:58] * milimetric going to get lunch
[17:51:24] (PS12) Joal: [WIP] Add cassandra load job for pageview API [analytics/refinery] - https://gerrit.wikimedia.org/r/236224
[17:52:29] mforns: let me know when you are back
[17:54:49] Hey a-team, have a good day !
[17:54:56] See you tomorrow
[18:04:05] milimetric: nuria this documentation debate is endless - we only want to remove docs for things like Kraken that never existed right?
[18:04:15] madhuvishy: yes
[18:05:15] nuria: and we'll archive old pages if they are about things we worked on, but don't look at anymore - correct?
[18:05:24] madhuvishy: right
[18:05:28] cool
[18:05:35] madhuvishy: that is what would make sense to me.
[18:09:52] agreed, btw, both that the debate is silly and about what to delete
[18:10:11] yeah
[18:20:50] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1728829 (jgbarah) Hi, @Anmolkalia, I've run your code, and I don't see that unicode error you see: ``` $ python mediawiki_analysis.py --database mdb --db-user jgb --db-password XXX --url http...
[18:23:52] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1728836 (jgbarah) In fact, after a while, I get an error with your code: ``` $ python mediawiki_analysis.py --database mdb --db-user jgb --db-password XXX --url https://www.wikipedia.org/w None...
[18:24:28] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1728837 (Anmolkalia) Hi, @jgbarah, I am getting that error if I run the code again, basically when the database already contains the values that are being retrieved. Meaning, the print statement...
[18:26:09] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1728839 (jgbarah) >>! In T114437#1717889, @Anmolkalia wrote: > There seems to be a problem with the print statement in the insert_page function, line 101. It seems to be working if I am using on...
[18:28:10] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1728848 (Anmolkalia) >>! In T114437#1728836, @jgbarah wrote: > In fact, after a while, I get an error with your code: > > ``` > $ python mediawiki_analysis.py --database mdb --db-user jgb --db-...
[18:30:19] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1728852 (Anmolkalia) >>! In T114437#1728839, @jgbarah wrote: >>>! In T114437#1717889, @Anmolkalia wrote: >> There seems to be a problem with the print statement in the insert_page function, line...
[18:31:24] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1728854 (jgbarah) BTW, to easy the process of reviewing the code, maybe you can fork the MediaWikiAnalysis GitHub repository, point me to your fork, and I just clone it. That way I can follow yo...
[18:40:06] madhuvishy: thanks for responding to the thread, but I think this discussion is not very useful
[18:40:18] ya
[18:40:31] the folks chiming in have never read our documentation, they will never read our documentation, and they are just speaking from a theoretical point of view of a marginal use case
[18:40:37] yeah
[18:40:52] our documentation should meet the needs of people that don't know about our world. We'll continue to make it better for those people
[18:41:31] yup!
[18:41:57] k, a-team, see above regarding the documentation discussion. I'm not going to respond on that thread, it's not constructive
[18:47:27] milimetric, btw, are you planning to work on the wikimetrics changes this quarter?
[18:47:50] mforns: yes, we roughly scheduled them to start next week
[18:48:26] I'm going to show Amanda's partner how to develop on wikimetrics, see if he wants to help too
[18:48:39] but I'll make tasks and you can join if you like
[18:49:05] ok
[18:50:00] milimetric, is anyone else willing to work on this?
[18:51:16] I donno :)
[18:51:27] I was thinking of a personal goal for this quarter
[18:51:37] I can only answer for myself - and I am lazy so I like easy tasks that have really excited stakeholders
[18:51:55] hehe
[18:52:07] so I'm happy to work on this. I'm not sure what my personal goals will be, but if you want to work on this, definitely add it as your personal
[18:52:22] if you want, we can chat for a bit so you know what's involved
[18:52:32] mmm, I'll think about this
[18:52:39] k, lemme know
[18:52:45] k, thanks :]
[18:55:38] I'm committed to the puppet changes
[18:55:52] For wikimetrics
[18:56:08] aha
[18:56:11] Happy to help with the other changes too
[19:13:13] nuria, do you think we should remove spiders from the browser usage report?
[19:21:10] mforns, I am not nuria, but given the number of questions I've got about spider traffic counts: no ;p
[19:21:35] hey Ironholds, thanks for chiming in :]
[19:21:41] cool, I'll keep them
[19:21:49] I dunno, wait for nuria, I may also be an edge case :D
[19:22:01] ok
[19:22:03] :]
[19:47:03] Analytics-Cluster, Analytics-Kanban, Easy: PM sees reports on browsers (Weekly or Daily) [8 pts] - https://phabricator.wikimedia.org/T88504#1729092 (mforns) How about something like this for mobile (mobile web + mobile app): ``` os browser view_count iOS 9 Mobile Sa...
[19:56:56] Ironholds: hola
[19:57:02] * Ironholds waves
[19:57:18] mforns: yes, as the browser report is for developers
[19:57:25] mforns: to be efficient triaging bugs
[19:57:25] nuria, ok
[19:57:32] mforns: it does not represent traffic
[19:57:49] Ironholds: we are going to have maybe a better proxy for spiders soon
[19:57:56] Ironholds: if it works i will let you know
[19:58:08] nuria, so, should we include the os's and browser's major version?
[19:58:14] Ironholds: will document it and yarayara
[19:58:39] mforns: let me see again the output of ua-parser
[19:58:54] nuria, there's an example in the task
[19:59:05] https://phabricator.wikimedia.org/T88504
[19:59:33] mforns: aham
[19:59:34] the thing is, like this, the long tail is quite long: 3600 records
[19:59:55] mforns: yes, that is to be expected with mobile fragmentation
[20:00:16] ok, so I'll keep it like this
[20:00:32] so "browser_family" "os_family" "browser_version" "os_version"
[20:00:49] browser_major that is
[20:01:03] https://www.irccloud.com/pastebin/kTAGiA2Z/
[20:01:33] mforns: in the case above
[20:01:55] we will not be reporting device or os-minor or app-version
[20:01:57] nuria, cool!
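The pastebin above isn't preserved here, but for reference, a small Python sketch of the field structure ua-parser exposes (assuming the `ua-parser` Python package behaves like the parser being discussed; the user agent string is just a made-up example): the report columns map onto these family/major values.

```python
# Quick local look at ua-parser output (pip install ua-parser).
from ua_parser import user_agent_parser

ua = ('Mozilla/5.0 (iPhone; CPU iPhone OS 9_0 like Mac OS X) '
      'AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 '
      'Mobile/13A344 Safari/601.1')

parsed = user_agent_parser.Parse(ua)
# Parse() returns nested dicts: 'user_agent', 'os', 'device'.
print(parsed['user_agent']['family'], parsed['user_agent']['major'])  # expected: Mobile Safari 9
print(parsed['os']['family'], parsed['os']['major'])                  # expected: iOS 9
```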
[20:02:10] ok
[20:02:10] we might need to add os-minor later for mobile
[20:02:54] mforns: i would not include mobile app
[20:03:15] mforns: cause in that case what matters are different fields
[20:03:26] mforns: want me to comment on task?
[20:03:28] nuria, aha
[20:03:48] nuria, no it's not needed
[20:04:27] mforns: in this case traffic is tiny compared to desktop but it is really a different report in which browser versions are not so meaningful without app version
[20:05:02] nuria, aha
[20:05:32] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0]
[20:06:01] Analytics-Cluster, Analytics-Kanban, Easy: PM sees reports on browsers (Weekly or Daily) [8 pts] - https://phabricator.wikimedia.org/T88504#1729157 (Nuria) Two changes: 1) let's not include mobile app, just mobile web on the mobile report and mobile+desktop in the overall. 2) let's report percentages
[20:06:09] mforns: also percentages better than views
[20:06:21] mforns: as otherwise devs are going to have to calculate those
[20:06:40] nuria, is this a report to be read by humans then?
[20:06:44] mforns: it is to be understood that (because we are throwing away the longtail) it will not add up to 100%
[20:06:51] mforns: yes, by devs
[20:07:12] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[20:07:23] nuria, oh, we are throwing away the long tail? so where is the threshold?
[20:07:48] mforns: so far i have used less than 0.5%
[20:07:55] mforns: let me verify
[20:08:10] if the report is to be read by humans, better to concatenate all the os and browser info
[20:08:21] ok
[20:08:34] well, it will be a comma (or tab) separated file
[20:09:09] leaving os and browser separated allows for easier pivots in Excel if someone wants to import it, this is what they do now
[20:09:19] oh I see
[20:09:27] Krinkle is offline but he'll get messaged by the ticket
[20:09:32] cool
[20:12:28] mforns: i think we did 0.05%
[20:12:36] nuria, ok
[20:12:40] will do
[20:12:41] mforns: but let's run the reports and see
[20:12:49] ok
[20:57:10] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1729350 (jgbarah) >>! In T89135#1720801, @Aklapper wrote: >>>! In T89135#1720707, @ashitaprasad wrote: >> Can you please guide me to gain some traction and contribute to this p...
[21:02:02] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1729370 (jgbarah) > Yes, if I remove print (pageid+" "+ namespace + " " + title+" was already in the db") and replace it with print (pageid+" was already in the db"), I don't get this problem. T...
[21:04:40] mforns: let me know if you run into oozie trouble
[21:04:46] nuria, sure
[21:05:40] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1729377 (jgbarah) >>! In T114437#1728837, @Anmolkalia wrote: > Hi, @jgbarah, I am getting that error if I run the code again, basically when the database already contains the values that are bei...
[21:05:53] I was going to ask madhuvishy because we combined this way in standup, but I'm still improving the query, when I come to oozie, I'll ask either of you, thx
[21:06:02] nuria, ^
[21:06:20] mforns: sure!
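A hedged HiveQL sketch of the kind of query being discussed, assuming the `wmf.pageview_hourly` schema (its `user_agent_map` and `access_method` fields and year/month/day partitions; the real report job may name things differently), with percentages instead of raw counts and the 0.05% long-tail cutoff from the conversation above:

```sql
-- Sketch only: assumed schema, one example day, mobile web + desktop.
WITH per_agent AS (
  SELECT
    user_agent_map['os_family']      AS os_family,
    user_agent_map['os_major']       AS os_major,
    user_agent_map['browser_family'] AS browser_family,
    user_agent_map['browser_major']  AS browser_major,
    SUM(view_count) AS views
  FROM wmf.pageview_hourly
  WHERE year = 2015 AND month = 10 AND day = 14
    AND access_method IN ('desktop', 'mobile web')   -- no mobile app, per the task
  GROUP BY
    user_agent_map['os_family'],
    user_agent_map['os_major'],
    user_agent_map['browser_family'],
    user_agent_map['browser_major']
),
total AS (
  SELECT SUM(views) AS all_views FROM per_agent
)
-- Percentages won't add up to 100% because the long tail is dropped.
SELECT
  os_family, os_major, browser_family, browser_major,
  ROUND(100 * views / all_views, 2) AS percent_views
FROM per_agent CROSS JOIN total
WHERE 100 * views / all_views >= 0.05
ORDER BY percent_views DESC;
```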
[21:06:48] Analytics-Tech-community-metrics, Possible-Tech-Projects: Improving MediaWikiAnalysis - https://phabricator.wikimedia.org/T89135#1729384 (jgbarah) >>! In T89135#1714299, @Fhocutt wrote: > This looks interesting, and I can help with exploring and working with the MediaWiki API. Please, go ahead!
[21:07:04] mforns, madhuvishy we should probably have a generic workflow for the browser report as in "run this query and dump to a file"
[21:07:19] yeah
[21:07:34] we are gonna append to the end of a file right?
[21:07:35] mforns, madhuvishy w/o the parameterization that the other workflows have
[21:07:55] mmhm
[21:08:03] nuria: boy I wish I hadn't deleted those meetings. This was quite an outage :/
[21:08:04] nuria: it will still need time params no?
[21:08:08] madhuvishy: if i remember appending was hard, it was easier to "create" a weekly file
[21:08:23] data's randomly missing from the whole time the consumer was down
[21:08:25] madhuvishy: yes, but only those
[21:08:37] milimetric: niceee
[21:08:47] milimetric, mmm :/
[21:08:48] mforns: oh!
[21:08:52] madhuvishy: do you know how long data's going to stick around in that kafka topic? The mixed one that mysql consumer consumes?
[21:08:54] madhuvishy, ?
[21:08:59] 7 days
[21:09:08] milimetric: but how...could it be "randomly" missing?
[21:09:14] not sure
[21:09:20] uhhh mforns sorry that was supposed to be for milimetric
[21:09:23] that can only be explained if the duplicate stuff is not deployed
[21:09:26] it's weird, I looked at 3 different schemas and they're all the same
[21:09:30] :]
[21:09:39] so some of the inserts are failing
[21:09:48] like you were saying yesterday, yea
[21:09:57] i'll check puppet
[21:10:05] aha makes sense
[21:10:17] milimetric: you can check the code on /srv, let me see what machine this was deployed to
[21:10:31] nuria: eventlog1001
[21:10:34] nah, replace=true
[21:10:34] https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/eventlogging.pp#L233
[21:10:45] eventlog1001.eqiad.wmnet, yep
[21:11:32] yep, checked the config on el1001, it's there, replace=True
[21:11:34] weeeeird
[21:11:58] milimetric: indeed
[21:12:13] andrew and I tested the replace=True very carefully, it did what we figured it would
[21:12:29] milimetric, I have another theory
[21:12:32] k :)
[21:13:00] mforns: let me guess
[21:13:12] underpants gnomes!
[21:13:14] mforns: consumer is starting->inserting
[21:13:23] blowing up with oom
[21:13:26] the thread that was consuming from kafka and queueing for insertion was faster than the thread actually inserting
[21:13:29] starting->inserting again
[21:13:40] right, causing OOMs
[21:13:58] i'll save the log from 10.14 so we can check that
[21:14:17] mmmmm, I was thinking that the queue got too big and the consumer restarted several times, losing what was in the queues
[21:14:43] mforns: which queue is this?
[21:14:54] well if it is too big it'll die with oom, right?
[21:15:05] k, saved logs to /home/milimetric on el1001
[21:15:06] madhuvishy, the queue where all events get queued for sql insertion
[21:15:15] mm hmm
[21:15:26] nuria, yes! I was figuring out what oom was :]
[21:17:21] mforns: let's check the kern log
[21:17:34] mforns, milimetric : what utc time was the outage?
[21:17:52] nuria, yes, the kern log right
[21:17:55] 06:48 UTC to 17:03 UTC
[21:18:03] yesterday 14th?
[21:18:05] that's how long we see insertAttempted = 0
[21:18:29] however, after 17:03 it started attempting a lot of inserts, but never ended up inserting enough to catch up
[21:18:39] if the theory is true, the ooms should start at +-15:30 UTC
[21:19:00] Oct 14 15:52:25 eventlog1001 kernel: [19872077.612410] eventlogging-pr invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[21:19:11] this is the processor
[21:19:12] ta -ta-channnn
[21:19:15] ay ay a
[21:19:16] sorry, 15:03, sorry, not 17:03
[21:19:23] it should be eventlogging-co
[21:19:33] right -co
[21:19:58] Oct 14 15:52:25 eventlog1001 kernel: [19872077.613003] Out of memory: Kill process 10512 (eventlogging-co) score 867 or sacrifice child
[21:19:59] interesting though..... processor died?!
[21:20:16] there you go
[21:20:23] https://www.irccloud.com/pastebin/NfWZhiHK/
[21:21:04] yeah, I was afraid that would happen, I wanted to restart the consumer too but figured it couldn't keep up
[21:21:11] to fully confirm the theory we should have several consumer kills
[21:21:17] during a couple hours
[21:21:31] well, this one death could've killed a TON of events
[21:21:45] so we don't necessarily need multiple deaths
[21:22:06] for it to OOM that big machine with just text :)
[21:22:39] the offsets probably got committed to kafka though and the messages were never re-inserted
[21:22:42] anyway, we have to grab the kafka topic, and re-consume from the offset before 06:00
[21:22:51] has anyone thought about how to do that yet ^?
[21:22:56] I see, but wouldn't just one kill create a hole in the db? Kafka delivers events in order, right?
[21:23:05] like can I start another consumer with a specific offset?
[21:23:36] hmmm, i don't think you can start with specific offset
[21:23:44] mmm
[21:23:59] mforns: yeah, you're right, some data trickles in. hm... maybe that's from the other thread that inserts what's left in the queue after the worker dies?
[21:24:12] we should also modify the eventlogging mysql consumer code to block the kafka reader thread if the queue is too big
[21:24:34] yeah, that makes sense
[21:24:48] mforns: bounded blocking queues
[21:24:57] madhuvishy, aha
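A minimal sketch of the bounded blocking queue idea, with hypothetical names rather than the actual eventlogging module layout: giving the queue a maxsize makes put() block, so the Kafka reader thread gets back-pressure instead of growing the queue until the OOM killer fires.

```python
# Sketch, not the actual eventlogging code (Python 2, matching the
# service at the time). A bounded queue sits between the Kafka reader
# thread and the MySQL inserter thread; put() blocks when the inserter
# falls behind, instead of letting memory grow unbounded.
import threading
import Queue

MAX_QUEUED_EVENTS = 10000  # assumption: tune to memory and batch size
events = Queue.Queue(maxsize=MAX_QUEUED_EVENTS)

def kafka_reader(consume):
    """Reader thread: consume() is a stand-in yielding decoded events."""
    for event in consume():
        events.put(event, block=True)  # back-pressure happens here

def mysql_inserter(insert_batch, batch_size=400):
    """Inserter thread: drain the queue and insert in batches."""
    batch = []
    while True:
        batch.append(events.get())
        if len(batch) >= batch_size:
            insert_batch(batch)
            batch = []

# Hypothetical wiring:
# threading.Thread(target=kafka_reader, args=(my_kafka_consumer,)).start()
# threading.Thread(target=mysql_inserter, args=(my_insert_fn,)).start()
```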
[21:26:31] i wanna try to finish that restbase work, if we have this data for 7 days I'd like to wait until Andrew is around
[21:26:43] milimetric: you can manually pipe the events into the consumer?
[21:26:51] madhuvishy, maybe for now, would be easier to just add a sleep(1) in the kafka-reader thread if the queue is too large
[21:26:54] like, we can use kafkacat
[21:27:04] and read from that using specific offset
[21:27:05] that's a good idea, like backfilling
[21:27:05] oh... and consume from stdin?
[21:27:09] and pipe to consumer
[21:27:21] yeah, kafkacat will let you specify offset
[21:27:30] that makes sense
[21:27:46] our kafka reader handler doesn't do that as far as i know, there's just latest and earliest
[21:28:14] right. I'm not sure how to use kafkacat, investigating
[21:28:26] kafkacat -o -5 -t eventlogging_Blah -b kafka1012:9092
[21:28:42] milimetric: ^
[21:28:44] hm, but it's not on el1001
[21:28:55] hmmm
[21:29:02] stat1002 has it
[21:29:19] right, but it doesn't have access to the mysql machines the consumer needs
[21:29:22] yup
[21:30:12] milimetric: where are the mysql machines?
[21:30:23] stat1003?
[21:30:44] no, i'm not sure how they're set up but they're not reachable from the stat machines
[21:31:03] they're called m4-master.eqiad
[21:31:11] hmmm
[21:31:37] conspicuously completely absent from puppet: https://github.com/wikimedia/operations-puppet/search?utf8=%E2%9C%93&q=m4-master
[21:31:58] secret machines!
[21:32:00] data holes!
[21:32:04] our life is fun :)
[21:32:05] xD
[21:32:31] milimetric: hmmm
[21:33:02] you know, though, we could kafkacat from kafka and just save this period of events to a file
[21:33:11] then we can copy that manually
[21:33:17] we don't have to pipe everything all fancy
[21:33:24] yeah if you want to be safe for now
[21:34:03] and then we can pipe the file to consumer
[21:34:16] yea, but then it would have the same problem
[21:34:20] aha, I think we have around 702 MB per hour uncompressed
[21:34:21] we need a sleep like mforns said
[21:34:57] yeah
[21:35:13] milimetric, if you want I can do that
[21:35:54] mforns: i'd love to test it on one of those test ana machines
[21:36:07] do we still have an1004 or was it repurposed?
[21:36:25] no idea
[21:38:20] mforns: yeah, if you want you can set that up, I'll try to kafkacat this stuff out
[21:38:25] ok
[21:45:42] madhuvishy: this offset parameter seems random :)
[21:45:56] milimetric: why?
[21:46:01] I'm trying all kinds of values for -o and the timestamps of the events are all within a few minutes of now
[21:46:16] everything from 1 to -5000000
[21:46:33] milimetric: hmmm
[21:47:08] but I can do -o beginning and that gets me 7 days ago....
[21:47:33] milimetric: interesting
[21:47:55] i have never tried more than last 10 messages and it's worked fine for that
[21:48:07] there's an option called -c
[21:48:13] to limit count of messages
[21:48:21] wonder if that's in play somehow
[21:48:39] but you mention it's fine if you start at the beginning so dunno
[21:49:39] yeah, I did -c 2
[21:50:17] you pass just a normal integer to -o, and I'd expect the first offset to be 0 or 1 right?
[21:50:30] and then for them to increase as we go.... oh, hm, that wouldn't make sense
[21:50:41] 'cause the "first" message would constantly be different
[21:51:13] is there a way to get the offset for a particular message? the new versions of kafkacat seem to have a format -f param, but ours doesn't
[21:53:15] kaldari: the api is not going to get a public API this week
[21:53:34] kaldari: but next week for sure, or I'll just run it out of my basement or something
[21:53:39] milimetric: there's an offset associated with each message
[21:54:02] right, question is, how do I find out what offset to try :)
[21:55:00] milimetric: -O
[21:55:11] -O gives you the offset along with the message
[21:55:19] aha
[21:55:20] !
[21:56:14] but yeah, in this case it's a little bit of a pain to figure out which offset
[21:57:51] milimetric: piping to file would be like reading from the all-events log right?
[21:58:28] milimetric: cause we have 1 topic per schema so it is likely to be tedious
[21:58:55] no, we have valid-mixed
[21:58:59] that's what the consumer is grabbing
[21:59:50] nuria: the valid-mixed topic only has events that are going to mysql, it skips the ones that are blacklisted (these are only consumed in hadoop now)
[21:59:51] madhuvishy: do you know if pykafka lets you specify an offset to start consuming from? We might want to make the kafka-reader capable of taking that parameter in
[22:00:03] milimetric: I don't think so
[22:00:08] k, sok
[22:00:09] i'll check
[22:00:23] milimetric: ok, it seems that kafkacat will make you issue commands per topic, so it's going to be a lot of commands
[22:00:45] milimetric: but there is an only-valid-events log
[22:00:48] close enough! I got a decent offset:
[22:00:58] kafkacat -o 48850000 -O -c 2 -t eventlogging-valid-mixed -b kafka1012:9092
[22:01:21] nuria: don't have to do per topic because valid-mixed topic has everything
[22:01:57] milimetric: cool
[22:02:06] milimetric: ok, once you get it to work, document it here: https://wikitech.wikimedia.org/wiki/EventLogging/Backfilling
[22:02:32] would be awesome so we do not have to repeat ourselves
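For that wiki page, a sketch of the save-to-a-file plan, using only the kafkacat flags that appear in this log (-b, -t, -o, -c); the offsets and count are this incident's values, and the window's timestamps should be spot-checked before replaying into the consumer:

```bash
# Sketch for the backfill doc: dump a window of eventlogging-valid-mixed
# to a file on a host that has kafkacat (e.g. stat1002), then move the
# file somewhere that can reach m4-master and pipe it into the mysql
# consumer's stdin. Offsets and count below are from this incident.
START_OFFSET=48850000   # first offset around 05:50 UTC (found via -O)
NUM_MESSAGES=1150000    # approximate window size; verify before replaying

kafkacat -b kafka1012:9092 -t eventlogging-valid-mixed \
  -o "$START_OFFSET" -c "$NUM_MESSAGES" > backfill-2015-10-14.json

# Spot-check that the first and last events bracket the outage window:
head -n 1 backfill-2015-10-14.json
tail -n 1 backfill-2015-10-14.json
```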
[22:02:57] milimetric: Thanks for the update. We were finally able to get in touch with the maintainer of the popular pages reports, so that might not be blocked after all. Let me know when you have a better idea of when it will go live.
[22:03:23] that's good news kaldari, you'll be the first to know when it's available
[22:04:21] the only-valid-events.log would work too, but it's more of a pain in the butt to slice the right piece out of that
[22:05:24] milimetric: sure
[22:05:55] milimetric, do we have a task to associate the gerrit change to?
[22:06:14] sorry :) making
[22:06:25] milimetric, oh thanks!
[22:07:44] Analytics-Kanban: Enable EL consumer to deal with a lot of pressure from kafka {oryx} - https://phabricator.wikimedia.org/T115667#1729697 (Milimetric) NEW
[22:07:48] ^
[22:10:22] mforns: ^
[22:10:29] milimetric, oh sorry, I should have created that task instead of bothering you :] I thought you had a task for the incident
[22:10:59] I'll change it and assign it to you
[22:11:14] it's ok, I forgive you :)
[22:11:18] xD
[22:12:01] I was going to make a joke that you owe me and I just remembered I owe you a chess game
[22:12:03] from Mexico
[22:12:16] I have the worst / weirdest debt memory, sorry about that
[22:12:31] mmm we can play on chess.com
[22:15:19] mforns: what's your username?
[22:15:25] marcelrf
[22:15:32] milimetric, ^
[22:17:19] (added)
[22:17:28] btw guys, kafkacat is A BEAST
[22:17:36] it consumed 1.1 million messages in like two seconds
[22:17:51] makes sense the consumer ran out of memory :)
[22:18:38] cool!
[22:21:49] Analytics-Kanban: Incident: EventLogging mysql consumer stopped consuming from kafka {oryx} - https://phabricator.wikimedia.org/T115667#1729727 (mforns) a:Milimetric
[22:24:09] https://gerrit.wikimedia.org/r/246796
[22:24:39] milimetric, madhuvishy, ^
[22:24:47] I haven't tested that yet
[22:28:43] Analytics-Tech-community-metrics, Research consulting, Research-and-Data: Quantifying the "sum of all contributors" - https://phabricator.wikimedia.org/T113406#1729751 (DarTar) @ezachte: any progress on this? Let's talk briefly during our 1:1 tomorrow.
[22:30:57] hey a-team, going to sign off, see ya tomorrow
[22:31:02] nite mforns
[22:31:08] hasta luego
[22:31:20] adeu :]
[22:31:29] Analytics-Backlog, Analytics-Cluster: Use Burrow for Kafka Consumer offset lag monitoring - https://phabricator.wikimedia.org/T115669#1729754 (Ottomata) NEW a:Ottomata
[22:32:49] something's really weird
[22:33:19] if I look at offset 50000000 its timestamp is around 22:00
[22:33:38] if I look at offset 48850000 its timestamp is around 05:50
[22:33:41] so that's the range I want
[22:33:50] but if I subtract, I get 1150000
[22:34:03] and then consume that many events, the last one I have is from 08:00
[22:34:09] which... makes no sense
[22:47:04] k, signing off, too weird, need other brains to bounce tomorrow
[22:54:22] madhuvishy: i can't seem to find it in my email, where was i supposed to let you know about the updated schema? (finished off the patches on our end, after review should be good for a deploy next week)
[23:56:56] Analytics-EventLogging, Editing-Department, Improving access, Reading Web Planning, discovery-system: Schema changes - https://phabricator.wikimedia.org/T114164#1729891 (Jdlrobson)