A Dollar Worth of Ideas

If ideas are a dime a dozen, I will be posting here a dollar worth of ideas.

These are possible open source, research or data science projects or contributions for people to pursue.

I would be interested in mentoring some of them.

If any of them inspire you, a citation or a backlink would be nice. I can link to your work from this page, too. Just contact me for details.

1. Dog Fooding Data Science: mining skill requirements for data scientists in job boards. { data science }
2. Free Software Guilds: group together people interested in software creation. { floss community }
3. Port DECORATE to Scikit-Learn: synthetic training data for instance engineering. { ml floss }
4. VESRE: poorly configured indexes can reveal private secrets. { floss security }
5. Build an Open Source Version of JuruXML: bring back semantic search. { floss ir }
6. Social Network Game: sell sausages, be popular. { game community }
7. Android Reverse Proxy with DNS Intercept: sandbox your apps. { floss android }
8. Automated Reverse Engineering Documentation: help with clean-room reverse engineering. { floss nlg }
9. Overfitting the Web: is training on the whole web akin to overfitting? { research nlp }
10. LibreOfficeNLG: add NLG to Libre Office mail merge. { floss nlg }
11. README Commentator: correlate project readme with number of stars. { floss research nlp }
12. Automated Calligraphic Art: write it with words. { floss art }
13. Pretrained Models for Clustering: make pretrained models for common distance metrics. { data science }
14. PD NN: visually training neural networks using Pure Data. { floss art }
15. TensorFlow as a Feature Engineering DSL: reuse the computation graph. { data science }
16. Structured Code Reviews using NLG: make reviews faster and less toxic. { nlg floss }
17. Information Extraction for WikiSpore Events: seed a MediaWiki incubation project using NLP. { data science }
18. Getting the Word Out for Word2Vec: embeddings for authors. { nlp outreach }
19. Replacing Festival Embedded Scheme with Python: text-to-speech for decades to come. { floss nlp }
20. Causal Modeling for Social Media Self Selected Samples: making sense of sentiment data. { data science }
21. NLG4SHAP: making sense of features using text generation. { nlg ml }
22. Visual Elm: attracting visual thinkers to front-end dev. { floss dev }
23. Hidden Agendas: a semi-cooperative game with a strong narrative component. { game design }
24. Repro Finder: automatically determine which papers need to be reproduced. { research community }
25. Grammatical Framework for No-Code AI: using Haskell's GF for AI applications. { data science nlp }
26. Private Counter: the simplest homomorphic encryption app. { floss security }
27. Rubix loves Monica: machine learning in PHP to keep in contact with friends. { data science floss }
28. Source Code Semantic Search: use XML fragments to index programs. { software engineering ir }
29. MIDI Shake Game: follow the rhythm, enjoy the music. { game design }
30. Enhancing Biases in ML Models: make models more ugly to better understand them. { data science }
31. Adding Privacy to the Debian Social Contract: privacy is a human right. { debian }
32. Translate Building Synthetic Voices: help people build barebones text-to-speech for rare languages. { floss nlp }
33. You Need Data For That: build a data recommendation system from published papers. { data science }
34. Semantic Chorded IDE: programming with subtrees. { programming ui }
35. Bring Useful Emails Back: scrape fb/linkedin/etc. and send email. { floss }
36. All Our Ideas Everywhere: reuse All Our Ideas on TW/FB/etc. { data science floss }
37. Two Phones: enhanced privacy separating apps from telephone service. { floss privacy }
38. Fingerprint NNs: detect copies of trained models. { data science }
39. Haptic Belt Morse Interface: learn morse, receive messages. { maker }
40. Fancy Toilets Revisited: find adjective distributions in quoted text. { data science nlp }
41. Porting Out of RoR: moving from Ruby to Python or PHP. { software engineering }
42. Serious Games 50: write open source versions of Abt's (1970) book. { game floss }
43. Farmer Text Support Revisited: Community QAs for everybody. { data science floss }
44. Pip Install UIMA: package UIMA reusing some of the pyspark tooling. { nlp floss }
45. PHP Elm: explore similarities in the load-execute-serve model of programming. { programming }
46. BERT5: reproduce results changing the language model. { data science }
47. Lojban for microwave ovens: creoles for non-human intelligences. { nlp linguistics }
48. Deploy Auto Bug Assigner: bring some SotA software engineering research solutions to community use. { software engineering floss }
49. t-SNE for All: dimensionality reduction for non-data scientists. { data science floss }
50. Timeless Intelligence: discovering intelligence that is either too fast or too small. { artificial intelligence }
51. IPCC Full Report Summarization: help make sense of the 2,216-page report from 2014. { nlp summarization }
52. Reproduce Crowdsourced Annotations: test instructions and agreement. { data science }
53. Pure Java Implementation of nd4j: make deeplearning4j run wherever a JVM is available. { ml floss }
54. Program Wiki: a program that anybody can edit, bring the Tiki Way to JS. { floss }
55. Feature Discretization Library: going beyond k-bins. { data science }
56. Priming Catalog: find known priming effects and structure them. { cognitive engineering }
57. ThoughtTreasure User Simulator: transfer the work put into ThoughtTreasure into a User Simulator for dialogue systems. { nlp floss }
58. Manual Shell: man pages that read themselves to you as you mind your bash business. { qa linux }
59. Haptic 4D UI: visualize the shadow of 4D objects with the sense of touch. { maker }
60. Surrogate Splits Everywhere: handle imputation at decision tree construction. { machine learning }
61. Lojban Bible: translate the Bible to Lojban semi-automatically. { nlp }
62. Solve Your Own Debian Bug: tools to help Debian users tackle their own bugs. { debian }
63. MXNet UIMA: integrate Apache MXNet deep learning into Apache UIMA. { nlp floss }
64. Fake It Till Your ML Makes It: a human-behind-the-curtain ML framework. { floss machine learning }
65. SQLite Index Recommender: speed up queries for a given DB. { floss }
66. UX testing using RL: test a UI using an AI trained on good UIs. { machine learning }
67. Multigenerational Software: tools, languages and stacks for extreme backwards compatibility. { programming }
68. Learn To Search Community: a forum where questions are answered through keyword searches. { community ir }
69. Contextual Search Using Screen Readers: Searching without queries, a recommendation approach to desktop search. { qa ir }
70. Make Clean Concurrent Again: transform the backend of a lazy functional language for parallelism. { programming }
71. Hi-Fi Streaming for Musicians: jamming with predictive error correction. { audio ai }
72. Chanted DNA: transform DNA into words that can be recited. { nlg art }
73. Open Source Papers with Code Alternative: extract and archive source code links from papers. { nlp floss }
74. Algorithmic Neighborhoods: a world shrunk by recommendation algorithms. { community research }
75. Good Deeds Challenge: a 24 pull requests for the real world. { community }
76. Coprogramming Language: a programming language for partnering with the computer to solve problems. { programming ai }
77. DAW-on-GPU: a digital audio workstation accelerated using a GPU. { audio floss }
78. Cultural Differences as Style Transfer: translating not only language but culture. { nlp research }
79. Bibliophobic Writers: find out why there seem to be more writers than readers. { community }
80. ML Implementation Differences: study the differences in behavior for different implementations of the same ML algorithms. { ml }
81. Résumé Screening Simulator: help job seekers understand why a clear, short resume makes the difference. { nlg community }
82. Twitter Word2Vec Dictionary: find new expressions as they get created. { nlp research }
83. When to Fold Them: opportunity cost analysis. { ai }
84. Educational Trading Bots: use trading bots to teach ML and finance. { ml education }
85. Chat-aware Streamers Music Generation: generate music for streamers that reacts to the sentiment in the chat. { ai audio }
86. CV Keyword Stuffing: jump through automatic gatekeepers using microlines. { nlp community }
87. Reddit Connection Search: find relationships between two events or topics through Reddit posts. { nlp ir }
88. Neural Image Compressor: pack autoencoders for use with web images. { floss ai }
89. Transferred Connectome: use a worm's neural structure to do something new. { research ai bio }
90. Annotation Instructions Library: collect annotation instructions for scientific reproducibility. { research community }