
RosettaCode Data Project
========================
This git repository contains (almost) all of the code samples available on
http://rosettacode.org organized by Language and Task.
## Getting the Data
All of the data is in this repository, so you can just run:

    git clone https://github.com/acmeism/RosettaCodeData

*However...*

It's a lot of data! If you just want the latest data, the quickest thing to do
is:

    git clone https://github.com/acmeism/RosettaCodeData --single-branch --depth=1

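
If you start with the shallow clone and later need the full history, Git can deepen the existing checkout instead of cloning again. This is a minimal sketch that uses only standard `git` options; the two function names are hypothetical conveniences, not tools shipped in `bin`:

```shell
# Hypothetical helper (not part of this repository): clone only the newest
# commit of the remote's default branch. Both arguments are optional.
rcd_clone_latest() {
    url=${1:-https://github.com/acmeism/RosettaCodeData}
    dir=${2:-RosettaCodeData}
    git clone --single-branch --depth=1 "$url" "$dir"
}

# Hypothetical helper: turn an existing shallow clone into a full one in place.
# Git refuses --unshallow on a repository that is already complete.
rcd_fetch_full_history() {
    dir=${1:-RosettaCodeData}
    git -C "$dir" fetch --unshallow
}
```

Because the clone was made with `--single-branch`, `--unshallow` then fetches the complete history of that one branch only.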
## Tools
This repository's data content is created by a Perl program called
`rosettacode`. You can install it with this command:

    cpanm RosettaCode

You can rebuild the data with:

    make build

This repository has a `bin` directory with various tools for working with the
data:

* `rcd-api-list-all-langs`: List all the programming language names directly
  from rosettacode.org
* `rcd-api-list-all-tasks`: List all the programming task names directly from
  rosettacode.org
* `rcd-new-langs`: List the RosettaCode languages not yet added to `Conf`
* `rcd-new-tasks`: List the RosettaCode tasks not yet added to `Conf`
* `rcd-samples-per-lang`: Show the number of code samples per language
* `rcd-samples-per-task`: Show the number of code samples per task
* `rcd-tasks-per-lang`: Show the number of tasks with code samples per language
* `rcd-langs-per-task`: Show the number of languages with code samples per task
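
The counting tools can be roughly approximated with standard Unix utilities, which may help when checking their output. This is a sketch, not the actual implementation of `rcd-samples-per-lang`: it assumes the checkout stores samples as `Task/<task>/<language>/<file>` with one file per sample, and `count_samples_per_lang` is a hypothetical name:

```shell
# Hypothetical approximation of rcd-samples-per-lang (not the real tool).
# Assumes the layout Task/<task>/<language>/<file>, one file per code sample.
# Prints "<count> <language>" lines, largest count first, ties alphabetical.
count_samples_per_lang() {
    root=${1:-.}
    # Sample files sit exactly three levels below Task/; files directly in a
    # task directory (depth 2) are skipped. The language is the name of each
    # sample file's parent directory.
    find "$root/Task" -mindepth 3 -maxdepth 3 -type f |
        awk -F/ '{ print $(NF-1) }' |
        sort | uniq -c | sort -k1,1nr -k2
}
```

Run it from the top of a checkout, e.g. `count_samples_per_lang . | head`, to see the best-covered languages.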
## To Do
Pull requests welcome!

This project is not a perfect representation of RosettaCode yet. It has a few
Unicode issues, and it also has to deal with various formatting mistakes in the
MediaWiki source pages.

* Fix bugs
* Correct the 100s of guessed file extensions in `Conf/lang.yaml`
* Ability to fetch and cache only the pages changed since the last pushed data
  update
* Support names with non-ASCII characters
* Add more bin tools
* Address errors reported in `rosettacode.log` after running `make build`
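
For the non-ASCII item, it can help to see which names in a checkout are affected. This is a small sketch using only `find`, and the helper name is hypothetical. In the C locale, pattern matching works byte by byte, so every byte of a UTF-8 character such as `é` falls outside the printable-ASCII range `[ -~]`:

```shell
# Hypothetical helper: list files and directories whose own name contains a
# byte outside printable ASCII (space through tilde).
# LC_ALL=C makes find match byte by byte, so each byte of a multi-byte
# UTF-8 character counts as outside the range.
find_non_ascii_names() {
    root=${1:-.}
    LC_ALL=C find "$root" -name '*[! -~]*' -print
}
```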