RosettaCode Data Project
========================

This git repository contains (almost) all of the code samples available on
http://rosettacode.org organized by Language and Task.

## Getting the Data

All of the data is in this repository, so you can just run:

    git clone https://github.com/acmeism/RosettaCodeData

*However...*

It's a lot of data!

If you just want the latest data, the quickest thing to do is:

    git clone https://github.com/acmeism/RosettaCodeData --single-branch --depth=1

## Tools

This repository's data content is created by a Perl program called
`rosettacode`.

You can install it with this command:

    cpanm RosettaCode

You can rebuild the data with:

    make build

This repository has a `bin` directory with various tools for working with the
data.

* `rcd-api-list-all-langs`

  List all the programming language names directly from rosettacode.org

* `rcd-api-list-all-tasks`

  List all the programming task names directly from rosettacode.org

* `rcd-new-langs`

  List the RosettaCode languages not yet added to Conf

* `rcd-new-tasks`

  List the RosettaCode tasks not yet added to Conf

* `rcd-samples-per-lang`

  Show the number of code samples per language

* `rcd-samples-per-task`

  Show the number of code samples per task

* `rcd-tasks-per-lang`

  Show the number of tasks with code samples per language

* `rcd-langs-per-task`

  Show the number of languages with code samples per task

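To give a feel for the kind of summary the `bin` tools produce, here is a minimal Python sketch that counts sample files per language in a local checkout. It is not how `rcd-samples-per-lang` is implemented. It assumes a `Task/<task>/<language>/<sample files>` directory layout, which this README does not confirm, and the function name `samples_per_lang` is made up for illustration.

```python
from collections import Counter
from pathlib import Path


def samples_per_lang(repo_root):
    """Count sample files per language in a local checkout.

    Assumption (not confirmed by the README): samples are stored as
    Task/<task>/<language>/<sample files>.
    """
    counts = Counter()
    task_dir = Path(repo_root) / "Task"
    if not task_dir.is_dir():
        # No Task directory: nothing to count.
        return counts
    for task in task_dir.iterdir():
        if not task.is_dir():
            continue
        for lang in task.iterdir():
            if lang.is_dir():
                # Every regular file directly under the language
                # directory is treated as one code sample.
                counts[lang.name] += sum(1 for f in lang.iterdir() if f.is_file())
    return counts


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    # Print languages from most to fewest samples.
    for lang, n in samples_per_lang(root).most_common():
        print(f"{n}\t{lang}")
```

Run from the top of a checkout, the script prints one line per language, most samples first. If the layout differs from the assumption above, only the two nested loops should need adjusting.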
## To Do

Pull requests welcome!

This project is not a perfect representation of RosettaCode yet.
It has a few Unicode issues.
It also has to deal with various formatting mistakes in the MediaWiki source
pages.

* Fix bugs

* Correct the hundreds of guessed file extensions in `Conf/lang.yaml`

* Add the ability to fetch only the cache pages changed since the last pushed data update

* Support names with non-ASCII characters

* Add more bin tools

* Address errors reported in rosettacode.log after running `make build`