Puts the "engineering" back into data science because it borrows concepts from software engineering and applies them to machine-learning code. It is the foundation for clean, data science code.
Kedro's pipeline visualisation plugin, Kedro-Viz, shows a blueprint of your data and machine-learning workflows as they develop, provides data lineage, tracks machine-learning experiments and makes it easier to collaborate with business stakeholders.
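For illustration, here is a minimal sketch of the kind of pipeline Kedro-Viz renders; it assumes kedro>=0.19, and every dataset and node name in it is a placeholder. With the plugin installed, running `kedro viz` (or `kedro viz run` on recent plugin versions) from the project root serves the interactive flowchart.

```python
# Hypothetical two-node pipeline of the kind Kedro-Viz draws as a flowchart.
# Assumes kedro>=0.19; dataset and node names are placeholders.
import pandas as pd

from kedro.pipeline import node, pipeline


def clean_companies(companies: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows from the raw table."""
    return companies.dropna()


def score_companies(clean: pd.DataFrame) -> pd.DataFrame:
    """Attach a toy score column."""
    return clean.assign(score=range(len(clean)))


# Kedro resolves the dependency between the two nodes from the shared
# "companies_clean" dataset name; that graph is the lineage Kedro-Viz shows.
demo_pipeline = pipeline(
    [
        node(clean_companies, inputs="companies", outputs="companies_clean", name="clean"),
        node(score_companies, inputs="companies_clean", outputs="companies_scored", name="score"),
    ]
)
```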
A series of lightweight data connectors used to save and load data across many different file formats and file systems. Supported datasets include those backed by Pandas, Spark, Dask, NetworkX, Pickle, Plotly, Matplotlib and many more. The Data Catalog supports S3, GCP, Azure, SFTP, DBFS and local filesystems, and also provides data and model versioning for file-based systems.
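As a rough sketch of how this looks in the Python API (Kedro projects usually declare the same datasets in a `catalog.yml` file), assuming kedro>=0.19 with the separate kedro-datasets package installed; the dataset names and file paths below are placeholders:

```python
# Illustrative Data Catalog declared in Python. Assumes kedro>=0.19 and
# kedro-datasets>=2.0 (earlier releases spell the class CSVDataSet).
import pandas as pd

from kedro.io import DataCatalog
from kedro_datasets.pandas import CSVDataset

catalog = DataCatalog(
    datasets={
        # Local and remote paths look the same because I/O goes through fsspec.
        "companies": CSVDataset(filepath="companies.csv"),
        "reviews": CSVDataset(filepath="s3://my-bucket/reviews.csv"),  # placeholder bucket
    }
)

catalog.save("companies", pd.DataFrame({"id": [1, 2]}))  # write through the connector
companies = catalog.load("companies")                    # read it back as a DataFrame
```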
Kedro integrates with Apache Spark, Pandas, Dask, Matplotlib, Plotly, fsspec, Apache Airflow, Jupyter Notebook and Docker.
You can standardise how configuration, source code, tests, documentation, and notebooks are organised with an adaptable, easy-to-use project template. Create your own Cookiecutter project templates with Starters.
You can find the Kedro community on Slack.
We also maintain a list of extensions, plugins, articles, podcasts, talks, and Kedro showcase projects in the awesome-kedro repository.
Kedro is an open-source Python framework hosted by the Linux Foundation (LF AI & Data). Kedro standardises how data science code is created to ensure it is reproducible, maintainable, and modular; it uses software engineering best practices to help you build production-ready data science code.
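To make that concrete, the following self-contained sketch wires the pieces together, assuming kedro>=0.19 (every name in it is illustrative): a pure Python function becomes a node, nodes form a pipeline, and a runner executes the pipeline against a Data Catalog.

```python
# End-to-end sketch of the core abstractions, assuming kedro>=0.19.
# All names here are illustrative, not taken from the Kedro docs.
import pandas as pd

from kedro.io import DataCatalog, MemoryDataset
from kedro.pipeline import node, pipeline
from kedro.runner import SequentialRunner


def double_scores(df: pd.DataFrame) -> pd.DataFrame:
    """A pure function: easy to test, reuse and compose."""
    return df.assign(score=df["score"] * 2)


catalog = DataCatalog(
    datasets={"raw_scores": MemoryDataset(pd.DataFrame({"score": [1, 2, 3]}))}
)
demo = pipeline([node(double_scores, inputs="raw_scores", outputs="doubled")])

# Outputs that are not registered in the catalog are returned by the runner.
print(SequentialRunner().run(demo, catalog))
```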