Draft:Feature Store
Submission declined on 18 March 2024 by ToadetteEdit (talk).
Submission declined on 21 January 2024 by Theroadislong (talk). This draft's references do not show that the subject qualifies for a Wikipedia article. In summary, the draft needs multiple published sources that are in-depth (not just passing mentions of the subject), reliable, secondary, and independent of the subject.
A feature store is a centralized repository for storing, managing, and serving machine learning features, the input variables derived from raw data that models consume. It lets data scientists and machine learning engineers share and reuse features, and it keeps feature values consistent between model training and production environments. Feature stores aim to simplify feature engineering, reduce duplication of effort, and accelerate the deployment of machine learning models.
History
As organizations scaled their machine learning efforts, they encountered challenges in managing and operationalizing features across multiple teams and projects. Companies like Uber and Airbnb developed internal feature store platforms to address these challenges.[1][2] The concept gained traction, leading to the development of open-source and commercial feature store solutions.
Functionality
Feature stores provide several key functionalities:
Feature Engineering: Tools and pipelines for creating and transforming raw data into features suitable for machine learning models.
Feature Storage: A centralized repository that stores features in both offline (batch) and online (real-time) modes.
Feature Serving: Mechanisms to serve the same feature values for both training and inference, so that a model sees data in production that matches what it was trained on.
Feature Governance: Management of feature metadata, versioning, access control, and lineage tracking.
Feature Monitoring: Tools to monitor feature drift, data quality, and performance over time.
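The split between offline and online storage described above can be illustrated with a minimal in-memory sketch. It is not the design of any particular product: the class `ToyFeatureStore`, its method names, and the use of integer timestamps are all assumptions made for illustration. The offline side keeps the full history so that training data can be looked up as of a past point in time, while the online side keeps only the latest value for low-latency reads.

```python
from bisect import bisect_right
from collections import defaultdict


class ToyFeatureStore:
    """Illustrative in-memory feature store; every name here is hypothetical."""

    def __init__(self):
        # Offline store: full (timestamp, value) history per (entity, feature).
        self._history = defaultdict(list)
        # Online store: only the most recent value per (entity, feature).
        self._latest = {}

    def ingest(self, entity_id, feature, timestamp, value):
        """Write one feature value to both the offline and online stores."""
        key = (entity_id, feature)
        rows = self._history[key]
        rows.append((timestamp, value))
        rows.sort(key=lambda row: row[0])  # keep history ordered by timestamp
        self._latest[key] = rows[-1][1]    # online store tracks the newest value

    def get_online(self, entity_id, feature):
        """Serving path: return the latest value, or None if unknown."""
        return self._latest.get((entity_id, feature))

    def get_historical(self, entity_id, feature, as_of):
        """Training path: return the value that was current at time `as_of`."""
        rows = self._history.get((entity_id, feature), [])
        timestamps = [ts for ts, _ in rows]
        idx = bisect_right(timestamps, as_of)  # number of rows with ts <= as_of
        return rows[idx - 1][1] if idx else None
```

Looking up historical values by `as_of` time, rather than always returning the latest value, is what keeps future information from leaking into training examples.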
Benefits
Implementing a feature store offers several benefits:
Consistency: Ensures that the same feature definitions are used during both training and inference.
Reusability: Promotes reuse of features across different models and teams, reducing duplication.
Efficiency: Accelerates model development by providing readily available features.
Collaboration: Facilitates collaboration among data scientists and engineers through shared feature repositories.
Scalability: Supports large-scale machine learning applications by handling vast amounts of data efficiently.
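The consistency and reusability benefits listed above can be sketched as a shared registry of feature definitions. The example below is a simplified illustration, not the API of any real feature store: `FEATURE_REGISTRY`, `register_feature`, the `avg_order_value` feature, and the `order_totals` field are all hypothetical names. The point is that the training path and the inference path call the same registered function, so they cannot drift apart.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a feature name to its single definition.
FEATURE_REGISTRY: Dict[str, Callable[[dict], float]] = {}


def register_feature(name: str):
    """Decorator that records one shared definition per feature name."""
    def decorator(fn: Callable[[dict], float]):
        if name in FEATURE_REGISTRY:
            raise ValueError(f"feature {name!r} is already defined")
        FEATURE_REGISTRY[name] = fn
        return fn
    return decorator


@register_feature("avg_order_value")
def avg_order_value(raw: dict) -> float:
    # Assumed raw field: a list of order totals for one customer.
    orders = raw.get("order_totals", [])
    return sum(orders) / len(orders) if orders else 0.0


def compute_features(raw: dict, names: List[str]) -> Dict[str, float]:
    """Apply the registered definitions to one raw record."""
    return {name: FEATURE_REGISTRY[name](raw) for name in names}


def build_training_rows(raw_records: List[dict], names: List[str]):
    """Training path: batch over historical records."""
    return [compute_features(record, names) for record in raw_records]


def build_inference_vector(raw_record: dict, names: List[str]):
    """Inference path: one record, same definitions as training."""
    return compute_features(raw_record, names)
```

Because a second registration under the same name raises an error, two teams cannot silently maintain diverging versions of one feature in this sketch.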
Implementations
Various open-source and commercial feature store platforms are available:
Feast: An open-source feature store originally developed by Gojek in collaboration with Google Cloud.[3]
Hopsworks: An open-source platform by Logical Clocks offering a feature store with support for Apache Spark and TensorFlow.[4]
Databricks Feature Store: A feature store integrated with the Databricks Lakehouse Platform.[5]
Amazon SageMaker Feature Store: A fully managed feature store as part of Amazon SageMaker.[6]
Challenges
Despite the benefits, feature stores present challenges:
Integration: Integrating with existing data infrastructure and pipelines can be complex.
Data Quality: Ensuring high-quality, consistent data requires robust validation mechanisms.
Latency: Serving features in real-time with low latency is technically challenging.
Security and Compliance: Access to shared features must be controlled, and feature data must comply with regulations such as GDPR.
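As one illustration of the validation mechanisms mentioned under Data Quality, the sketch below checks feature rows against simple per-feature rules before they would be written to a store. The `FeatureRule` class, the `validate_row` function, and the example feature names are assumptions made for illustration; real platforms typically offer richer checks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FeatureRule:
    """Hypothetical validation rule for a single feature."""
    name: str
    dtype: type
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    nullable: bool = False


def validate_row(row: dict, rules: List[FeatureRule]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for rule in rules:
        value = row.get(rule.name)
        if value is None:
            if not rule.nullable:
                errors.append(f"{rule.name}: missing value")
            continue
        if not isinstance(value, rule.dtype):
            errors.append(
                f"{rule.name}: expected {rule.dtype.__name__}, "
                f"got {type(value).__name__}"
            )
            continue  # range checks are meaningless for the wrong type
        if rule.min_value is not None and value < rule.min_value:
            errors.append(f"{rule.name}: below minimum {rule.min_value}")
        if rule.max_value is not None and value > rule.max_value:
            errors.append(f"{rule.name}: above maximum {rule.max_value}")
    return errors
```

A pipeline could call `validate_row` on each incoming record and reject or quarantine rows for which the returned list is not empty.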
See Also
References
1. Zhang, Zheng (2018-09-05). "How Uber Engineering Increases ML Development Velocity with Michelangelo Palette". Uber Engineering Blog. Retrieved 2023-10-15.
2. "Airbnb's Bighead: A Feature Store for ML Pipelines". Medium. 2019-10-15. Retrieved 2023-10-15.
3. Nguyen, Kai; Chan, David (2020). "Feast: A Feature Store for Machine Learning". Proceedings of the 2020 IEEE International Conference on Data Engineering: 1801–1812.
4. Moradi, M.; Wider, J.; Papapanagiotou, I. (2019). "Hopsworks: Improving User Experience and Development on Hadoop with Scalable, Strongly Consistent Metadata". Proceedings of the 2019 IEEE International Conference on Big Data: 5787–5794.
5. "Simplify ML Pipelines with the Databricks Feature Store". Databricks Blog. 2021-04-06. Retrieved 2023-10-15.
6. "Introducing Amazon SageMaker Feature Store". AWS Machine Learning Blog. 2020-12-01. Retrieved 2023-10-15.