Google Summer of Code Phase-1 | Hydra Ecosystem
Hey Everyone,
Here is a wrap-up of the first month of GSoC 2021, as the first evaluation started on 12 July 2021. It has been an amazing journey developing a project from scratch and maintaining its repository. This post covers the work I did in GSoC'21 Phase 1.
My project aims to develop a Hydra-powered API for credit-risk management that serves loan portfolio data to authorities, investors, regulators, and others.
Hydra is a documentation framework that leverages the potential of the Semantic Web and Linked Data to build next-generation APIs.
For more information about Hydra please visit https://www.hydraecosystem.org/
Linked Data (Semantic Web) provides software programs with machine-interpretable metadata about published information and data.
We started by creating a simple proof of concept (POC) with the following tasks:
1) Creating the NonPerformingLoan JSON-LD vocab.
2) Adding the NPL vocab to the ApiDoc's context.
3) Creating classes and properties and setting up the database.
Here is how all these pieces fit together:
1. The JSON-LD vocabulary is a subset of the NonPerformingLoan ontology and is used to reference all the required classes and properties in the ApiDoc.
{
  "@context": {
    "NPL": "Vocab_url",
    "Loan": "NPL:Loan",
    "Borrower": "NPL:Borrower"
  }
}
You can find more details regarding JSON-LD “@context” here.
2. The ApiDoc is created using the doc_writer.py module of hydra-python-core.
3. The database is set up automatically by parsing the ApiDoc. The table names are the Hydra classes in the ApiDoc, and their columns are the supported properties of each class.
4. Finally, once the ApiDoc, Auth, Database and Session are set up, we can start the hydrus server (by setting up and running the main.py file).
We completed these tasks in the first two weeks and had a simple, working POC.
For more details on the work on these tasks, please refer to my previous blog post.
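Item 1 of the flow relies on JSON-LD compact IRIs: a term like "Loan" maps to "NPL:Loan", and the "NPL" prefix is then resolved against the vocabulary URL. The sketch below shows that resolution in a simplified form. The vocabulary URL is a hypothetical placeholder standing in for "Vocab_url", and a real JSON-LD processor (such as pyld) applies many more rules than this.

```python
# Simplified sketch of compact IRI expansion against a JSON-LD @context.
# The prefix IRI below is a hypothetical placeholder, not the real NPL vocab URL.

CONTEXT = {
    "NPL": "https://example.org/npl-vocab#",  # hypothetical prefix IRI
    "Loan": "NPL:Loan",
    "Borrower": "NPL:Borrower",
}


def expand_term(term: str, context: dict) -> str:
    """Expand a term or compact IRI (prefix:suffix) to a full IRI."""
    # A term defined in the context maps to its (possibly compact) value.
    value = context.get(term, term)
    if ":" in value:
        prefix, suffix = value.split(":", 1)
        # Expand only known prefixes; "scheme://..." is already an absolute IRI.
        if prefix in context and not suffix.startswith("//"):
            return context[prefix] + suffix
    return value


if __name__ == "__main__":
    for term in ("Loan", "Borrower", "NPL:Borrower"):
        print(term, "->", expand_term(term, CONTEXT))
```

Both the short term "Borrower" and the compact form "NPL:Borrower" end up at the same full IRI, which is what lets the ApiDoc reference vocabulary classes by short names.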
Weeks 3–4
Moving forward, I restructured the repository and added tests and functionality documentation for the API.
I also developed Mock_portfolio_generator, which acts as a mock client to populate the database with more realistic data.
We had three simple classes connected by foreign keys in the following pattern:
We needed to extend our classes and properties to create more realistic portfolio data, so we extended our JSON-LD vocab and ApiDoc, but doing this manually was messy. To avoid mistakes and for future convenience, we decided to automate the creation of the ApiDoc.
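To illustrate what a mock client like this does, here is a minimal sketch that generates borrowers and loans linked by a foreign key. The record layout, field names, and value ranges are hypothetical and are not the actual Mock_portfolio_generator schema; sending the records to the hydrus API is left out.

```python
# Minimal sketch of a mock portfolio generator. Field names and value ranges
# are hypothetical, not the real Mock_portfolio_generator schema.
import random
import uuid


def generate_portfolio(n_borrowers: int, loans_per_borrower: int, seed: int = 42) -> dict:
    """Return mock borrowers and loans, each loan referencing a borrower."""
    rng = random.Random(seed)  # seeded so the mock data is reproducible
    borrowers, loans = [], []
    for _ in range(n_borrowers):
        # Derive IDs from the seeded RNG (uuid4 would not be reproducible).
        borrower_id = str(uuid.UUID(int=rng.getrandbits(128)))
        borrowers.append({
            "id": borrower_id,
            "BorrowerType": rng.choice(["Individual", "Corporate"]),
        })
        for _ in range(loans_per_borrower):
            loans.append({
                "id": str(uuid.UUID(int=rng.getrandbits(128))),
                "borrower_id": borrower_id,  # foreign key to the borrower
                "OutstandingPrincipal": round(rng.uniform(1_000, 500_000), 2),
                "InterestRate": round(rng.uniform(0.01, 0.15), 4),
            })
    return {"borrowers": borrowers, "loans": loans}


if __name__ == "__main__":
    portfolio = generate_portfolio(n_borrowers=3, loans_per_borrower=2)
    print(len(portfolio["borrowers"]), "borrowers,", len(portfolio["loans"]), "loans")
```

Seeding the generator keeps test runs repeatable while the values still vary across records.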
After going through previous work and exploring options, I proposed doing this automation in the following two steps:
1. Creating the JSON-LD vocab from the OWL ontology.
2. Creating the ApiDoc with the help of nplvocab_parser.
To handle these two tasks, I developed the following two Python scripts:
Vocab_generator:
vocab_generator generates “NonPerformingLoan.jsonld”, using the rdflib and pyld libraries to parse the OWL ontology and serialize it to JSON-LD with the @context.
To generate the NPL vocabulary:
python npl_vocab/vocab_generator.py
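One step a vocab generator performs is turning ontology IRIs into @context term definitions. The stdlib-only sketch below shows that step in isolation, assuming a hypothetical namespace IRI and a hand-written list of terms; the real vocab_generator gets its terms by parsing the OWL ontology with rdflib and pyld.

```python
# Sketch: derive JSON-LD @context term definitions from ontology IRIs.
# The namespace and the input IRIs are hypothetical placeholders.
import json

NPL_NS = "https://example.org/npl-vocab#"  # hypothetical namespace


def local_name(iri: str) -> str:
    """Return the fragment, or else the last path segment, of an IRI."""
    if "#" in iri:
        return iri.rsplit("#", 1)[1]
    return iri.rstrip("/").rsplit("/", 1)[1]


def build_context(prefix: str, namespace: str, iris: list) -> dict:
    """Map each term in the namespace to a compact IRI under the prefix."""
    context = {prefix: namespace}
    for iri in iris:
        if iri.startswith(namespace):  # skip terms from other vocabularies
            context[local_name(iri)] = f"{prefix}:{local_name(iri)}"
    return {"@context": context}


if __name__ == "__main__":
    terms = [NPL_NS + "Loan", NPL_NS + "Borrower"]
    print(json.dumps(build_context("NPL", NPL_NS, terms), indent=2))
```

The output has the same shape as the vocab snippet in the Phase-1 flow: one prefix entry plus one compact IRI per term.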
nplvocab_parser
nplvocab_parser parses all the classes and properties from “NonPerformingLoan.jsonld” and converts them to HydraClass and HydraClassProp objects.
It can be used by importing as a python module:
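import NPLVocab_parse as parser

# Load the NPL JSON-LD vocabulary
npl_vocab = parser.get_npl_vocab()
# Extract all classes defined in the vocabulary
classes = parser.get_all_classes(npl_vocab)
# Convert them into HydraClass objects for the ApiDoc
hydra_classes = parser.create_hydra_classes(classes)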
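Conceptually, a parser like this has to find the classes in the vocab and attach each property to the class it belongs to. Below is a minimal sketch of one way to do that, assuming the vocab uses an "@graph" list with "owl:Class" nodes and "rdfs:domain" given as {"@id": ...}. ClassSpec is a hypothetical stand-in, not hydra-python-core's HydraClass, and the InterestRate property is made up for the example.

```python
# Sketch: group vocab properties under their classes via rdfs:domain.
# The vocab layout is an assumption; ClassSpec is a hypothetical stand-in
# for HydraClass, not the hydra-python-core API.
from dataclasses import dataclass, field


@dataclass
class ClassSpec:
    name: str
    properties: list = field(default_factory=list)


def parse_vocab(vocab: dict) -> dict:
    """Return {class_id: ClassSpec} built from a JSON-LD @graph."""
    graph = vocab.get("@graph", [])
    classes = {
        node["@id"]: ClassSpec(node["@id"])
        for node in graph
        if node.get("@type") == "owl:Class"
    }
    for node in graph:
        domain = node.get("rdfs:domain", {}).get("@id")
        # Attach each non-class node to the class named in its rdfs:domain.
        if node.get("@type") != "owl:Class" and domain in classes:
            classes[domain].properties.append(node["@id"])
    return classes


if __name__ == "__main__":
    sample = {
        "@graph": [
            {"@id": "NPL:Loan", "@type": "owl:Class"},
            {"@id": "NPL:Borrower", "@type": "owl:Class"},
            {"@id": "NPL:InterestRate", "@type": "owl:DatatypeProperty",
             "rdfs:domain": {"@id": "NPL:Loan"}},
        ]
    }
    for spec in parse_vocab(sample).values():
        print(spec.name, spec.properties)
```

Each ClassSpec would then be turned into a Hydra class with its properties as supported properties.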
Now the complete automation flow is:
While working on creditrisk-poc, we found that hydrus creates every class attribute (column) in the database as VARCHAR by default, whatever the property. Open Risk suggested that it would be better for the columns to have the same types as the corresponding Hydra properties. After a good discussion with the mentors, Hasan Faraz Khan implemented this feature in hydrus and hydra-python-core, and I was then able to use it successfully in creditrisk-poc.
Here are the issues and PRs for the work done in weeks 3–4:
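The idea behind typed columns can be sketched with sqlite3: look up each property's range and pick a matching SQL type, falling back to VARCHAR only when there is no mapping. The XSD-to-SQL table, class name, and property names below are hypothetical; this is not the hydrus implementation.

```python
# Sketch: create a table whose columns are typed from each property's XSD
# range instead of defaulting to VARCHAR. The mapping and names are hypothetical.
import sqlite3

XSD_TO_SQL = {
    "xsd:string": "VARCHAR",
    "xsd:integer": "INTEGER",
    "xsd:decimal": "NUMERIC",
    "xsd:float": "REAL",
    "xsd:boolean": "BOOLEAN",
    "xsd:date": "DATE",
}


def create_table(conn, class_name: str, properties: dict) -> None:
    """Create a table with one typed column per property."""
    columns = ["id INTEGER PRIMARY KEY"]
    for prop, xsd_range in properties.items():
        # Fall back to VARCHAR for ranges without a known mapping.
        columns.append(f'"{prop}" {XSD_TO_SQL.get(xsd_range, "VARCHAR")}')
    conn.execute(f'CREATE TABLE "{class_name}" ({", ".join(columns)})')


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn, "Loan", {"InterestRate": "xsd:float", "StartDate": "xsd:date"})
    for row in conn.execute('PRAGMA table_info("Loan")'):
        print(row[1], row[2])  # column name and declared type
```

With typed columns, numeric fields like interest rates can be filtered and aggregated in the database without casting strings.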
https://github.com/HTTP-APIs/creditrisk-poc/issues/15
https://github.com/HTTP-APIs/creditrisk-poc/issues/16
https://github.com/HTTP-APIs/creditrisk-poc/issues/18
https://github.com/HTTP-APIs/creditrisk-poc/issues/20
https://github.com/HTTP-APIs/creditrisk-poc/pull/17
https://github.com/HTTP-APIs/creditrisk-poc/pull/19
https://github.com/HTTP-APIs/creditrisk-poc/pull/21
https://github.com/HTTP-APIs/creditrisk-poc/pull/22
https://github.com/HTTP-APIs/creditrisk-poc/pull/23
With the automation task complete, we have released the work done in Google Summer of Code 2021 Phase 1.
Have a look at the release and key features here.
I have learned a lot at every step along the way, and I am grateful to all the mentors for their guidance. Next month will focus on enhancing the project and deploying it to Google Cloud Platform.