Head of data – Job

Job Description

Because SaaS products do not satisfy most specific needs, we need to bring a new kind of CDP to market to empower data management. This is

Requirements:

  • Experience with ETL, data pipelines.
  • Knowledge of SQL
  • Knowledge of GenAI, LLMs, a bit of MLOps skills to deploy LLMs.
  • At least basic Python and JavaScript
  • English level – B1+
  • Experience with Docker, GitHub Actions, Gitflow, Terraform, Terraform Cloud
  • Ability to grasp new concepts fast.

We can consider someone junior, but you really should have at least academic experience with the technologies mentioned above.

What you’ll get:

  • Pleasant atmosphere for personal and professional growth
  • Good salary and flexible hours
  • Employees Stock Options Program
  • Fun when working and responsible attitude

What’s next for Cyber Whale

By 2026, more than 80% of organizations will actively use GenAI models, APIs, and/or GenAI-enabled applications, compared to less than 5% at the beginning of 2023. Organizational structures for AI management have also been growing rapidly in recent years, as discussed below.

According to Gartner, the key IT trends for 2024 and beyond are:

Democratized Generative AI:

Democratized Generative AI aims to make AI technology accessible to a wider range of users, including non-specialists and individuals without extensive technical knowledge.

AI Trust, Risk, and Security Management (AI TRiSM):

AI TRiSM is a market segment for AI management products and services, including AI audit and monitoring tools, as well as management frameworks.

By 2026, organizations using AI TRiSM tools are forecasted to increase the accuracy of decision-making by filtering out up to 80% of irrelevant information.

Continuous Threat Exposure Management (CTEM):

CTEM is a cybersecurity process that uses attack simulations to identify and mitigate threats to an organization’s networks and systems.

By 2026, the widespread adoption of CTEM could improve enterprise cybersecurity levels threefold.

Sustainable Technology:

Sustainable technologies are innovations that consider natural resources and contribute to economic and social development, aiming to significantly reduce environmental risks.

In the coming years, there is an expected increase in the reliance on sustainable technologies, impacting the salaries of IT directors based on their readiness to use these technologies.

Platform Engineering:

Platform engineering is a technology approach that accelerates application delivery and enhances business value by providing infrastructure service automation.

AI-Augmented Development:

AI-Augmented Development refers to a set of tools and platforms for developing applications using AI, enabling developers to create applications more efficiently, quickly, and reliably.

Industry Cloud Platforms:

These platforms achieve specific industry business outcomes by integrating existing SaaS, PaaS, and IaaS services into a comprehensive offering with composable capabilities.

By 2027, the popularity of Industry Cloud Platforms within organizations is forecasted to increase fivefold compared to 2023.

Intelligent Applications and Augmented Connected Workforce:

Intelligent applications accelerate and automate work processes, sometimes replacing low-skilled or insufficiently skilled workers.

By the end of 2028, 25% of IT directors are expected to use ACWF strategies, accelerating the competency growth of subordinates by 50%.

Machine Customers:

Machine customers refer to machines that replace real human customers to perform tasks such as automated ordering or purchasing.

By 2030, the forecast predicts significant growth in this industry, potentially surpassing the revenue of digital commerce.

Among other trends:

  • Augmented Reality (AR) technologies are expected to experience a breakthrough from 2025.
  • Continued development and integration of metaverses, including the use of headsets and augmented reality.
  • Continued growth in SaaS with potential breakthroughs.
  • Development and support of LLMs.
  • Advancements in quantum computing, although efficient practical use is still quite distant.
  • Internet of Things (IoT) – communication between multiple devices for coordinated operation without human intervention.
  • Remote learning (EdTech).
  • Control and cybersecurity of Big Data.
  • Cross-platform UI, Compose Multiplatform.
  • Continued development of native technologies – Swift, Kotlin, Aurora OS, ROSA.
  • Neurointerfaces requiring AI development.
  • Cloud technologies – continued expansion and demand for cloud computing specialists, data analysts, and cloud engineers.
  • Digital marketing – focus on SEO, transparency, and influencer marketing, considering Google’s Privacy Sandbox and cookie abandonment.
  • Product managers may become more popular due to future advertising restrictions, requiring products to immediately attract attention.
  • Growth and complexity of AI in smart homes, autopilots, and drones, increasing demand for data engineers, ML, AI.
  • Development of automated hiring systems in HR.
  • Growing popularity of DevOps for accelerated development processes.
  • Emergence of the prompt engineer profession.
  • Data communicator and storyteller – a subset of data analytics that may become popular, translating and presenting data in easily understandable packages.
  • UX and UI designers, especially with the rise of low-code, will continue to be popular, making software intuitive, organic, and easily manageable.

Code of conduct at Cyber Whale


A. Basic Rules of Work Ethics

  1. To work at Cyber Whale, it is essential to consider compliance and adhere to necessary norms when dealing with colleagues and clients. Diligent, timely, and clear adherence to client and tech lead preferences ensures the prompt achievement of results with minimal revisions.
  2. Crucial strategies and decisions essential for the company’s operation (in technical, ethical, financial, and organizational terms) are not discussed with clients without notifying and involving the managers.
  3. Every employee in our company can be confident that they will be evaluated solely based on their professional qualities. We stand against discrimination on any grounds and appreciate the individuality, personal stance, and cultural characteristics of each colleague. In case of any observed discrimination within the team, we take immediate measures to protect the rights of the colleague facing discrimination.

B. Confidentiality, Privacy, and Transparency

  1. The company’s policy emphasizes complete transparency and honest feedback in our relations with clients, as well as between the clients themselves and the employees engaged in relevant projects. At the same time, we highly respect the confidentiality of our colleagues and guarantee that no personal data of colleagues, except those necessary for work activities, will leave the company. You can fully trust both our managers and the clients you work with.
  2. All work-related data handled by company employees is confidential, and all personal data of the employees themselves is private and is not to be disclosed to third parties, except in the case of intra-corporate interactions within the scope of the contract or special legal proceedings. Managers and clients, on their part, are also obligated to adhere to this directive.
  3. Adhere to digital security. When using the internet from a work computer, ensure the safety of corporate data you are working with, whether they are on your computer or directly accessible online through various accounts.
  4. We guarantee transparency in using artificial intelligence technologies in carrying out work tasks. The client must be informed that, in performing tasks such as content generation, coding, or management, we employ AI assistance.
  5. Whenever we collect others’ data, record audio/video materials with colleagues or clients, we always seek the person’s permission. Anything otherwise goes against the values of our company and our clients.

C. Organization of Working Time: General Provisions

  1. We provide a flexible work schedule, allowing the choice of working location (office or remote) and working hours from 10:00 to 19:00, with a one-hour lunch break. Short breaks for rest during working hours are allowed, and a slight adjustment to the boundaries of the working day is permissible.
  2. Communication among colleagues is welcome, but during working hours, focus should be solely on work-related topics, ensuring that a colleague can allocate time to you either immediately or later. By agreement, work-related issues can be discussed until 8 p.m., while other matters are better addressed before 10 a.m. The exception is high urgency, emergency situations, acute health deterioration, etc. Work-related issues are not discussed on weekends (except for compensatory time off or part-time work).
  3. Before taking leave, it is necessary to inform the department head and HR at least 2 weeks in advance, and before resignation, one month in advance. In this case, relevant applications (in 2 copies) should be prepared and signed by the department head or director after submission. Application templates can be obtained from HR.
  4. It is better to submit an application for sick leave than to jeopardize the project and the client with slow and poor-quality work.
  5. Before taking time off, notify the department head and HR in advance, and make up for the time on the nearest working day.
  6. For us, the balance between work and life matters. We do not force our colleagues to live for work, spending more time on it than the regulated hours or tackling unmanageable tasks. We do not obstruct their desire to take a vacation or sick leave. Regular extracurricular events are held to help employees feel the company’s care, relax, enjoy good vibes, and interact with colleagues. We support employees’ desire to appreciate the results of their work at Cyber Whale in both work and non-working hours.
  7. When sending any application to the department head, also notify HR and PM, including placing them in copy when sending an email or message via messenger.
  8. For the most effective project coordination, if you live in the city where the company’s headquarters is located, it is recommended to work at the company’s office regularly, at least once a week. In other cases, rely on the goal-setting of the department head.

D. Organization of Working Time: Daily Provisions

  1. It is important to value each other’s time. Approach colleagues if you are sure that the information you provide will be informative, acceptable, unintrusive, and timely. Strive to structure thoughts clearly and concisely.
  2. Respect for time is one of the reasons why we actively use information search in browsers and with the help of AI. Practice shows that this is an effective strategy that significantly reduces micromanagement, saves managers’ and tech leads’ time, and positively influences colleagues’ ability to ask the right questions and efficiently find the necessary information. It is better to approach the tech lead or manager with well-clarified information and ensure that there are no remaining questions. These questions, compiled in a list, are discussed in subsequent calls or video conferences, after which colleagues return to improving previous tasks or completing new ones.
  3. Project management primarily relies on voice and video communication with the project group or individual colleagues. This allows for clearer conveyance of all project and task nuances, more precise regulation of work, improved coordination, and better time management, eliminating downtime due to lengthy and disorganized text-based discussions.

PickOnePic

Privacy Policy

built the PickOnePic app as a Free app. This SERVICE is provided by at no cost and is intended for use as is.

This page is used to inform visitors regarding my policies with the collection, use, and disclosure of Personal Information if anyone decided to use my Service.

If you choose to use my Service, then you agree to the collection and use of information in relation to this policy. The Personal Information that I collect is used for providing and improving the Service. I will not use or share your information with anyone except as described in this Privacy Policy.

The terms used in this Privacy Policy have the same meanings as in our Terms and Conditions, which is accessible at PickOnePic unless otherwise defined in this Privacy Policy.

Information Collection and Use

For a better experience, while using our Service, I may require you to provide us with certain personally identifiable information. The information that I request will be retained on your device and is not collected by me in any way.

The app does use third party services that may collect information used to identify you.

Link to privacy policy of third party service providers used by the app

Log Data

I want to inform you that whenever you use my Service, in case of an error in the app I collect data and information (through third-party products) on your phone called Log Data. This Log Data may include information such as your device Internet Protocol (“IP”) address, device name, operating system version, the configuration of the app when utilizing my Service, the time and date of your use of the Service, and other statistics.

Cookies

Cookies are files with a small amount of data that are commonly used as anonymous unique identifiers. These are sent to your browser from the websites that you visit and are stored on your device’s internal memory.

This Service does not use these “cookies” explicitly. However, the app may use third party code and libraries that use “cookies” to collect information and improve their services. You have the option to either accept or refuse these cookies and know when a cookie is being sent to your device. If you choose to refuse our cookies, you may not be able to use some portions of this Service.

Service Providers

I may employ third-party companies and individuals due to the following reasons:

  • To facilitate our Service;
  • To provide the Service on our behalf;
  • To perform Service-related services; or
  • To assist us in analyzing how our Service is used.

I want to inform users of this Service that these third parties have access to your Personal Information. The reason is to perform the tasks assigned to them on our behalf. However, they are obligated not to disclose or use the information for any other purpose.

Security

I value your trust in providing us your Personal Information, thus we are striving to use commercially acceptable means of protecting it. But remember that no method of transmission over the internet, or method of electronic storage is 100% secure and reliable, and I cannot guarantee its absolute security.

Links to Other Sites

This Service may contain links to other sites. If you click on a third-party link, you will be directed to that site. Note that these external sites are not operated by me. Therefore, I strongly advise you to review the Privacy Policy of these websites. I have no control over and assume no responsibility for the content, privacy policies, or practices of any third-party sites or services.

Children’s Privacy

These Services do not address anyone under the age of 13. I do not knowingly collect personally identifiable information from children under 13. If I discover that a child under 13 has provided me with personal information, I immediately delete it from our servers. If you are a parent or guardian and you are aware that your child has provided us with personal information, please contact me so that I can take the necessary action.

Changes to This Privacy Policy

I may update our Privacy Policy from time to time. Thus, you are advised to review this page periodically for any changes. I will notify you of any changes by posting the new Privacy Policy on this page. These changes are effective immediately after they are posted on this page.

Contact Us

If you have any questions or suggestions about my Privacy Policy, do not hesitate to contact me at [email protected].

Cyber Whale hits IT Park in Republic of Moldova

We are proud to announce that Cyber Whale LLC has been incorporated in the Republic of Moldova and has entered the IT Park.

The benefits of being present in the IT Park are the following:

  • A unified tax rate of just 7% of sales (VAT not included).
  • A straightforward procedure to become a member of the park.
  • Easier reporting (just 1 monthly tax report instead of 4 reports).
  • A great opportunity for investors.
  • 0% salary tax for employees, 0% medical tax, 0% social tax – all included in 7% tax rate.

Cyber Whale is a digital agency providing digital and creative services as well as machine learning and business intelligence services, operating worldwide from the Republic of Moldova.

How to write trained Word2Vec model to CSV with DeepLearning4j

I used DeepLearning4j to train a word2vec model. Then I had to save the dictionary to CSV so that I could run some clustering algorithms on it.

It sounded like a simple task, but it took a while. Here is the code:

 

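    // Package names assume opencsv 3.x+ and a recent DeepLearning4j release.
    import com.opencsv.CSVWriter;
    import org.deeplearning4j.models.word2vec.VocabWord;
    import org.deeplearning4j.models.word2vec.Word2Vec;
    import org.deeplearning4j.models.word2vec.wordstore.VocabCache;

    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.Collection;

    private void writeIndexToCsv(String csvFileName, Word2Vec model) {
        VocabCache<VocabWord> vocCache = model.vocab();
        Collection<VocabWord> words = vocCache.vocabWords();

        // try-with-resources closes the writer even if writing fails,
        // and avoids using a null writer when the file cannot be opened
        try (CSVWriter writer = new CSVWriter(new FileWriter(csvFileName))) {
            for (VocabWord w : words) {
                String word = w.getWord();
                System.out.println("Looking into the word: " + word);
                double[] wordVector = model.getWordVector(word);

                // One row per word: the word itself, then its vector components
                String[] row = new String[wordVector.length + 1];
                row[0] = word;
                for (int i = 0; i < wordVector.length; i++) {
                    row[i + 1] = String.valueOf(wordVector[i]);
                }
                writer.writeNext(row, false);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }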
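If the clustering is then done in Python, the CSV has to be read back into word vectors first. Below is a minimal sketch of that step, assuming each row has the form word,v1,v2,... as written by the method above. The function name `load_word_vectors` is hypothetical and only for illustration.

```python
import csv


def load_word_vectors(csv_path):
    """Read rows of the form word,v1,v2,... into a {word: [floats]} dict."""
    vectors = {}
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            # Drop empty cells (for example, one left by a trailing comma)
            cells = [cell for cell in row if cell != ""]
            if len(cells) < 2:
                continue  # skip blank rows or rows without a vector
            vectors[cells[0]] = [float(value) for value in cells[1:]]
    return vectors
```

The resulting dictionary can be turned into a matrix (one row per word) and passed to whichever clustering library you prefer.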

Xanda BI Toolkit: clustering

In the previous post we introduced the toolkit’s release to open source and the general idea behind the project. Now I would like to share the clustering implementation.

At this point we implemented 3 clustering algorithms:

  • K-means
  • DBSCAN
  • Hierarchical clustering

K-means

A very straightforward algorithm.

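# clustering algorithms
import inspect
from pprint import pprint

from sklearn.cluster import KMeans

# `Step` (pipeline base class) and `settings` (parsed settings file)
# are provided by the toolkit and are not shown here.


class KMeansAlgorithm(Step):
    def __init__(self):
        # keyword arguments passed straight to sklearn's KMeans
        self.params = settings["clustering_settings"]["kmeans_params"]
        # name of the column that will hold the cluster labels
        self.newColumn = settings["clustering_settings"]["target_column"]

    def execute(self, df):
        # debug output: class name and current method name
        pprint(self.__class__.__name__)
        pprint(inspect.stack()[0][3])

        km = KMeans(**self.params)
        km.fit(df)
        # one cluster label per row of df
        df[self.newColumn] = km.labels_.tolist()
        pprint(df.head(settings["rows_to_debug"]))
        return df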

K-means is memory-friendly and provides good output results.
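To show what happens inside the `fit` call, here is a minimal sketch of the classic K-means loop (Lloyd's algorithm) in plain Python. It is not the toolkit's or scikit-learn's implementation: the function name `kmeans` is hypothetical, and the starting centroids are passed in by the caller, whereas scikit-learn chooses its own initialisation.

```python
def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm on equal-length numeric tuples."""
    labels = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid
        labels = [
            min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # an empty cluster stays where it was
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return labels, centroids


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
labels, centers = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

Only the current labels and centroids are kept between iterations, which is why the algorithm is light on memory.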

DBSCAN

Although DBSCAN is a noise-reduction-based algorithm, it is capable of organising clusters on its own.

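import inspect
from pprint import pprint

import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler


class DBScanAlgorithm(Step):
    def __init__(self):
        self.params = settings["clustering_settings"]["dbscan_params"]
        self.newColumn = settings["clustering_settings"]["target_column"]

    def execute(self, df):
        pprint(self.__class__.__name__)
        pprint(inspect.stack()[0][3])

        # StandardScaler returns a NumPy array, so wrap the result back
        # into a DataFrame before adding the cluster column
        scaled = StandardScaler().fit_transform(df)
        loc_df = pd.DataFrame(scaled, columns=df.columns, index=df.index)

        db = DBSCAN(**self.params).fit(loc_df)
        # boolean mask of core samples (not used further in this step)
        core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
        core_samples_mask[db.core_sample_indices_] = True

        # DBSCAN labels noise points as -1
        loc_df[self.newColumn] = db.labels_.tolist()
        pprint(loc_df.head(settings["rows_to_debug"]))

        return loc_df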
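Since scikit-learn's DBSCAN marks noise points with the label -1, it is often useful to separate them from the real clusters before looking at the results. A minimal sketch, assuming the labels come from `db.labels_.tolist()` as above; the helper name `summarize_labels` is hypothetical.

```python
from collections import Counter


def summarize_labels(labels):
    """Return ({cluster_label: size}, noise_count) for DBSCAN-style labels."""
    counts = Counter(labels)
    noise_count = counts.pop(-1, 0)  # -1 marks noise points
    return dict(counts), noise_count


cluster_sizes, noise = summarize_labels([0, 0, 1, -1, 1, 1, -1])
```

A large noise count relative to the cluster sizes is usually a hint that the `eps` or `min_samples` parameters need adjusting.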

Xandra BI Toolkit powered by ML released to Open Source

We are happy to announce that we will be partially releasing our Python Business Intelligence Toolkit, powered by machine learning algorithms, as open source.

Idea

The idea behind the toolkit is to provide an easy way for companies to arrange, process, and visualise business data. Thanks to the machine learning algorithms applied, users will be able to solve prediction, classification and clustering problems.

The visual part will also be a priority for us, so that users can conduct a quick review.

Development

The development is done in Python using pandas, seaborn and, of course, scikit-learn. Since the product will bear a graceful name, we will be putting our best effort into creating a modular architecture, a lightweight code style and test coverage.

Parameters can also be fine-tuned easily using a settings file.

{
  "dataset_path": "trained_all.csv",
  "dataset_separator": ";",
  "columns_to_remove": ["Unnamed: 0", "Autoclass", "Color 1", "Color 2", "Image", "Images", "Description", "Overview"],
  "columns_to_encode": ["Category"],
  "columns_to_do_tfidf": ["Product name"],
  "should_purify": true,
  "problem": "clustering",
  "clustering_settings": {
    "algorithm": "kmeans",
    "number_of_cluster": 30,
    "target_column": "Cluster"
  },
  "rows_to_debug": 5
}

The following design patterns will be used:

  • Pipeline / Chain of responsibility – to build the execution pipeline.
  • Abstract factory – to dynamically generate the objects responsible for the selected algorithms.
  • Decorator – to provide additional functionality to existing classes.
  • MVC – to serve as the architectural pattern for web applications later on.
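To make the first two patterns more concrete, here is a minimal sketch of how a settings-driven pipeline could be wired together. It is an illustration only, not the toolkit's actual code: the class names `DropColumns` and `AddConstantCluster`, and the functions `build_pipeline` and `run_pipeline`, are hypothetical. Rows are plain dictionaries instead of a pandas DataFrame, and a simple factory function stands in for a full abstract factory.

```python
class Step:
    """Base class: one unit of work in the pipeline."""

    def execute(self, data):
        raise NotImplementedError


class DropColumns(Step):
    """Removes the listed columns from every row."""

    def __init__(self, columns):
        self.columns = set(columns)

    def execute(self, data):
        return [
            {key: value for key, value in row.items() if key not in self.columns}
            for row in data
        ]


class AddConstantCluster(Step):
    """Stand-in for a real clustering step: labels every row with cluster 0."""

    def __init__(self, target_column):
        self.target_column = target_column

    def execute(self, data):
        return [{**row, self.target_column: 0} for row in data]


def build_pipeline(settings):
    """Factory: turn the settings dict into an ordered list of steps."""
    steps = [DropColumns(settings["columns_to_remove"])]
    if settings["problem"] == "clustering":
        target = settings["clustering_settings"]["target_column"]
        steps.append(AddConstantCluster(target))
    return steps


def run_pipeline(steps, data):
    """Pipeline: feed the output of each step into the next one."""
    for step in steps:
        data = step.execute(data)
    return data


settings = {
    "columns_to_remove": ["Image"],
    "problem": "clustering",
    "clustering_settings": {"target_column": "Cluster"},
}
rows = [{"Image": "a.png", "Price": 3}]
result = run_pipeline(build_pipeline(settings), rows)
```

Because every step exposes the same `execute` method, new algorithms can be added without touching the code that runs the pipeline.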

Roadmap

At this point data preprocessing is implemented: label encoding, tf-idf transformation of text fields, and removal of excess columns.
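As a quick illustration of the first of these steps, label encoding replaces each distinct category value with an integer code. The sketch below is a plain-Python version for illustration only: the function name `label_encode` is hypothetical, and assigning codes in sorted order is an assumption about the behaviour, not a description of the toolkit's code.

```python
def label_encode(values):
    """Map each distinct value to an integer code, assigned in sorted order."""
    classes = sorted(set(values))
    codes = {value: index for index, value in enumerate(classes)}
    return [codes[value] for value in values], classes


encoded, classes = label_encode(["shoes", "bags", "shoes", "hats"])
```

The list of classes is returned as well, so that the integer codes can be mapped back to the original category names later.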

The steps to follow are:

  • To implement clustering algorithms
  • To implement classification algorithms
  • To implement regression algorithms
  • To add visualization
  • To add support of different datasources (.txt, SQL etc)
  • To wrap inside web application

Please follow our GitHub repo or contact us at [email protected]

 

 

 

5 programming languages to fall in love with on St. Valentine’s Day.

Saint Valentine’s Day is a holiday of love not only for your beloved or your family, but also for things like… programming languages. We would like to outline 5 programming languages to fall in love with on St. Valentine’s Day.

Python

The list of reasons to love Python is infinite:

  • Prevents you from writing spaghetti code: it refuses to run code without proper indentation.
  • Very easy to get started.
  • Plenty of tutorials and mobile apps to learn Python on the go.
  • Great web frameworks like Django.
  • A long list of powerful packages, for anything from CSV handling to machine learning.
  • Easy to install; no IDE needed.

Scala

Scala is not new, but it is growing and is deemed a future replacement for Java.

  • Unlike Java, it has a lightweight syntax.
  • Is 100% JVM compatible, so you can reuse existing modules.
  • Has a great web framework called Play.
  • Implements the functional programming paradigm.
  • Syntactic sugar.

Angular 2

  • Best JS framework, great support, huge community
  • A lot of technologies rely on it (e.g. Ionic 2).
  • Great data binding.
  • Improved version of Angular 1, with a better approach (not backward compatible).

C#

Old but good language that still dominates the charts.

  • Extremely popular with tons of examples and huge community.
  • Soon to be 100% cross-platform via .NET Core.
  • Excellent business-oriented web framework ASP.NET.
  • Great ORM frameworks, test frameworks.
  • Quite backward compatible, so you will not drown in legacy code.

Kotlin

  • Very fresh and lightweight.
  • 100% JVM compatible
  • Works out of the box in IntelliJ IDEA because…
  • Kotlin was created by developers at JetBrains, and these folks know how to master a language. Just imagine: for so many years they thoroughly studied languages like Java, Groovy, and Scala, so they surely have tons of “inspiration” to come up with a good programming language.

Let us program for you in any of these languages; let us know at [email protected]

Happy St. Valentine’s Day!

 

How to parse dynamic HTML content using Python

In the previous tutorial we learned how to parse HTML in Python. In this Python tutorial we are going to learn how to parse dynamic HTML content generated by JavaScript, jQuery, Ajax, Angular or other dynamic page technologies.

What’s the problem with parsing dynamic HTML content in Python and in general?

The problem is that when you request the contents of an HTML page, you receive the HTML, CSS and scripts returned by the server. If the page is dynamic, what you get is only a couple of scripts that are meant to be interpreted by your browser, which, in turn, will eventually display the HTML content to the user.

That leads us to the idea that we should first render the page and then grab its HTML. We should also allow some time for rendering, since the content is sometimes quite “heavy” and takes a while to load.

So, along with pure Python we should use some kind of UI component, in particular a web view or some kind of web frame.

One of the options is to use Qt for Python and handle page rendering events; another one (which I honestly prefer) is to use Selenium for Python.

So, let’s get down to writing some code, but before that let’s outline the approach.

  1. Open a web view with the URL.
  2. Wait until the page is loaded. Often the criterion here is a loaded div of some class.
  3. Grab the rendered HTML.
  4. Process it further using Beautiful Soup.
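Step 2 boils down to polling: keep checking a condition until it holds or a timeout expires. The sketch below shows that idea in plain Python. It is not Selenium's implementation, and the function name `wait_until` and its parameters are hypothetical. In the real code Selenium's `WebDriverWait` plays this role.

```python
import time


def wait_until(condition, timeout, poll_interval=0.5):
    """Call condition() until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(poll_interval)
```

With a browser, the condition would be something like "the element with this id exists in the page", and the timeout is the delay you are willing to wait for the page to render.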

You will need ChromeDriver to run the web view.

You will also have to install Selenium, as well as the libraries from the previous tutorial:

pip install selenium

So here is the Python code to parse dynamic content:

# import Selenium components and Beautiful Soup
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


# url - the URL to fetch dynamic content from
# delay - seconds for the web view to wait
# block_name - id of the tag whose presence means the page has loaded
def fetchHtmlForThePage(url, delay, block_name):
    # supply the local path of the web driver;
    # in this example we use ChromeDriver (Selenium 4 style)
    browser = webdriver.Chrome(service=Service('/Applications/chromedriver'))
    try:
        # open the browser with the URL;
        # a browser window will appear for a little while
        browser.get(url)
        try:
            # wait for the element you're looking for
            element_present = EC.presence_of_element_located((By.ID, block_name))
            WebDriverWait(browser, delay).until(element_present)
        except TimeoutException:
            # the element did not appear within `delay` seconds
            print("Loading took too much time!")

        # grab the rendered HTML
        return browser.page_source
    finally:
        # always close the browser
        browser.quit()


# call the fetching function we created
# (url, path and processFetchedUrls come from your own code / the previous tutorial)
html = fetchHtmlForThePage(url, 5, 're-Searchresult')
# parse the HTML document
soup = BeautifulSoup(html, 'html.parser')
# process it further as you wish...
processFetchedUrls(soup, path)

So here is how to parse dynamic HTML content generated with JavaScript with the help of Python.

Visit us to get help with your Python challenge, or let us know if we can help you with your digital needs.